Project/Bert - Githubissues

Stefanos-stk commented 4 years ago

(Fixed)

I am having an issue running the names.py file on the server. I get this error:

from torch.utils.tensorboard import SummaryWriter
ModuleNotFoundError: No module named 'torch.utils.tensorboard'

I tried:

pip3 install tensorboard

and

pip3 install --upgrade tensorboard && pip3 install --upgrade torch

But I get: IOError: [Errno 122] Disk quota exceeded. I assume it is not working because we downgraded torch in my branch on the server in order to run the Nvidia sentiment. I am going to try running things on my computer for now.

Update: it turns out my computer keeps asking for some missing DLL files, which I don't know how to fix. I went back to the server, but I got the same error: IOError: [Errno 122] Disk quota exceeded. Even pressing tab after cd reports that the disk quota is exceeded. I am pretty sure that I have not used all of the GB of space you gave me, so it might have something to do with some local cache. I was making and deleting a lot of heatmap plots yesterday. How can I clean the local cache?

ssaa2018@lambda-server:~$ df -i
Filesystem        Inodes   IUsed     IFree IUse% Mounted on
udev            32961483     831  32960652    1% /dev
tmpfs           32971051    1607  32969444    1% /run
/dev/nvme0n1p2 117178368  726810 116451558    1% /
tmpfs           32971051       2  32971049    1% /dev/shm
tmpfs           32971051       4  32971047    1% /run/lock
tmpfs           32971051      18  32971033    1% /sys/fs/cgroup
/dev/nvme0n1p1         0       0         0     - /boot/efi
/dev/sda1      793501696 3179733 790321963    1% /data
tmpfs           32971051      10  32971041    1% /run/user/1003
tmpfs           32971051      29  32971022    1% /run/user/1063
tmpfs           32971051      10  32971041    1% /run/user/1059

Update: I deleted some temp files from the server (from my user) and upgraded torch, and now it works. I assume that Nvidia sentiment will not work now.

I deleted the bash history and some pip folders in /tmp.

mikeizbicki commented 4 years ago

I've increased your quota on the server to 16 GB (was 8 GB), so you should have room to install tensorboard now. Also, your .cache folder is taking up quite a bit of space, which probably explains why you couldn't install it before.
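One way to check which folders are using up the quota is to total file sizes per subdirectory. The sketch below is only a rough illustration: it sums apparent file sizes, which can differ from the disk blocks that the server's quota actually counts, and the helper names dir_size_bytes and largest_subdirs are made up for this example.

```python
import os

def dir_size_bytes(path):
    """Sum the apparent sizes of all non-directory entries under path.

    Symlinked directories are not followed (os.walk default).
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total

def largest_subdirs(parent, top=5):
    """Return up to `top` (size_in_bytes, name) pairs for the immediate
    subdirectories of parent, largest first."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]
```

For example, largest_subdirs(os.path.expanduser("~/.cache")) would list the biggest folders inside the .cache directory mentioned above, assuming that directory exists.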

On Tue, 2020-06-16 at 05:29 -0700, Stefanos-stk wrote:

I am having an issue running the names.py file in the server I get this error: from torch.utils.tensorboard import SummaryWriter ModuleNotFoundError: No module named 'torch.utils.tensorboard' I tried: pip3 install tensorboard and pip3 install --upgrade tensorboard && pip3 install --upgrade torch But I get : IOError: [Errno 122] Disk quota exceeded I assume it is not working due to the fact that we downgraded the torch in my branch of the server in order to run the Nvidia sentiment. I am going to try running stuff on my computer for now.

Stefanos-stk commented 4 years ago

I believe I am pretty close to completing part 1 of the project, which is the sliding window function (the explain function):


def explain(line,filename,explain_type):

    if explain_type is 'char':
        formated_line  = format_line(line)
        ls  = list(formated_line) 
        input_tensor = str_to_tensor(ls)
        _,x = model(input_tensor)
        probs = softmax(x)
        print(len(ls))
        scores = torch.zeros([len(line)])
        x = int(len(ls))
        for i in range(0,x):
            print(i)
            #copy of tensor 
            copy_tensor = input_tensor
            print(copy_tensor.shape)
            print(input_tensor.shape)
            #filling the corresponding letter with 0's
            copy_tensor[:,i].fill_(0)
            _,z = model(copy_tensor)
            #getting the probs
            probs_ = softmax(z)
            #Calculate the L2 distance
            l2 = (torch.dist(probs, probs_,2))
            scores[i] = int(l2)
        line2img(line,scores,filename)

This is what I have so far; I am pretty sure it is not the best way to go about it. I think there is a problem with the Euclidean distance (l2). I tried cdist, dist, and norm with p=2, but I am not able to get the result you got in the tutorial. With the above code, this is what I got: line0001 char

I haven't included the 'words' one because I assume it works similarly to the 'char' one. I think the sliding technique might have an error, but I am not finding it.

mikeizbicki commented 4 years ago

In Python, you should (usually) only use the keyword is with None. is checks equality of the pointer (object identity) rather than of the value, and that's definitely not what you want in this case. Use == to compare strings.
The line copy_tensor = input_tensor doesn't actually copy the tensor. Both those variables refer to the same tensor. Therefore, every time you call .fill_(0), you are overwriting the original tensor and not a copy. To copy a tensor in PyTorch, use the command copy_tensor = input_tensor.clone().
You want l2 to be torch.dist squared to get the exact same plots that I had, but the difference is very minor.

Fix those things and I think you should get the right results.
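The three fixes can be seen together in a small occlusion loop. This is a minimal sketch under stated assumptions, not the project's explain function: toy_model and weights are hypothetical stand-ins that return logits directly (the model in the post returns a tuple), and the position dimension is assumed to be dimension 1, matching the [:, i] indexing in the post. The sketch also stores each score as a float, because int() would truncate any distance below 1 to 0.

```python
import torch

def occlusion_scores(model, input_tensor):
    """Score each position by the squared L2 change in output
    probabilities when that position is zeroed out."""
    with torch.no_grad():
        base_probs = torch.softmax(model(input_tensor), dim=-1)
        n_positions = input_tensor.shape[1]
        scores = torch.zeros(n_positions)
        for i in range(n_positions):
            # clone() makes an independent copy, so the original stays intact.
            occluded = input_tensor.clone()
            occluded[:, i].fill_(0)
            probs = torch.softmax(model(occluded), dim=-1)
            # Squared Euclidean distance, kept as a float.
            scores[i] = torch.dist(base_probs, probs, 2) ** 2
    return scores

# Hypothetical stand-in model: sum over positions, then a linear map to 3 classes.
torch.manual_seed(0)
weights = torch.randn(4, 3)

def toy_model(x):
    return x.sum(dim=1) @ weights

x = torch.randn(1, 5, 4)   # (batch, positions, features): an assumed layout
original = x.clone()
scores = occlusion_scores(toy_model, x)
```

After the loop, x still equals original, which would not hold if the copy were made with plain assignment. The same reasoning applies to the explain_type check, where == compares the string value and is does not.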

Stefanos-stk commented 4 years ago

After a number of tutorials about BERT and some other technicalities, I think I am on a good path. I have a quick question regarding loading the data. I have been trying to load the multilingual files using this command:

python3 names.py --data coronavirus-headlines/corona.multilang100.jsonl.gz --data_format headlines

However, I am getting this error:

File "names.py", line 152, in <module>
day = article['day'].split()[0]
KeyError: 'day'

mikeizbicki commented 4 years ago

A previous version of the dataset had a day field in the json file, but this version of the dataset does not. This is causing an error on the line that loads this field. You can simply delete that line to get rid of the error and this shouldn't affect anything else in the code.
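Deleting the line is the simplest fix. If the same code ever needs to read both versions of the dataset, a tolerant lookup is one alternative. This is a sketch only: the helper name first_day_token and the sample records are hypothetical and are not taken from names.py or the dataset.

```python
def first_day_token(article):
    """Return the first whitespace-separated token of article['day'],
    or None when the field is missing or empty."""
    tokens = (article.get("day") or "").split()
    return tokens[0] if tokens else None

# Hypothetical records illustrating the two dataset versions.
old_record = {"title": "example headline", "day": "2020-06-16 05:29"}
new_record = {"title": "example headline"}

old_day = first_day_token(old_record)
new_day = first_day_token(new_record)
```

Here dict.get returns None instead of raising KeyError, so records without a day field pass through with a None value that later code can check for.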

On Thu, 2020-06-18 at 17:10 -0700, Stefanos-stk wrote:

After a number of tutorials about BERT and some other technicalities, I think I am on a good path. I have a quick question regarding loading the data. I have been trying loading the multilingual files using this command: python3 names.py --data coronavirus-headlines/corona.multilang100.jsonl.gz --data_format headlines However, I am getting this error: File "names.py", line 152, in <module> day = article['day'].split()[0]

Stefanos-stk commented 4 years ago

ssaa2018@lambda-server:~$ cd project/
ssaa2018@lambda-server:~/project$ tensorboard --version
2020-06-22 08:28:32.304858: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-06-22 08:28:32.310467: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2.2.2
ssaa2018@lambda-server:~/project$ tensorboard --logdir=runs
2020-06-22 08:29:10.038508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-06-22 08:29:10.045844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.2.2 at http://localhost:6006/ (Press CTRL+C to quit)
^Cssaa2018@lambda-server:~/project$ tensorboard --logdir='./tensorboard_dirs' --port=16007
2020-06-22 08:30:00.976619: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-06-22 08:30:00.983105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.2.2 at http://localhost:16007/ (Press CTRL+C to quit)
^Cssaa2018@lambda-server:~/project$ ls
coronavirus-headlines  explain_outputs  log  models  names.py  runs  transformers_tutorial.py
ssaa2018@lambda-server:~/project$ ls log
'model=rnn_hidden=128_layers=1_cond=False_resnet=False_lr=0.1_optim=sgd_clip=False_2020-06-22 08:18:27.137817'
'model=rnn_hidden=128_layers=1_cond=False_resnet=False_lr=0.1_optim=sgd_clip=False_2020-06-22 08:21:06.366264'
ssaa2018@lambda-server:~/project$ channel 3: open failed: connect failed: Connection refused
channel 4: open failed: connect failed: Connection refused
channel 5: open failed: connect failed: Connection refused
channel 6: open failed: connect failed: Connection refused

Stefanos-stk / Bertmoticon

Project/Bert #5