allanj / ner_with_dependency

GNU General Public License v3.0

Potential bugs when running "dggcn" mode #4

Closed llcing closed 4 years ago

llcing commented 4 years ago

Hi, I tried to run your code in "dggcn" mode, but I got a strange error, "RuntimeError: CUDA error: device-side assert triggered", at this line: https://github.com/allanj/ner_with_dependency/blob/a1117dde272a264e3ff8af3ebabe6da160b296cf/model/deplabel_gcn.py#L76 and this line: https://github.com/allanj/ner_with_dependency/blob/a1117dde272a264e3ff8af3ebabe6da160b296cf/model/deplabel_gcn.py#L79

allanj commented 4 years ago

Hi, I tried to reproduce this error (I have seen this kind of thing before), but I couldn't.

I simply changed the dataset argument to spanish, using the Spanish dataset included in this repo, and there seems to be no error. (The other changes to the code: device is set to cuda:0 and dep_model to dggcn.)

In my experience, this error usually happens when a tensor is used as an index and it contains -1 or an index that exceeds the maximum length.

  1. Are you using the example Spanish dataset?
  2. One potential issue is in how the data is read: in config/reader.py, 1 is subtracted from the dependency head index. So if your dataset has already had 1 subtracted, you do not need to subtract it again.
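
To check the second point on your own data, a small standalone check like the one below may help. It is only a sketch and not code from this repo: the function name `find_bad_heads` and the convention that the root is marked with -1 after the subtraction are assumptions for illustration, so adjust them to whatever your reader actually produces.

```python
def find_bad_heads(heads, sent_len, root_index=-1):
    """Return (position, head) pairs whose head cannot index a sentence of sent_len tokens.

    heads: head index of each token, already converted by the reader.
    root_index: value assumed to mark the root token (hypothetical convention).
    """
    bad = []
    for pos, head in enumerate(heads):
        if head == root_index:
            continue  # the root marker is expected, skip it
        if not 0 <= head < sent_len:
            bad.append((pos, head))  # would be out of range when used as an index
    return bad


# Heads stored 1-based with 0 for the root, then 1 subtracted once: all valid.
once = [h - 1 for h in [2, 0, 2]]
# The same sentence with 1 subtracted twice: the root becomes -2.
twice = [h - 1 for h in once]

print(find_bad_heads(once, sent_len=3))
print(find_bad_heads(twice, sent_len=3))
```

If the second call reports any entries for your dataset, the head indices have probably been shifted one time too many.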

Let me know if there are any issues.

llcing commented 4 years ago

Thanks a lot! I tried the Spanish dataset, and the error did not occur. Maybe you're right that the devil is in config/reader.py, but I still get the error when I use the conll2003 dataset, so I'll check the data format carefully later. I have another problem: when I use dggcn mode on Spanish without contextual embeddings, the loss is nan. Do you know why?

allanj commented 4 years ago

I think it could be a problem of exploding or vanishing gradients.

Does it happen in the first few epochs as well, or only later?

You could also consider changing the clipping value and applying clipping starting from the first GCN layer.

https://github.com/allanj/ner_with_dependency/blob/master/main.py#L150-L151
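
To make the two suggestions concrete, here is a minimal pure-Python sketch, not the code in main.py. It shows what clipping by global norm does to a list of gradient values (similar in spirit to PyTorch's `torch.nn.utils.clip_grad_norm_`) and how to find the first epoch where the loss becomes nan. The names `clip_by_global_norm` and `first_nan_epoch` are hypothetical helpers for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm; return (clipped, original_norm)."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads], total
    return list(grads), total


def first_nan_epoch(losses):
    """Return the 1-based epoch at which the loss first becomes nan, or None."""
    for epoch, loss in enumerate(losses, start=1):
        if math.isnan(loss):
            return epoch
    return None


clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)
print(first_nan_epoch([1.2, 0.9, float("nan")]))
```

A smaller `max_norm` limits how large a single update can be, which is why lowering the clipping value is worth trying when the loss blows up.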

allanj commented 4 years ago

Closing due to inactivity. Feel free to reopen the issue for further discussion.