allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Training AllenNLP SRL Model on Ontonotes 5 data #4017

Closed: francesca418 closed this issue 4 years ago

francesca418 commented 4 years ago

I am also trying to train the AllenNLP SRL model on the Ontonotes data. All of my files are currently .gold_skel files, and I want to use the provided jsonnet config file.

When I try to train with a directory as the train path, I get an error saying that the train path points to a directory. Isn't the reader supposed to search the directory recursively for all training files?
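The thread doesn't settle this, but as far as we can tell AllenNLP's Ontonotes utility does walk a directory recursively. However, it only keeps files whose names end in gold_conll, so .gold_skel skeleton files are ignored until the CoNLL-2012 scripts have turned them into .gold_conll files. The sketch below mimics that kind of walk so you can check what a given path would yield. The helper name `find_conll_files` is hypothetical, and the suffix filter reflects our understanding of the reader, not a copy of its code.

```python
import os
from typing import Iterator


def find_conll_files(root: str, suffix: str = "gold_conll") -> Iterator[str]:
    """Yield every file under `root` (searched recursively) whose name ends with `suffix`.

    Hypothetical helper: it assumes the SRL reader only consumes *.gold_conll
    files, so skeleton files such as *.gold_skel are skipped here too.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

For example, `len(list(find_conll_files("conll-formatted-ontonotes-5.0")))` returning 0 would suggest the skeleton files were never converted.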

Also, does anyone know if there is a dataset reader that works with the Ontonotes SRL documents in JSON format?

schmmd commented 4 years ago

@francesca418 can you give us the exact command you are running along with the error? What version of AllenNLP are you using?

There might be a script that puts the training data in the right format.

nitishgupta commented 4 years ago

Hi @schmmd, I'm posting my issue here because what I did gets past @francesca418's problem but leads to a different issue.

I pre-processed Ontonotes 5.0 with the scripts/compile_coref_data.sh script, which created the conll-formatted-ontonotes-5.0/ directory. Inside it is a v4 directory with the same directory structure that AllenNLP's Ontonotes class expects.

When I start training with bert_base_srl.jsonnet, the reader starts reading the data, which means the CoNLL-2012 data with Ontonotes annotations was pre-processed correctly. However, after reading 323937 instances, I get this error:

ValueError: Tree.read(): expected ')' but got 'end-of-string'
            at index 266.
                "...9.20) ))))"
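This message format appears to come from NLTK's tree parser (`Tree.fromstring`), which fails when a sentence's reconstructed parse string has brackets that don't close properly. One cheap way to find or skip such sentences before they reach the parser is to check bracket balance first. The sketch below is a hypothetical filter, not part of AllenNLP. The (tokens, parse_string) pair layout is an assumption for illustration, and a balanced string can still fail to parse for other reasons.

```python
from typing import Iterable, Iterator, List, Tuple


def brackets_balanced(parse: str) -> bool:
    """Return True if every '(' in `parse` is closed by a later ')'."""
    depth = 0
    for ch in parse:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0  # leftover '(' means the string ended too early


def skip_malformed(
    sentences: Iterable[Tuple[List[str], str]],
) -> Iterator[Tuple[List[str], str]]:
    """Yield only (tokens, parse_string) pairs whose parse string is balanced.

    Hypothetical data layout; adapt it to however your reader stores sentences.
    """
    for tokens, parse in sentences:
        if brackets_balanced(parse):
            yield tokens, parse
```

Skipping bad sentences keeps training going, but it is worth logging how many are dropped, since a large count would point to a pre-processing problem rather than a few odd documents.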

nitishgupta commented 4 years ago

I modified the reader to skip instances that don't parse correctly. Now training itself throws an error after 18 iterations. It's a long error message; here is a snippet:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [401,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
....
...
RuntimeError: CUDA error: device-side assert triggered

dirkgr commented 4 years ago

@nitishgupta, @francesca418, is there still an issue here? I know we had a discussion about training SRL in a different GitHub issue, but I don't remember if it was solved.

FWIW, a "device-side assert triggered" error can often be debugged by running on CPU (maybe overnight), which gives a more informative error message. Most of the time the message is "tensor index out of range".
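As a complement to a CPU run, the usual culprit can also be searched for directly in the data. The sketch below assumes, without confirmation from this thread, that the assert comes from an embedding lookup whose index exceeds the table size. For a BERT-based model that could be a wordpiece id at or above the vocabulary size, or a wordpiece sequence longer than the model's maximum number of positions (512 for BERT-base). `find_out_of_range` and its parameters are hypothetical.

```python
from typing import List, Sequence


def find_out_of_range(
    sequences: Sequence[Sequence[int]],
    vocab_size: int,
    max_positions: int,
) -> List[str]:
    """Describe every id sequence that would index past an embedding table.

    Hypothetical check: `vocab_size` bounds the token-embedding table and
    `max_positions` bounds the position-embedding table.
    """
    problems = []
    for i, ids in enumerate(sequences):
        # Too many positions: the position-embedding lookup would overflow.
        if len(ids) > max_positions:
            problems.append(f"sequence {i}: length {len(ids)} > {max_positions} positions")
        # Ids outside the vocabulary: the token-embedding lookup would overflow.
        bad = [t for t in ids if t < 0 or t >= vocab_size]
        if bad:
            problems.append(f"sequence {i}: ids {bad} outside [0, {vocab_size})")
    return problems
```

An empty result does not rule out an indexing bug elsewhere (for example in label or span indices), so the CPU run remains the more general diagnostic.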

francesca418 commented 4 years ago

Resolved!!

nitishgupta commented 4 years ago

I haven't tried it myself, but it seems like @francesca418 has.

rahular commented 3 years ago

@francesca418 I am facing the same problem. How did you resolve it?