ma7555 closed this issue 4 years ago.
Hey, I haven't saved a copy of the pre-trained model, but you should be able to train one on Colab with this code in under an hour for most of the variants you might want, just by changing the appropriate global variables.
Some parts were copy-pasted because the code was originally written before components such as XLNetForTokenClassification and RobertaForTokenClassification were available in the Hugging Face repo, so I had to copy certain class definitions that couldn't be imported. That is why the code is fairly long. I have divided it into blocks, though, which should make it easy to follow along.
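With a current release of Hugging Face `transformers`, the copied class definitions should no longer be needed: `XLNetForTokenClassification` and `RobertaForTokenClassification` ship with the library, and `AutoModelForTokenClassification` selects the right class from the checkpoint. The sketch below is illustrative only. `MODEL_VARIANT`, `NUM_LABELS`, `CHECKPOINTS`, `checkpoint_for` and `load_model` are hypothetical names, not the notebook's actual globals, and the two-label scheme is an assumption.

```python
# Hypothetical global settings; the notebook's real variable names may differ.
MODEL_VARIANT = "roberta"  # one of: "bert", "xlnet", "roberta"
NUM_LABELS = 2             # assumed label scheme, e.g. cue vs. not-cue

# Public Hugging Face Hub checkpoints for each variant.
CHECKPOINTS = {
    "bert": "bert-base-uncased",
    "xlnet": "xlnet-base-cased",
    "roberta": "roberta-base",
}


def checkpoint_for(variant: str) -> str:
    """Return the Hub checkpoint for a variant, failing loudly on typos."""
    try:
        return CHECKPOINTS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}"
        ) from None


def load_model(variant: str = MODEL_VARIANT, num_labels: int = NUM_LABELS):
    """Load a token-classification model from the installed transformers library.

    AutoModelForTokenClassification dispatches to RobertaForTokenClassification,
    XLNetForTokenClassification, etc., based on the checkpoint's config, so no
    class definitions need to be copied. Weights are downloaded on first use.
    """
    # Imported lazily so the configuration helpers work without transformers.
    from transformers import AutoModelForTokenClassification

    return AutoModelForTokenClassification.from_pretrained(
        checkpoint_for(variant), num_labels=num_labels
    )
```

Switching variants then means changing only `MODEL_VARIANT`; the lookup raises a `ValueError` instead of silently loading the wrong model if the name is mistyped.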
Hi adityak, thanks for getting back to me.
We want to train on the BioScope corpus, but the XML format it is distributed in is hard for me to work with. Have you managed to parse it?
The parsing code is in the 'bioscope' method of the 'Data' class. Pass it the path to the BioScope corpus file and it returns the processed output as a data object.
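For readers who want to see what such parsing involves, here is a minimal standalone sketch using Python's `xml.etree.ElementTree`. It is not the notebook's 'bioscope' method. It assumes BioScope wraps each sentence in a `sentence` element, each scope in an `xcope` element, and each cue in a `cue` element whose `type` attribute is `negation` or `speculation`. The `SAMPLE` string is made up to match that assumed layout, and `fragments`, `word_labels` and `parse_bioscope` are hypothetical helpers. Each sentence is flattened into whitespace-separated words labeled with the type of the cue they fall inside, or `O`.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the layout assumed for BioScope: <sentence>, <xcope>, <cue type=...>.
SAMPLE = (
    "<Annotation><DocumentSet><Document><DocumentPart>"
    '<sentence id="S1.1">These results <xcope id="X1.1.1">'
    '<cue type="speculation" ref="X1.1.1">suggest</cue> that the gene is '
    '<xcope id="X1.1.2"><cue type="negation" ref="X1.1.2">not</cue> expressed'
    "</xcope></xcope>.</sentence>"
    "</DocumentPart></Document></DocumentSet></Annotation>"
)


def fragments(elem, cue_type=None):
    """Yield (text, cue_type) pieces of elem in document order."""
    if elem.tag == "cue":
        cue_type = elem.get("type")
    if elem.text:
        yield elem.text, cue_type
    for child in elem:
        yield from fragments(child, cue_type)
        # A child's tail lies outside the child, so it keeps this element's label.
        if child.tail:
            yield child.tail, cue_type


def word_labels(sentence):
    """Split a <sentence> element into words and per-word cue labels ("O" = no cue)."""
    words, labels = [], []
    for text, cue in fragments(sentence):
        for word in text.split():
            words.append(word)
            labels.append(cue or "O")
    return words, labels


def parse_bioscope(root):
    """Return {sentence_id: (words, labels)} for every <sentence> under root."""
    return {s.get("id"): word_labels(s) for s in root.iter("sentence")}


if __name__ == "__main__":
    # For a real corpus file, use: root = ET.parse(path).getroot()
    for sid, (words, labels) in parse_bioscope(ET.fromstring(SAMPLE)).items():
        print(sid, list(zip(words, labels)))
```

Because words are split at element boundaries, punctuation right after a closing tag (the final "." here) becomes its own token. Scope labels could be produced the same way by tracking the enclosing `xcope` ids instead of the cue type.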
Okay, amazing. I'll take a look.
Hello there,
Thanks so much for taking the time to write and share this code. I wanted to ask whether you have already trained this model and would be willing to share it.
Also, the notebook is very helpful, but it is hard to follow because of its length, which includes parts that could have been imported from libraries instead of copied and pasted.
Looking forward to your collaboration.