Closed matanox closed 5 years ago
To train a model, take a configuration file (like the one we have for our NER model), modify the paths in it to point to your data, and run `allennlp train CONFIG_FILE -s PLACE_TO_SAVE_RESULTS`.
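As a sketch, the fields to change are the top-level data paths in the config (the file names below are placeholders, not real paths):

```jsonnet
{
  "dataset_reader": { "type": "conll2003" },
  "train_data_path": "/path/to/your/train.txt",
  "validation_data_path": "/path/to/your/dev.txt"
}
```

and then run something like `allennlp train my_ner_config.jsonnet -s ./ner_output`.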
@matt-gardner Would you please share a sample training file? I want to see the acceptable format and required fields.
Also, in the config file it says:
"tokens": { "type": "embedding", "embedding_dim": 50, "pretrained_file": "https://s3-us-west-2.amazonaws.com/allennlp/datasets/glove/glove.6B.50d.txt.gz", "trainable": true },
So can I remove these two lines, since I don't want to use a pretrained model? `"pretrained_file": "https://s3-us-west-2.amazonaws.com/allennlp/datasets/glove/glove.6B.50d.txt.gz", "trainable": true`
Furthermore, I want my model to be trained on ELMo embeddings, but I don't want to use any pre-trained model.
Example file: https://github.com/allenai/allennlp/blob/master/allennlp/tests/fixtures/data/conll2003.txt. You can see from the training config I pointed to that the dataset reader is a conll2003 reader: https://github.com/allenai/allennlp/blob/9dec020281ee9521e7f1ffd696bcbb102c399703/training_config/ner.jsonnet#L4-L7. If you look at our documentation (or the source code), you can see what the expected file format is, and if you look at our tests, you can see the fixture that is used for the test.
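For reference, lines in that fixture follow the standard CoNLL-2003 shape: space-separated columns of token, POS tag, chunk tag, and NER tag, with blank lines between sentences. Roughly:

```
U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
```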
If you don't want pre-trained embeddings, you can remove those two lines, yes. For ELMo, look at our tutorial: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md.
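For what it's worth, removing those two lines leaves a randomly initialized embedding that is trained from scratch. A sketch of what remains (field names as in the NER config linked above):

```jsonnet
"tokens": {
  "type": "embedding",
  "embedding_dim": 50
}
```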
If you take a look at the RASA implementation, https://rasa.com/docs/nlu/evaluation/, they do not intend to use the BIOUL annotation scheme, and the reason is explained at that link. Can we enable such behavior in the AllenNLP implementation by making some changes in the config file?
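(For context, the knob in question is the reader's `coding_scheme` field shown in the config above; a sketch of switching it, assuming "IOB1" was the reader's accepted alternative value at the time:)

```jsonnet
"dataset_reader": {
  "type": "conll2003",
  "tag_label": "ner",
  "coding_scheme": "IOB1"
}
```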
If your input data doesn't match the dataset readers that we have implemented, you could pretty easily write your own that matches the input format you have.
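A custom reader mostly boils down to parsing your file into tokens and tags. As a minimal, framework-free sketch (plain Python, not AllenNLP's actual DatasetReader API; adapt the column layout to your own format):

```python
# Minimal sketch of the parsing a custom dataset reader would do.
# Assumptions: whitespace-separated columns, token in the first column,
# NER tag in the last column, blank lines between sentences.

def read_conll(lines):
    """Yield (tokens, ner_tags) pairs from CoNLL-style lines."""
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Sentence boundary: flush whatever we have accumulated.
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        fields = line.split()
        tokens.append(fields[0])
        tags.append(fields[-1])
    if tokens:
        yield tokens, tags


sample = """U.N. NNP I-NP I-ORG
official NN I-NP O

Peter NNP I-NP I-PER
"""
sentences = list(read_conll(sample.splitlines()))
print(sentences[0])  # (['U.N.', 'official'], ['I-ORG', 'O'])
```

In a real AllenNLP reader you would wrap each (tokens, tags) pair into an `Instance`, but the file-parsing logic is the part that changes with your input format.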
Hi, the tutorial https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md is not working. Could you give us the new link, please?
Hi, the tutorial https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md is not working.
It is in the docs now: https://github.com/allenai/allennlp/blob/master/docs/tutorials/how_to/elmo.md
I'm new to AllenNLP, and I was considering running its NER training on new datasets. I'm a little reluctant to do so, however, as I haven't found how to use the API for that. Can you point me at sample code? Have you gone through this procedure recently, and do you have any comments on its stability and ballpark resource consumption/duration?