Open muyuhuatang opened 3 years ago
May I ask which allennlp version this project uses? I tried 2.2.0 and 0.9.0, but both lead to errors.
I tried using the pinned version (specified in environment.yml), and that also failed with the error shared above. Please provide a working environment.yml.
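When comparing against environment.yml, it can help to first confirm which versions are actually installed in the active environment. A minimal sketch using only the standard library; the list of package names is an assumption about what matters for this repo, not something taken from its environment.yml:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant packages; adjust to match environment.yml.
for name, version in installed_versions(["allennlp", "transformers", "torch"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the traceback makes it easier for others to tell whether the failure comes from a version mismatch.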
I think there might be an issue with the datasets that are publicly available?
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
@gmarcial44 are you using the latest-allennlp branch? If so, I was able to get around this issue by replacing the environments/datasets.py file with the following:
NER_DATASETS = {
    "ncbi": {
        "data_dir": "/home/suching/scibert/data/ner/NCBI-disease/",
    },
    "sciie": {
        "data_dir": "/home/suching/scibert/data/ner/sciie/"
    },
    "jnlpba": {
        "data_dir": "/home/suching/scibert/data/ner/JNLPBA/"
    },
    "bc5cdr": {
        "data_dir": "/home/suching/scibert/data/ner/bc5cdr/"
    }
}

CLASSIFICATION_DATASETS = {
    "chemprot": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/chemprot/",
        "dataset_size": 4169
    },
    "rct-20k": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/rct-20k/",
        "dataset_size": 180040
    },
    "rct-sample": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/rct-sample/",
        "dataset_size": 500
    },
    "citation_intent": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/citation_intent/",
        "dataset_size": 1688
    },
    "sciie": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/sciie/",
        "dataset_size": 3219
    },
    "ag": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/ag/",
        "dataset_size": 115000
    },
    "hyperpartisan_news": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/hyperpartisan_news/",
        "dataset_size": 500
    },
    "imdb": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/imdb/",
        "dataset_size": 20000
    },
    "amazon": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/amazon/",
        "dataset_size": 115251
    }
}

DATASETS = {"NER": NER_DATASETS, "CLASSIFICATION": CLASSIFICATION_DATASETS}
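To check whether the public URLs in a mapping like this are reachable before starting a training run, something along these lines may help. It is a rough sketch, not part of the repo: the `<split>.jsonl` file names, the helper names, and the one-entry `EXAMPLE_DATASETS` mapping are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical one-entry mapping in the same shape as CLASSIFICATION_DATASETS.
EXAMPLE_DATASETS = {
    "chemprot": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/chemprot/",
    },
}


def split_url(data_dir, split):
    """Build the URL of one split; the '<split>.jsonl' file name is an assumption."""
    return data_dir.rstrip("/") + "/" + split + ".jsonl"


def head_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status of a HEAD request, including error statuses such as 403."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def report(datasets, opener=urllib.request.urlopen):
    """Map every assumed split URL of every dataset to its HTTP status."""
    statuses = {}
    for entry in datasets.values():
        for split in ("train", "dev", "test"):
            url = split_url(entry["data_dir"], split)
            statuses[url] = head_status(url, opener)
    return statuses
```

Calling `report(EXAMPLE_DATASETS)` with network access returns one status per URL; a 403 there would point at the hosted data rather than at the local allennlp setup.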
Could you please check the implementation steps you provided in the README file?
I followed your instructions but found it very hard to reproduce this work. Errors come up, such as version inconsistencies between allennlp and transformers, which then lead to an error like:
subprocess.CalledProcessError: Command 'allennlp train training_config/classifier.jsonnet --include-package dont_stop_pretraining -s model_logs\citation_intent_base' returned non-zero exit status 1.
Or did I get some steps wrong in my implementation? It is really confusing and frustrating.
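A `subprocess.CalledProcessError` only reports the exit status, so the underlying allennlp traceback is usually the more useful thing to share. A minimal sketch for capturing it, assuming the command is run directly rather than through the repo's own scripts; `run_and_capture` and `last_lines` are hypothetical helper names:

```python
import subprocess


def run_and_capture(command):
    """Run a command and return (exit code, stdout, stderr) without raising on failure."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


def last_lines(text, count=20):
    """Return the final `count` lines of text, where a traceback normally ends."""
    return "\n".join(text.splitlines()[-count:])


# The failing command from the error message above, split into arguments.
# It is defined here but not executed by this sketch.
TRAIN_COMMAND = [
    "allennlp", "train", "training_config/classifier.jsonnet",
    "--include-package", "dont_stop_pretraining",
    "-s", "model_logs/citation_intent_base",
]
```

Running `code, out, err = run_and_capture(TRAIN_COMMAND)` followed by `print(last_lines(err))` should show the final lines of the real error, which typically name the failing import or missing file.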