Closed philz0918 closed 2 years ago
@philz0918 There are a couple of things that can cause the list of instances to be empty. First, you need to specify the path to the folder containing your conll file, and, given how our ontonotes reader reads it, the file name needs to end with gold_conll. Next, please make sure that your file contains the fields given here in the correct format. Let us know if this still results in the error.
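To illustrate the two conditions above (a folder path, and a file name ending in gold_conll), here is a minimal stdlib-only sketch that lists the files such a directory walk would pick up. The helper name `find_gold_conll_files` is hypothetical and is not part of AllenNLP; it only mirrors the behaviour described in this comment.

```python
import os
from typing import List


def find_gold_conll_files(folder: str) -> List[str]:
    """Return every file under `folder` whose name ends with 'gold_conll'.

    Hypothetical helper for illustration only; it is not AllenNLP code.
    """
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            # Files with any other suffix (e.g. plain ".conll") are skipped.
            if name.endswith("gold_conll"):
                matches.append(os.path.join(root, name))
    return matches
```

If calling this on your data folder returns an empty list, a reader that filters on the same suffix would likely produce no instances either.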
@AkshitaB Hi there, thank you so much for your quick response. We have been testing what you suggested, including changing the file extension to "gold_conll", but we are still receiving the same error. We created a conll file with example data from the OntoNotes data.
We tried running the Ontonotes reader:
from allennlp_models.common.ontonotes import Ontonotes
ontonotes_reader = Ontonotes()
conll_gen = ontonotes_reader.dataset_document_iterator(file_path = file_path)
When we printed the resulting generator, the list was still empty.
print(list(conll_gen))
We're not sure what we're still missing. Please let me know if there are other things we can try.
@AkshitaB Just wondering if there were any updates regarding this issue?
@AkshitaB, we were able to train a model using allennlp 2.1.0 with the following data:
- 0 0 un DET (* - - - - - - (ARG0* -
- 0 1 journaliste NOUN * - - - - - - * -
- 0 2 comme ADP * - - - - - - * -
- 0 3 Moix ADJ * - - - - - - *) -
- 0 4 peut AUX * - - - - - (V*) (ARGM-MOD*) -
- 0 5 il PRON * - - - - - - - -
- 0 6 insulter VERB * - - - - - - (V* -
- 0 7 Marine NOUN * - - - - - - *) -
- 0 8 Le DET * - - - - - - - -
- 0 9 Pen PROPN * - - - - - - * -
- 0 10 , PUNCT * - - - - - - * -
- 0 11 une DET * - - - - - - * -
- 0 12 candidate NOUN * - - - - - - *) -
- 0 13 à ADP * - - - - - - - -
- 0 14 la DET * - - - - - - - -
- 0 15 présidentielle NOUN * - - - - - - * -
- 0 16 comme ADP * - - - - - - * -
- 0 17 les DET * - - - - - - * -
- 0 18 autres ADJ * - - - - - - *) -
- 0 19 ? PUNCT *) - - - - - - - -
The minimum columns needed are the sentence ID (it just needs to be an integer, e.g. 0), token ID, word, POS, parse tree, and SRL frames.
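As a rough check of that column layout, the sketch below splits one row of the example above into named fields. The function name `parse_conll_row` and the field names are hypothetical, and the column positions are an assumption read off the 14-column rows in this comment (SRL frames sitting between column 11 and the final column), not a statement of what the reader requires.

```python
from typing import Dict


def parse_conll_row(line: str) -> Dict[str, object]:
    """Split one whitespace-separated row into named fields.

    Column positions are assumed from the example rows in this thread.
    """
    cols = line.split()
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(cols)}")
    return {
        "sentence_id": int(cols[1]),
        "token_id": int(cols[2]),
        "word": cols[3],
        "pos": cols[4],
        "parse": cols[5],
        # One SRL column per predicate; the last column is left out.
        "srl_frames": cols[11:-1],
    }
```

Running it on the row for "peut" should give two SRL entries, one per predicate column in that sentence.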
When we tried the same training with allennlp 2.9.3, we first got a cached-path error (solved by updating cached-path to 1.1.2):
AttributeError: module 'cached_path' has no attribute 'file_friendly_logging'
After we updated cached-path, we received the following error:
Traceback (most recent call last):
File "/usr/local/bin/allennlp", line 8, in <module>
sys.exit(run())
File "/usr/local/lib/python3.7/dist-packages/allennlp/__main__.py", line 39, in run
main(prog="allennlp")
File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/__init__.py", line 120, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/train.py", line 120, in train_model_from_args
file_friendly_logging=args.file_friendly_logging,
File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/train.py", line 186, in train_model_from_file
return_model=return_model,
File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/train.py", line 264, in train_model
file_friendly_logging=file_friendly_logging,
File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/train.py", line 508, in _train_worker
metrics = train_loop.run()
File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/train.py", line 581, in run
return self.trainer.train()
File "/usr/local/lib/python3.7/dist-packages/allennlp/training/gradient_descent_trainer.py", line 771, in train
metrics, epoch = self._try_train()
File "/usr/local/lib/python3.7/dist-packages/allennlp/training/gradient_descent_trainer.py", line 922, in _try_train
hardlink_or_copy(model_state_file, self._best_model_filename)
File "/usr/local/lib/python3.7/dist-packages/allennlp/common/file_utils.py", line 621, in hardlink_or_copy
os.link(source, dest)
OSError: [Errno 38] Function not implemented: "srl_output_folder/model_state_e1_b0.th" ->"srl_output_folder/best.th"
It seems this issue is related to the file utils rather than to the training itself, but it still prevented the training from finishing.
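Errno 38 means the filesystem holding the output folder does not implement hard links, which can happen on some mounted or network filesystems. One possible workaround is a link-then-copy fallback, sketched below. `link_or_copy` is a hypothetical helper, not AllenNLP's `hardlink_or_copy`, and how to wire it into the trainer is left open here.

```python
import os
import shutil


def link_or_copy(source: str, dest: str) -> None:
    """Hard-link `source` to `dest`, copying instead if linking fails.

    Hypothetical helper for illustration; not part of AllenNLP.
    """
    if os.path.exists(dest):
        os.remove(dest)
    try:
        os.link(source, dest)
    except OSError:
        # The filesystem refused the hard link (e.g. Errno 38),
        # so fall back to a plain copy of the file.
        shutil.copy(source, dest)
```

A simpler alternative, under the same assumption about the cause, may be to point the serialization directory at a local disk path that supports hard links.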
AttributeError                            Traceback (most recent call last)
Input In [33], in <cell line: 1>()
----> 1 main()

Input In [32], in main()
     63 def main():
---> 64     get_predictor()
     65     run_document(party='dem')
     66     run_document(party='rep')

Input In [32], in get_predictor()
      1 def get_predictor():
----> 2     p = Predictor.from_path("https://s3-us-west-2.amazonaws.com/allennlp/models/srl-model-2018.05.25.tar.gz")
      3     pickle.dump(p, open('predictor.p', 'wb'))

File ~\anaconda3\lib\site-packages\allennlp\predictors\predictor.py:366, in Predictor.from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
    363 if import_plugins:
    364     plugins.import_plugins()
    365 return Predictor.from_archive(
--> 366     load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
    367     predictor_name,
    368     dataset_reader_to_load=dataset_reader_to_load,
    369     frozen=frozen,
    370     extra_args=kwargs,
    371 )

File ~\anaconda3\lib\site-packages\allennlp\models\archival.py:206, in load_archive(archive_file, cuda_device, overrides, weights_file)
    190 """
    191 Instantiates an Archive from an archived tar.gz file.
    192
   (...)
    203     The weights file to use. If unspecified, weights.th in the archive_file will be used.
    204 """
    205 # redirect to the cache, if necessary
--> 206 resolved_archive_file = cached_path(archive_file)
    208 if resolved_archive_file == archive_file:
    209     logger.info(f"loading archive file {archive_file}")

File ~\anaconda3\lib\site-packages\allennlp\common\file_utils.py:135, in cached_path(url_or_filename, cache_dir, extract_archive, force_extract)
     84 def cached_path(
     85     url_or_filename: Union[str, PathLike],
     86     cache_dir: Union[str, Path] = None,
     87     extract_archive: bool = False,
     88     force_extract: bool = False,
     89 ) -> str:
     90     """
     91     Given something that might be a URL or local path, determine which.
     92     If it's a remote resource, download the file and cache it, and
   (...)
    133     from multiple processes on the same file.
    134     """
--> 135     _cached_path.file_friendly_logging(common_logging.FILE_FRIENDLY_LOGGING)
    136     return str(
    137         _cached_path.cached_path(
    138             url_or_filename,
   (...)
    142         )
    143     )
AttributeError: module 'cached_path' has no attribute 'file_friendly_logging'
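Since an earlier comment reports that updating cached-path resolved this same AttributeError, a quick way to see whether the installed module exposes the attribute is sketched below. `module_has_attribute` is a hypothetical helper written for this thread; it only uses `importlib` and `hasattr` and makes no claim about which cached-path versions include the function.

```python
import importlib
from typing import Optional


def module_has_attribute(module_name: str, attribute: str) -> Optional[bool]:
    """Return True/False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)
```

Calling `module_has_attribute("cached_path", "file_friendly_logging")` in the failing environment should return False if the installed version is the cause of the error above.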
@AkshitaB this is just a friendly ping to make sure you haven't forgotten about this issue 😜
This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇
Description
We're attempting to train an SRL model using the configuration file below. Below the configuration file is an example of the conll-formatted data that we're using. According to this [stackoverflow question](https://stackoverflow.com/questions/69090025/how-to-train-allennlp-srl-on-non-english-languages), the only columns needed are the words and SRL tags columns. Can you please confirm that this is the case? If so, I'm not sure why we're receiving this error. Please advise.

```
local bert_model = "bert-base-uncased";

{
    "dataset_reader": {
        "type": "srl",
        "bert_model_name": bert_model,
    },
    "data_loader": {
        "batch_sampler": {
            "type": "bucket",
            "batch_size" : 32
        }
    },
    "train_data_path": "path/conll_data/ALLEN_FRENCH_TEST_2_train.conll",
    "validation_data_path":"path/conll_data/ALLEN_FRENCH_TEST_2_val.conll",
    "model": {
        "type": "srl_bert",
        "embedding_dropout": 0.1,
        "bert_model": bert_model,
    },
    "trainer": {
        "optimizer": {
            "type": "huggingface_adamw",
            "lr": 5e-5,
            "correct_bias": false,
            "weight_decay": 0.01,
            "parameter_groups": [
                [["bias", "LayerNorm.bias", "LayerNorm.weight", "layer_norm.weight"], {"weight_decay": 0.0}],
            ],
        },
        "learning_rate_scheduler": {
            "type": "slanted_triangular",
        },
        "checkpointer": {
            "keep_most_recent_by_count": 2,
        },
        "grad_norm": 1.0,
        "num_epochs": 15,
        "validation_metric": "+f1-measure-overall",
    },
}
```

```
_ _ 0 @Greguyyyy ADP _ _ _ _ _ _ _ _ _
_ _ 1 @HalbeardD PROPN _ _ _ _ _ _ _ _ _
_ _ 2 @Tlibdij PROPN _ _ _ _ _ _ _ _ _
_ _ 3 @JLMelenchon NUM _ _ _ _ _ _ _ _ _
_ _ 4 @BurgerKingFR PROPN _ _ _ _ _ _ _ _ _
_ _ 5 Honnêtement ADV _ _ _ _ _ _ _ (ARGM-ADV*) _
_ _ 6 le DET _ _ _ _ _ _ _ _ _
_ _ 7 capitalisme NOUN _ _ _ _ _ _ _ (ARG1*) _
_ _ 8 a AUX _ _ _ _ _ _ (V*) _ _
_ _ 9 été AUX _ _ _ _ _ _ _ (ARG2*) _
_ _ 10 génial ADJ _ _ _ _ _ _ _ _ _
_ _ 11 sur ADP _ _ _ _ _ _ _ (ARGM-TMP* _
_ _ 12 cette DET _ _ _ _ _ _ _ * _
_ _ 13 période NOUN _ _ _ _ _ _ _ *) _
_ _ 14 … PROPN _ _ _ _ _ _ _ _ _
```

Python traceback:
Related issues or possible duplicates
Environment
OS: Google Colab (Linux)
Python version: Python 3.7.13
Output of
pip freeze
:``` absl-py==1.0.0 aiohttp==3.8.1 aiosignal==1.2.0 alabaster==0.7.12 albumentations==0.1.12 allennlp==2.9.3 allennlp-models==2.9.3 altair==4.2.0 appdirs==1.4.4 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arviz==0.12.1 astor==0.8.1 astropy==4.3.1 astunparse==1.6.3 async-timeout==4.0.2 asynctest==0.13.0 atari-py==0.2.9 atomicwrites==1.4.0 attrs==21.4.0 audioread==2.1.9 autograd==1.4 Babel==2.10.1 backcall==0.2.0 backports.csv==1.0.7 base58==2.1.1 beautifulsoup4==4.6.3 bert-score==0.3.11 bleach==5.0.0 blis==0.4.1 bokeh==2.3.3 boto3==1.24.5 botocore==1.27.5 Bottleneck==1.3.4 branca==0.5.0 bs4==0.0.1 CacheControl==0.12.11 cached-path==1.1.2 cached-property==1.5.2 cachetools==4.2.4 catalogue==1.0.0 certifi==2022.5.18.1 cffi==1.15.0 cftime==1.6.0 chardet==3.0.4 charset-normalizer==2.0.12 checklist==0.0.11 cheroot==8.6.0 CherryPy==18.6.1 click==7.1.2 cloudpickle==1.3.0 cmake==3.22.4 cmdstanpy==0.9.5 colorcet==3.0.0 colorlover==0.3.0 community==1.0.0b1 conllu==4.4.1 contextlib2==0.5.5 convertdate==2.4.0 coverage==3.7.1 coveralls==0.5 crcmod==1.7 cryptography==37.0.2 cufflinks==0.17.3 cvxopt==1.2.7 cvxpy==1.0.31 cycler==0.11.0 cymem==2.0.6 Cython==0.29.30 daft==0.0.4 dask==2.12.0 datascience==0.10.6 datasets==2.2.2 debugpy==1.0.0 decorator==4.4.2 defusedxml==0.7.1 descartes==1.1.0 dill==0.3.4 distributed==1.25.3 dlib==19.18.0+zzzcolab20220513001918 dm-tree==0.1.7 docker-pycreds==0.4.0 docopt==0.6.2 docutils==0.17.1 dopamine-rl==1.0.5 earthengine-api==0.1.311 easydict==1.9 ecos==2.0.10 editdistance==0.5.3 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz entrypoints==0.4 ephem==4.1.3 et-xmlfile==1.1.0 fa2==0.3.5 fairscale==0.4.6 fastai==1.0.61 fastdtw==0.3.4 fastjsonschema==2.15.3 fastprogress==1.0.2 fastrlock==0.8 fbprophet==0.7.1 feather-format==0.4.1 feedparser==6.0.10 filelock==3.4.2 firebase-admin==4.4.0 fix-yahoo-finance==0.0.22 Flask==1.1.4 flatbuffers==2.0 folium==0.8.3 fr-core-news-sm @ 
https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.2.5/fr_core_news_sm-2.2.5.tar.gz frozenlist==1.3.0 fsspec==2022.5.0 ftfy==6.1.1 future==0.16.0 gast==0.5.3 GDAL==2.2.2 gdown==4.4.0 gensim==3.6.0 geographiclib==1.52 geopy==1.17.0 gin-config==0.5.0 gitdb==4.0.9 GitPython==3.1.27 glob2==0.7 google==2.0.3 google-api-core==1.31.6 google-api-python-client==1.12.11 google-auth==1.35.0 google-auth-httplib2==0.0.4 google-auth-oauthlib==0.4.6 google-cloud-bigquery==1.21.0 google-cloud-bigquery-storage==1.1.1 google-cloud-core==2.3.1 google-cloud-datastore==1.8.0 google-cloud-firestore==1.7.0 google-cloud-language==1.2.0 google-cloud-storage==2.4.0 google-cloud-translate==1.5.0 google-colab @ file:///colabtools/dist/google-colab-1.0.0.tar.gz google-crc32c==1.3.0 google-pasta==0.2.0 google-resumable-media==2.3.3 googleapis-common-protos==1.56.2 googledrivedownloader==0.4 graphviz==0.10.1 greenlet==1.1.2 grpcio==1.46.3 gspread==3.4.2 gspread-dataframe==3.0.8 gym==0.17.3 h5py==3.1.0 HeapDict==1.0.1 hijri-converter==2.2.4 holidays==0.10.5.2 holoviews==1.14.9 html5lib==1.0.1 httpimport==0.5.18 httplib2==0.17.4 httplib2shim==0.0.3 huggingface-hub==0.5.1 humanize==0.5.1 hyperopt==0.1.2 ideep4py==2.0.0.post3 idna==2.10 imageio==2.4.1 imagesize==1.3.0 imbalanced-learn==0.8.1 imblearn==0.0 imgaug==0.2.9 importlib-metadata==4.11.4 importlib-resources==5.7.1 imutils==0.5.4 inflect==2.1.0 iniconfig==1.1.1 intel-openmp==2022.1.0 intervaltree==2.1.0 ipykernel==4.10.1 ipython==5.5.0 ipython-genutils==0.2.0 ipython-sql==0.3.9 ipywidgets==7.7.0 iso-639==0.4.5 itsdangerous==1.1.0 jaraco.classes==3.2.1 jaraco.collections==3.5.1 jaraco.context==4.1.1 jaraco.functools==3.5.0 jaraco.text==3.8.0 jax==0.3.8 jaxlib @ https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.7+cuda11.cudnn805-cp37-none-manylinux2014_x86_64.whl jedi==0.18.1 jieba==0.42.1 Jinja2==2.11.3 jmespath==1.0.0 joblib==1.1.0 jpeg4py==0.1.4 jsonnet==0.18.0 jsonpickle==2.2.0 jsonschema==4.3.3 
jupyter==1.0.0 jupyter-client==5.3.5 jupyter-console==5.2.0 jupyter-core==4.10.0 jupyterlab-pygments==0.2.2 jupyterlab-widgets==1.1.0 kaggle==1.5.12 kapre==0.3.7 keras==2.8.0 Keras-Preprocessing==1.1.2 keras-vis==0.4.1 kiwisolver==1.4.2 korean-lunar-calendar==0.2.1 libclang==14.0.1 librosa==0.8.1 lightgbm==2.2.3 llvmlite==0.34.0 lmdb==0.99 LunarCalendar==0.0.9 lxml==4.2.6 Markdown==3.3.7 MarkupSafe==2.0.1 matplotlib==3.2.2 matplotlib-inline==0.1.3 matplotlib-venn==0.11.7 missingno==0.5.1 mistune==0.8.4 mizani==0.6.0 mkl==2019.0 mlxtend==0.14.0 more-itertools==8.13.0 moviepy==0.2.3.5 mpmath==1.2.1 msgpack==1.0.3 multidict==6.0.2 multiprocess==0.70.12.2 multitasking==0.0.10 munch==2.5.0 murmurhash==1.0.7 music21==5.5.0 natsort==5.5.0 nbclient==0.6.4 nbconvert==5.6.1 nbformat==5.4.0 nest-asyncio==1.5.5 netCDF4==1.5.8 networkx==2.6.3 nibabel==3.0.2 nltk==3.7 notebook==5.3.1 numba==0.51.2 numexpr==2.8.1 numpy==1.21.6 nvidia-ml-py3==7.352.0 oauth2client==4.1.3 oauthlib==3.2.0 okgrade==0.4.3 opencv-contrib-python==4.1.2.30 opencv-python==4.1.2.30 openpyxl==3.0.10 opt-einsum==3.3.0 osqp==0.6.2.post0 overrides==3.1.0 packaging==21.3 palettable==3.3.0 pandas==1.3.5 pandas-datareader==0.9.0 pandas-gbq==0.13.3 pandas-profiling==1.4.1 pandocfilters==1.5.0 panel==0.12.1 param==1.12.1 parso==0.8.3 pathlib==1.0.1 pathtools==0.1.2 patsy==0.5.2 patternfork-nosql==3.6 pdfminer.six==20220524 pep517==0.12.0 pexpect==4.8.0 pickleshare==0.7.5 Pillow==7.1.2 pip-tools==6.2.0 plac==1.1.3 plotly==5.5.0 plotnine==0.6.0 pluggy==0.7.1 pooch==1.6.0 portend==3.1.0 portpicker==1.3.9 prefetch-generator==1.0.1 preshed==3.0.6 prettytable==3.3.0 progressbar2==3.38.0 prometheus-client==0.14.1 promise==2.3 prompt-toolkit==1.0.18 protobuf==3.17.3 psutil==5.4.8 psycopg2==2.7.6.1 ptyprocess==0.7.0 py==1.11.0 py-rouge==1.1 pyarrow==6.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycocotools==2.0.4 pyconll==3.1.0 pycparser==2.21 pyct==0.4.8 pydata-google-auth==1.4.0 pydot==1.3.0 pydot-ng==2.0.0 pydotplus==2.0.2 
PyDrive==1.3.1 pyemd==0.5.1 pyerfa==2.0.0.1 pyglet==1.5.0 Pygments==2.6.1 pygobject==3.26.1 pymc3==3.11.4 PyMeeus==0.5.11 pymongo==4.1.1 pymystem3==0.2.0 PyOpenGL==3.1.6 pyparsing==3.0.9 pyrsistent==0.18.1 pysndfile==1.3.8 PySocks==1.7.1 pystan==2.19.1.1 pytest==3.6.4 python-apt==0.0.0 python-chess==0.23.11 python-dateutil==2.8.2 python-docx==0.8.11 python-louvain==0.16 python-slugify==6.1.2 python-utils==3.2.3 pytz==2022.1 pyviz-comms==2.2.0 PyWavelets==1.3.0 PyYAML==3.13 pyzmq==23.0.0 qdldl==0.1.5.post2 qtconsole==5.3.0 QtPy==2.1.0 regex==2022.6.2 requests==2.23.0 requests-oauthlib==1.3.1 resampy==0.2.2 responses==0.18.0 rpy2==3.4.5 rsa==4.8 s3transfer==0.6.0 sacremoses==0.0.53 scikit-image==0.18.3 scikit-learn==1.0.2 scipy==1.4.1 screen-resolution-extra==0.0.0 scs==3.2.0 seaborn==0.11.2 semver==2.13.0 Send2Trash==1.8.0 sentencepiece==0.1.96 sentry-sdk==1.5.12 setproctitle==1.2.3 setuptools-git==1.2 sgmllib3k==1.0.0 Shapely==1.8.2 shortuuid==1.0.9 simplegeneric==0.8.1 six==1.15.0 sklearn==0.0 sklearn-pandas==1.8.0 smart-open==6.0.0 smmap==5.0.0 snowballstemmer==2.2.0 sortedcontainers==2.4.0 SoundFile==0.10.3.post1 soupsieve==2.3.2.post1 spacy==2.2.4 Sphinx==1.8.6 sphinxcontrib-serializinghtml==1.1.5 sphinxcontrib-websupport==1.2.4 SQLAlchemy==1.4.36 sqlparse==0.4.2 srsly==1.0.5 statsmodels==0.10.2 sympy==1.7.1 tables==3.7.0 tabulate==0.8.9 tblib==1.7.0 tempora==5.0.1 tenacity==8.0.1 tensorboard==2.8.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.1 tensorboardX==2.5.1 tensorflow==2.8.2+zzzcolab20220527125636 tensorflow-datasets==4.0.1 tensorflow-estimator==2.8.0 tensorflow-gcs-config==2.8.0 tensorflow-hub==0.12.0 tensorflow-io-gcs-filesystem==0.26.0 tensorflow-metadata==1.8.0 tensorflow-probability==0.16.0 termcolor==1.1.0 terminado==0.13.3 testpath==0.6.0 text-unidecode==1.3 textblob==0.15.3 Theano-PyMC==1.1.2 thinc==7.4.0 threadpoolctl==3.1.0 tifffile==2021.11.2 tinycss2==1.1.1 tokenizers==0.10.3 tomli==2.0.1 toolz==0.11.2 torch==1.10.2 torchaudio 
@ https://download.pytorch.org/whl/cu113/torchaudio-0.11.0%2Bcu113-cp37-cp37m-linux_x86_64.whl torchsummary==1.5.1 torchtext==0.12.0 torchvision==0.11.3 tornado==5.1.1 tqdm==4.64.0 traitlets==5.1.1 transformers==4.3.3 tweepy==3.10.0 typeguard==2.7.1 typer==0.4.1 typing-extensions==4.2.0 tzlocal==1.5.1 uritemplate==3.0.1 urllib3==1.25.11 vega-datasets==0.9.0 wandb==0.12.18 wasabi==0.9.1 wcwidth==0.2.5 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.6.0 word2number==1.1 wordcloud==1.5.0 wrapt==1.14.1 xarray==0.20.2 xarray-einstats==0.2.2 xgboost==0.90 xkit==0.0.0 xlrd==1.1.0 xlwt==1.3.0 xxhash==3.0.0 yarl==1.7.2 yellowbrick==1.4 zc.lockfile==2.0 zict==2.2.0 zipp==3.8.0 ```
Steps to reproduce
Example source:
Populate a training data file and a validation file with the above conll example, then run the command below using the above configuration file.

```
allennlp train /config_path/srl_train_1.jsonnet -s /model_output
```