Closed RichJackson closed 3 years ago
The code seems to break because the model gets an accuracy of 0.0 at the end of the 1st epoch. I will push corrected code to avoid this. But I am concerned about the 0% accuracy in the first place. Are you using your own custom data?
Hi there. Thanks for helping me out!
I'm trying to reproduce the models as described in the original paper, so I am following the instructions in the README and therefore using the original data.
I'm running with the command:
python run.py --save models/warmup_oie_model --mode train_test --model_str bert-base-cased --task oie --epochs 30 --gpus 1 --batch_size 24 --optimizer adamW --lr 2e-05 --iterative_layers 2
Note: I had to make a few fixes to requirements.txt in order to get the code to run. My current environment looks like:
absl-py==0.9.0
aiohttp==3.7.4.post0
alabaster==0.7.12
allennlp===0.9.0-unreleased
astroid==1.6.6
async-timeout==3.0.1
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
bleach==3.3.0
blis==0.4.1
boto3==1.10.45
botocore==1.13.45
cached-property==1.5.2
cachetools==4.1.1
catalogue==1.0.0
certifi==2020.6.20
cffi==1.14.5
chardet==3.0.4
click==7.1.2
codecov==2.1.11
colorama==0.4.4
conllu==1.3.1
coverage==5.5
cryptography==3.4.7
cycler==0.10.0
cymem==2.0.3
decorator==4.4.2
docopt==0.6.2
docutils==0.15.2
editdistance==0.5.3
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz
filelock==3.0.12
flaky==3.6.1
Flask==1.1.1
Flask-Cors==3.0.8
ftfy==5.6
future==0.18.2
gevent==1.4.0
google-auth==1.18.0
google-auth-oauthlib==0.4.1
greenlet==1.1.0
grpcio==1.30.0
h5py==2.10.0
idna==2.10
imageio==2.8.0
imagesize==1.2.0
importlib-metadata==1.7.0
ipdb==0.13.9
ipython==7.16.1
ipython-genutils==0.2.0
isort==5.8.0
itsdangerous==2.0.1
jedi==0.17.1
jeepney==0.6.0
Jinja2==3.0.1
jmespath==0.10.0
joblib==0.15.1
jsonnet @ file:///home/conda/feedstock_root/build_artifacts/jsonnet_1606064680848/work
jsonpickle==1.2
keyring==23.0.1
kiwisolver==1.3.1
lazy-object-proxy==1.6.0
livereload==2.6.3
Markdown==3.2.2
MarkupSafe==2.0.1
matplotlib==3.1.2
matplotlib-inline==0.1.2
mccabe==0.6.1
more-itertools==8.8.0
multidict==5.1.0
murmurhash==1.0.2
mypy==0.521
nltk==3.5
numpy==1.19.0
numpydoc==0.9.2
oauthlib==3.1.0
overrides==3.1.0
packaging==20.4
pandas==1.0.5
parsimonious==0.8.1
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
pkginfo==1.7.0
plac==1.1.3
pluggy==0.13.1
preshed==3.0.2
prompt-toolkit==3.0.5
protobuf==3.12.2
ptyprocess==0.6.0
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
Pygments==2.6.1
pylint==1.9.4
pypandoc==1.5
pyparsing==2.4.7
pytest==5.3.2
pytest-cov==2.12.1
python-dateutil==2.8.1
pytorch-lightning==0.7.6
pytorch-pretrained-bert==0.6.2
pytorch-transformers @ file:///home/user/openie6/imojie/pytorch_transformers
pytz==2020.1
PyYAML==5.3.1
readme-renderer==29.0
regex==2020.6.8
requests==2.24.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
responses==0.10.9
rfc3986==1.5.0
rsa==4.6
s3transfer==0.2.1
sacremoses==0.0.43
scikit-learn==0.23.1
scipy==1.5.0
SecretStorage==3.3.1
sentencepiece==0.1.91
six==1.15.0
snowballstemmer==2.1.0
spacy==2.3.0
Sphinx==2.3.1
sphinx-autobuild==2021.3.14
sphinx-rtd-theme==0.5.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
sqlparse==0.3.0
srsly==1.0.2
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorboardX==1.9
thinc==7.4.1
threadpoolctl==2.1.0
tokenizers==0.5.2
toml==0.10.2
toolz==0.11.1
torch==1.6.0
torchtext==0.7.0
tornado==6.1
tqdm==4.47.0
traitlets==4.3.3
transformers==2.6.0
twine==3.4.1
typed-ast==1.0.4
typing-extensions==3.10.0.0
Unidecode==1.1.1
urllib3==1.25.9
wasabi==0.7.0
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wget==3.2
word2number==1.1
wrapt==1.12.1
yarl==1.6.3
zenodo-get==1.3.0
zipp==3.1.0
zope.event==4.5.0
zope.interface==5.4.0
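As an aside, drift between an environment like the one above and the repo's pinned requirements.txt is easy to miss by eye. A minimal, generic sketch for diffing two `name==version` lists (this is just an illustration, not part of the repo; file contents are assumptions):

```python
# Compare two requirements-style version lists and report mismatches.
# Illustrative only -- adjust inputs to the repo's actual requirements.txt.

def parse_reqs(lines):
    """Parse 'name==version' lines into a dict; skip comments and URL-style entries."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line or "@" in line:
            continue
        name, _, version = line.partition("==")
        pins[name.lower()] = version
    return pins

def diff_reqs(installed_lines, required_lines):
    """Return {name: (installed_version_or_None, required_version)} for mismatches."""
    installed = parse_reqs(installed_lines)
    required = parse_reqs(required_lines)
    return {
        name: (installed.get(name), version)
        for name, version in required.items()
        if installed.get(name) != version
    }

if __name__ == "__main__":
    installed = ["torch==1.6.0", "transformers==2.6.0"]
    required = ["torch==1.6.0", "transformers==2.7.0"]
    print(diff_reqs(installed, required))  # {'transformers': ('2.6.0', '2.7.0')}
```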
I can also confirm that training for longer still results in an eval_f1 of 0:
Validation sanity check: 100%|██████████| 2/2 [00:00<00:00, 2.82it/s]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 1: 100%|██████████| 3827/3827 [11:57<00:00, 5.33it/s, loss=1.060, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 2: 100%|█████████▉| 3826/3827 [12:40<00:00, 5.03it/s, loss=1.088, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 3: 100%|█████████▉| 3826/3827 [12:55<00:00, 4.94it/s, loss=0.817, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 4: 100%|█████████▉| 3826/3827 [13:02<00:00, 4.89it/s, loss=0.832, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 5: 100%|█████████▉| 3826/3827 [13:09<00:00, 4.84it/s, loss=0.641, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 6: 100%|█████████▉| 3826/3827 [13:14<00:00, 4.82it/s, loss=0.651, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 7: 100%|█████████▉| 3826/3827 [13:11<00:00, 4.83it/s, loss=0.536, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 8: 100%|█████████▉| 3826/3827 [13:18<00:00, 4.79it/s, loss=0.480, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 9: 100%|█████████▉| 3826/3827 [13:16<00:00, 4.80it/s, loss=0.393, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 10: 100%|█████████▉| 3826/3827 [13:13<00:00, 4.82it/s, loss=0.438, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 11: 100%|█████████▉| 3826/3827 [13:20<00:00, 4.78it/s, loss=0.388, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 12: 100%|█████████▉| 3826/3827 [13:16<00:00, 4.81it/s, loss=0.369, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 13: 100%|█████████▉| 3826/3827 [13:18<00:00, 4.79it/s, loss=0.295, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 14: 100%|█████████▉| 3826/3827 [13:20<00:00, 4.78it/s, loss=0.319, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 15: 100%|█████████▉| 3826/3827 [13:16<00:00, 4.80it/s, loss=0.256, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 16: 100%|█████████▉| 3826/3827 [13:17<00:00, 4.80it/s, loss=0.263, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 17: 100%|█████████▉| 3826/3827 [13:20<00:00, 4.78it/s, loss=0.241, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 18: 100%|█████████▉| 3826/3827 [13:21<00:00, 4.77it/s, loss=0.185, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 19: 100%|█████████▉| 3826/3827 [13:21<00:00, 4.77it/s, loss=0.184, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 20: 100%|█████████▉| 3826/3827 [13:22<00:00, 4.77it/s, loss=0.230, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 21: 100%|█████████▉| 3826/3827 [13:21<00:00, 4.77it/s, loss=0.225, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 22: 100%|█████████▉| 3826/3827 [13:19<00:00, 4.78it/s, loss=0.218, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 23: 100%|█████████▉| 3826/3827 [13:22<00:00, 4.77it/s, loss=0.147, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 24: 100%|█████████▉| 3826/3827 [13:23<00:00, 4.76it/s, loss=0.217, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 25: 100%|█████████▉| 3826/3827 [13:21<00:00, 4.77it/s, loss=0.162, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 26: 100%|█████████▉| 3826/3827 [13:21<00:00, 4.78it/s, loss=0.143, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 27: 100%|█████████▉| 3826/3827 [13:22<00:00, 4.77it/s, loss=0.141, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 28: 100%|█████████▉| 3826/3827 [13:22<00:00, 4.77it/s, loss=0.114, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 29: 100%|█████████▉| 3826/3827 [13:20<00:00, 4.78it/s, loss=0.133, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 30: 100%|█████████▉| 3826/3827 [13:21<00:00, 4.77it/s, loss=0.120, v_num=train.part]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 30: 100%|██████████| 3827/3827 [13:21<00:00, 4.77it/s, loss=0.120, v_num=train.part]
Testing: 100%|██████████| 27/27 [00:02<00:00, 12.09it/s]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
--------------------------------------------------------------------------------
TEST RESULTS
{'eval_auc': 0, 'eval_f1': 0, 'eval_lastf1': 0, 'test_acc': 0}
--------------------------------------------------------------------------------
Testing: 100%|██████████| 27/27 [00:02<00:00, 10.44it/s]
The loss seems to be decreasing as expected, so perhaps there's a problem with the evaluation code? Any help would be greatly appreciated!
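For what it's worth, a score of exactly 0 for every metric at every epoch, while the loss falls normally, often means the scorer is receiving no extractions (or none that match the gold format) rather than genuinely poor ones: set-based precision and recall both collapse to zero when the prediction set is empty. A toy illustration of that behaviour (this is not the repo's actual scorer, just a sketch of the general failure mode):

```python
def f1(predicted, gold):
    """Toy set-based F1 over (subject, relation, object) triples.

    Empty or fully mismatched predictions score exactly 0.0 -- the same
    signature as an eval pipeline that never sees the model's output.
    """
    if not predicted or not gold:
        return 0.0
    tp = len(set(predicted) & set(gold))
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = [("Obama", "born in", "Hawaii")]
print(f1([], gold))    # 0.0 -- empty predictions
print(f1(gold, gold))  # 1.0 -- perfect match
```

So one thing worth checking is whether the extractions written during evaluation actually reach (and parse in) the scoring step.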
I just ran a quick test with your pretrained model (i.e. just the evaluation part without training)
python run.py --save models/warmup_oie_model --mode test --model_str bert-base-cased --task oie --epochs 30 --gpus 1 --batch_size 24 --optimizer adamW --lr 2e-05 --iterative_layers 2
This also results in an eval_f1 of 0.0. However, running with --mode predict
seems to give the expected results, which suggests the evaluation code isn't working correctly?
Hello, I have re-run the steps from the README in a fresh environment (Installation / Download Resources / Testing warmup model) and I am able to replicate the scores perfectly in test mode. I checked the important libraries in your current environment and they seem to match. What was the issue you found with the original requirements.txt? What exactly did you have to change? That may give some insight into this.
Closing this now. Not quite sure what I was doing wrong, but I can confirm the code is performing as expected now.
Hi there. The warmup training seems to fail. Error below