Problems when training on wikiQA

jc-ryan commented 3 years ago

Sorry to bother you. When I train RE2 on WikiQA, the following error occurs:

Traceback (most recent call last):
  File "train.py", line 48, in <module>
    main()
  File "train.py", line 31, in main
    states = trainer.train()
  File "/home/ryan/code/RE2/src/trainer.py", line 58, in train
    score, dev_stats = model.evaluate(dev_batches)
  File "/home/ryan/code/RE2/src/model.py", line 116, in evaluate
    stats.update(metrics[metric](outputs))
  File "/home/ryan/code/RE2/src/utils/metrics.py", line 80, in ranking
    map_, mrr = [float(s[-6:]) for s in stdout.strip().split('\n')]
  File "/home/ryan/code/RE2/src/utils/metrics.py", line 80, in <listcomp>
    map_, mrr = [float(s[-6:]) for s in stdout.strip().split('\n')]
ValueError: could not convert string to float:

Have you seen this error before? If so, could you give me any clue to figure it out? Thanks a lot.

hitvoice commented 3 years ago

I didn't get this error before. Did you make any change to the code?

jc-ryan commented 3 years ago

> I didn't get this error before. Did you make any change to the code?

No, I followed the instructions to run the code.

hitvoice commented 3 years ago

What is your PyTorch version?

winston52 commented 3 years ago

I ran into the same problem, and I am sure I followed the instructions, since I had already trained a model on the QQP dataset successfully.

I have PyTorch 1.0.1, CUDA 9, and Python 3.6 installed and still don't know how to fix this bug. Maybe something is wrong with stdout: it may contain characters that cannot be converted to a float when MAP/MRR is computed.

Here is the error output. Thanks a lot for your response!

10/23/2020 10:22:29 train (20360) | test (6165)
10/23/2020 10:22:30 setup complete: 0:00:04s.
10/23/2020 10:22:30 Epoch: 1
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
10/23/2020 10:22:31 > epoch 1 updates 10 loss: 0.2900 lr: 1.0000e-03 gnorm: 0.3214
Traceback (most recent call last):
  File "train.py", line 48, in <module>
    main()
  File "train.py", line 31, in main
    states = trainer.train()
  File "/home/xxx/RE2/src/trainer.py", line 58, in train
    score, dev_stats = model.evaluate(dev_batches)
  File "/home/xxx/RE2/src/model.py", line 116, in evaluate
    stats.update(metrics[metric](outputs))
  File "/home/xxx/RE2/src/utils/metrics.py", line 80, in ranking
    map_, mrr = [float(s[-6:]) for s in stdout.strip().split('\n')]
  File "/home/xxx/RE2/src/utils/metrics.py", line 80, in <listcomp>
    map_, mrr = [float(s[-6:]) for s in stdout.strip().split('\n')]
ValueError: could not convert string to float:
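The empty text after "could not convert string to float:" suggests that the string passed to float() may itself have been empty. A minimal sketch of how the parsing expression from line 80 behaves, assuming stdout holds the evaluation tool's output; the function name parse_scores and the sample output string are hypothetical, made up for illustration:

```python
def parse_scores(stdout):
    # Same parsing expression as line 80 of metrics.py in the traceback.
    return [float(s[-6:]) for s in stdout.strip().split('\n')]


# Hypothetical well-formed output: two lines, each ending in a 6-character score.
ok = parse_scores("map 0.7452\nmrr 0.7618\n")

# Empty output: ''.strip().split('\n') yields [''], and float('') raises ValueError.
try:
    parse_scores("")
    empty_failed = False
except ValueError:
    empty_failed = True
```

If this is what happens, the evaluation command printed nothing, and the thing to investigate is the command itself rather than the float conversion.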

hitvoice commented 3 years ago

@WinstonHuang96 Sorry for the late reply. I cannot reproduce this error if all the requirements are met.

Did you install the evaluation script? See the WikiQA section in the README: the WikiQA evaluation script must be installed first. When I removed resources/trec_eval, I got the same error.
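A missing evaluation tool could be reported more directly with a guard like the one below. This is only a sketch, not the repository's code: the helper name run_eval and its parameters are assumptions, and it simply checks that the tool exists and produced output before anything tries to parse scores.

```python
import os
import subprocess


def run_eval(cmd, tool_path):
    """Run an evaluation command (hypothetical helper) and return its stdout.

    Fails loudly if the tool is missing or prints nothing, instead of
    letting an empty string reach float().
    """
    if not os.path.exists(tool_path):
        raise FileNotFoundError(
            f"evaluation tool not found at {tool_path}; "
            "see the WikiQA section of the README"
        )
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode != 0 or not result.stdout.strip():
        raise RuntimeError(f"evaluation command failed: {result.stderr.strip()}")
    return result.stdout
```

With a check like this, a missing resources/trec_eval would raise a FileNotFoundError that points at the README rather than a ValueError deep inside the metric code.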

winston52 commented 3 years ago

Sorry, I forgot to install the evaluation script. I mistakenly assumed that all datasets were preprocessed in the same way.

The problem has been solved. Thanks a lot for your patient reply!

hitvoice commented 3 years ago

Glad to see it solved 😄

alibaba-edu / simple-effective-text-matching-pytorch

Problems when training on wikiQA #6