google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

Different result in Colab vs locally run Jupyter notebook (IndexError: list index out of range locally) #125

Open · sbhttchryy opened this issue 3 years ago

sbhttchryy commented 3 years ago

Hello, I am facing a rather strange issue. I was running the wtq_predictions notebook in Colab, where it worked well. When I try to reproduce the same result by running the notebook on my local machine, the answers are completely different.

For Colab: [screenshot of the Colab predictions]

When I run the same code in my local Jupyter notebook, the result is as follows: [screenshot of the local predictions]

I get the following error:

```
---> 1 my_result = predict(updated_table, ['What is the maximum sofa?', 'What is the icu_readm corresponding to the highest sofa?', 'What is the icu_readm corresponding to the lowest sofa?'])

in predict(table, queries)
     69     print('coordinates is: ', coordinates)
     70     all_coordinates.append(coordinates)
---> 71     answers = ', '.join([table[row + 1][col] for row, col in coordinates])
     72     position = int(row['position'])
     73     aggregation = aggregation_to_string(int(row["pred_aggr"]))

in <listcomp>(.0)
     69     print('coordinates is: ', coordinates)
     70     all_coordinates.append(coordinates)
---> 71     answers = ', '.join([table[row + 1][col] for row, col in coordinates])
     72     position = int(row['position'])
     73     aggregation = aggregation_to_string(int(row["pred_aggr"]))

IndexError: list index out of range
```

As you can see, I printed out the row and column coordinates to check, and they are wildly different in the two cases. There is a Python version mismatch between Colab and my Jupyter kernel: Colab runs Python 3.7.10 and my kernel Python 3.6.13. I am running this on Ubuntu 16.04. Any tips will be most appreciated. Thank you.
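For context, the crash comes from indexing the original table with the model's predicted coordinates, so any coordinate at or beyond the table size raises. Below is a minimal, hypothetical sketch of a defensive variant of the failing line (the helper name `coordinates_to_answers` is not from the notebook):

```python
def coordinates_to_answers(table, coordinates):
    """Join cell texts for predicted (row, col) coordinates.

    `table` is a list of rows whose first row is the header, which is why
    the notebook indexes with `row + 1`.
    """
    answers = []
    for row, col in coordinates:
        # The notebook does table[row + 1][col] unconditionally; guard the
        # lookup so out-of-range predictions are reported instead of crashing.
        if row + 1 < len(table) and col < len(table[row + 1]):
            answers.append(table[row + 1][col])
        else:
            print(f'skipping out-of-range coordinate: ({row}, {col})')
    return ', '.join(answers)
```

A guard like this only masks the symptom and makes the failure visible; the real question in this issue is why the predicted coordinates are out of range at all.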
sbhttchryy commented 3 years ago

Hello @eisenjulian and @muelletm, any update on this? Thank you.

eisenjulian commented 3 years ago

Hello @sbhttchryy, sorry for the delay. One thing that confuses me, besides the inconsistent predictions, is that in theory the model should never output a prediction coordinate that is out of bounds for the table. This is due to the logic here: https://github.com/google-research/tapas/blob/master/tapas/experiments/prediction_utils.py#L304-L305, which only computes scores for valid cells in the table (a simplified sketch of that filtering follows the list below). Given that, I have the following ideas/suggestions to try:

  1. Perhaps the prediction is not actually running due to some other error, and you are reading predictions from an old output file (results/wtq/model/test.tsv); try deleting that file before re-running.
  2. Check for a file named error in the execution path of the notebook, since it will contain any errors raised in the prediction loop.
  3. If you can, also try running locally in a Python 3.7 or 3.8 environment to check whether the version mismatch is the issue.
  4. Confirm whether your local installation is from PyPI or from the latest GitHub clone (with pip install -e). If it's the latter, you could try adding some logs in prediction_utils at the lines I referenced to check the size of the input table there.
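The sketch below illustrates the kind of valid-cell filtering described above; it is not the actual prediction_utils code, and names like `cell_probs` and `get_predicted_coordinates` are assumptions for illustration:

```python
def get_predicted_coordinates(cell_probs, num_rows, num_cols, threshold=0.5):
    """Select answer coordinates from per-cell probabilities.

    `cell_probs` maps (row, col) -> model probability for each candidate
    cell. This mirrors the idea behind prediction_utils.py#L304-L305: only
    coordinates inside the real table are scored, so an out-of-range
    prediction should be impossible if this code path actually runs.
    """
    # For suggestion 4: logging num_rows and num_cols here would confirm
    # what table size the prediction loop actually saw.
    coordinates = []
    for (row, col), prob in cell_probs.items():
        if row >= num_rows or col >= num_cols:
            continue  # ignore padding / out-of-table positions
        if prob > threshold:
            coordinates.append((row, col))
    return sorted(coordinates)
```

If the coordinates reaching the notebook's predict function bypass a filter like this (for example, because they were read from a stale output file as in suggestion 1), out-of-range indices become possible.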

Thanks a lot for the report!