Raldir / FEVEROUS

Repository for Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), accepted to the NeurIPS 2021 Datasets and Benchmarks track and used for the FEVER Workshop Shared Task at EMNLP 2021.
Apache License 2.0

TypeError while running baseline in build_tfidf.py #29

Closed: rayhuang11 closed this issue 11 months ago

rayhuang11 commented 11 months ago

Hi there @Raldir! I'm looking for help on running the baseline model.

I'm a researcher in economics at MIT, and we're looking at AI-human collaboration in a fact-checking setting. I'm not a computer scientist by training, so apologies in advance for any basic / confusing questions.

When running the TF-IDF index-building step of the baseline, I get the following error:

[INFO] 2023-10-29 11:19:00,668 - DrQA BuildDB - Reading into database...
10/29/2023 11:19:00 AM: [ Reading into database... ]
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 5421406/5421406 [38:30<00:00, 2346.26it/s]
[INFO] 2023-10-29 11:57:37,906 - DrQA BuildDB - Read 5421406 docs.
10/29/2023 11:57:37 AM: [ Read 5421406 docs. ]
[INFO] 2023-10-29 11:57:37,907 - DrQA BuildDB - Committing...
10/29/2023 11:57:37 AM: [ Committing... ]
10/29/2023 11:57:37 AM: [ Counting words... ]
10/29/2023 11:57:47 AM: [ Mapping... ]
10/29/2023 11:57:47 AM: [ -------------------------Batch 1/11------------------------- ]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/rayhuang/anaconda3/envs/feverous/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/rayhuang/anaconda3/envs/feverous/lib/python3.8/site-packages/feverous/baseline/drqascripts/build_tfidf.py", line 78, in count
    col.extend([DOC2IDX[doc_id]] * len(counts))
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "examples/baseline.py", line 56, in <module>
    build_tfidf(db_path=args.tf_idf_db_path, out_dir=args.tf_idf_index_path)
  File "/Users/rayhuang/anaconda3/envs/feverous/lib/python3.8/site-packages/feverous/baseline/retriever/build_tfidf.py", line 26, in build_tfidf
    count_matrix, doc_dict = get_count_matrix("sqlite", {"db_path": db_path}, ngram, hash_size, tokenizer, num_workers)
  File "/Users/rayhuang/anaconda3/envs/feverous/lib/python3.8/site-packages/feverous/baseline/drqascripts/build_tfidf.py", line 107, in get_count_matrix
    for b_row, b_col, b_data in workers.imap_unordered(_count, batch):
  File "/Users/rayhuang/anaconda3/envs/feverous/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
TypeError: 'NoneType' object is not subscriptable

In build_tfidf.py, when I print the dictionary, I get DOC2IDX: None.

Any guidance here would be greatly appreciated! Thanks for your time.

Raldir commented 11 months ago

Hi @rayhuang11. Could you let me know where you are printing out the dictionary? The error is indeed a bit odd. Are you running the code from source or via pip? What exact command are you running?

Could you also print out the dict after this line: https://github.com/Raldir/FEVEROUS/blob/a0eb8850734981778cf57d32764409b5955de13a/src/feverous/baseline/drqascripts/build_tfidf.py#L93

Thanks!

rayhuang11 commented 11 months ago

@Raldir thanks so much for the quick response!

I added the print statement you specified, and I'm still getting a dictionary where all the keys are just DOC2IDX and the values are all None. Before the code fails, it does write feverous-wiki-docs.db to the data folder.

I set up my environment as specified in the README. I set up a conda environment in Python 3.8 with these commands:

conda create -n feverous python=3.8
conda activate feverous
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 -c pytorch

I'm running the following command from the baseline instructions in my conda environment: python3 examples/baseline.py --split dev --doc_count 5 --sent_count 5 --tab_count 3 --config_path_cell_retriever src/feverous/baseline/retriever/config_roberta.json --config_path_verdict_predictor src/feverous/baseline/predictor/config_roberta_old.json

I'm not sure if this is related to this issue. When I install the requirements via requirements.txt, all the packages install except for certifi @ file:///croot/certifi_1671487769961/work/certifi and en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl -e git+ssh://git@github.com/Raldir/FEVEROUS.git@4104d6f5bd1cdd69992db87a16d5a430e30ea4df#egg=feverous, where I get the following errors respectively: 1) No such file or directory: '/croot/certifi_1671487769961/work/certifi' and 2) feverous from git+ssh://****@github.com/Raldir/FEVEROUS.git@4104d6f5bd1cdd69992db87a16d5a430e30ea4df#egg=feverous (from -r /Users/rayhuang/Documents/Blueprint_Labs/FEVEROUS/src/feverous/requirements.txt (line 17)) does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
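Both failing lines look like `pip freeze` output from the author's machine: a `name @ file:///...` path that only exists on that machine, and an editable `-e git+ssh://...` install of the repository itself. A minimal sketch, assuming you only want to skip such lines and install the rest; `filter_requirements` is a hypothetical helper, not part of FEVEROUS.

```python
def filter_requirements(lines):
    """Drop requirement lines that point at machine-local files or editable VCS checkouts."""
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if "@ file://" in line:
            continue  # local build path that only exists on the machine that ran pip freeze
        if line.startswith("-e ") or line.startswith("--editable"):
            continue  # editable install, e.g. of the repository itself
        kept.append(line)
    return kept
```

The kept lines can be written to a new file and passed to `pip install -r`.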

Thanks again! I'm going to continue troubleshooting on my end and will comment if something new comes up.

Raldir commented 11 months ago

Thanks. The requirements.txt installs an older version of the repository that does not contain a current bug in the code, so I did not notice it (besides the bug that requirements.txt lists a version of the repo itself as a dependency). I will fix this as soon as possible. In the meantime, I have updated the installation instructions in the README to install all packages via pip, which I was planning on doing anyway. Could you create a new environment and follow the process as described in the updated README?

I'll fix the error in build_tfidf.py as soon as possible; the refactoring broke the argument calls for several functions. Once I've done that, you can update the repo to the latest version.

rayhuang11 commented 11 months ago

@Raldir thanks so much! I've followed the pip instructions, and will update the repo once you're finished.

I downloaded the data via the shell script. It might be helpful for you to know that in the Reading Data section of the README, I get an SQL error when I run the following line:

>>> page_json = db.get_doc_json("Anarchism")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rayhuang/anaconda3/envs/feverous/lib/python3.8/site-packages/feverous/database/feverous_db.py", line 50, in get_doc_json
    cursor.execute(
sqlite3.OperationalError: no such table: wiki

On an unrelated note, I had a quick question about modifying the model to produce a continuous output (e.g. "probability of being true") instead of a binary prediction. The values don't have to be probabilities, just something akin to a continuous "score". From my understanding of the code, I can minimally modify the claim_evidence_predictor function and simply adjust how I store results. Does this seem reasonable? Let me know if I should open a new issue for this question. Thank you!

Raldir commented 11 months ago

Which .db file are you loading when calling page_json = db.get_doc_json("Anarchism")? The feverous-wiki-docs.db should not be used here, but I see that it can be confusing. Instead, to follow the README, you want to use the FEVEROUS DB file (i.e. feverous-wiki-pages.db). However, please note that the name of the DB file when downloaded does not match the default name used in examples/baseline.py, which is feverous_wikiv1.db. Downloading the data via the provided script download_data.sh takes care of this.
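The `no such table: wiki` error suggests the opened file lacks the `wiki` table that `get_doc_json` queries (inferred from the traceback, not from the documentation). One quick check is to read SQLite's `sqlite_master` catalog with Python's built-in `sqlite3` module. `list_tables` is a hypothetical helper; it opens the file read-only, because a plain `sqlite3.connect` on a mistyped path silently creates a new, empty database, which then fails with exactly this kind of "no such table" error.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an existing SQLite file, sorted alphabetically."""
    # Read-only URI: a missing file raises sqlite3.OperationalError instead of
    # being created as an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, calling `list_tables` on your copy of feverous_wikiv1.db (adjusting the path to wherever the download landed) should list a `wiki` table if it is the right file.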

Thanks for pointing out current hurdles for getting started with the dataset/baseline. I will remove the get_doc_json method from doc_db.python, as it is an artifact, and adjust the default naming in baseline.py to avoid these potential issues.

Regarding your second point, what should the continuous output represent? A score indicating whether a statement is supported or refuted by evidence? FEVEROUS is a ternary classification task, so using the normalized logits, e.g. by applying a softmax operator after the line below might give you an estimate of the model's confidence for each class: https://github.com/Raldir/FEVEROUS/blob/a0eb8850734981778cf57d32764409b5955de13a/src/feverous/baseline/predictor/evaluate_verdict_predictor.py#L68 However, note that these scores are likely very poorly calibrated (e.g. overconfidence for many predictions), a common issue with these models, so one should probably not rely on them for interpretability purposes.
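As a sketch of what normalizing the logits means: softmax exponentiates each class logit and divides by the sum, giving three non-negative scores that add up to 1. In the PyTorch code this would be a single call such as `torch.softmax(logits, dim=-1)`; the stdlib version below only makes the arithmetic explicit. The label order in `LABELS` is an assumption and should be checked against the verdict predictor's own label mapping.

```python
import math

# Assumed label order; verify against the verdict predictor's label mapping.
LABELS = ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO")


def softmax(logits):
    """Numerically stable softmax: subtracting the max logit leaves the result unchanged."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(logits):
    """Map one example's three logits to {label: score}."""
    return dict(zip(LABELS, softmax(logits)))
```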

If you want a single continuous value as the output, you can consider mapping NEI and Refuted instances to a single "not supported" class and adjusting the model to use a single output logit. Hugging Face then automatically treats the task as a regression task, see: https://github.com/huggingface/transformers/blob/552ff24488d4027590deded3b2b0d1716df341c3/src/transformers/models/roberta/modeling_roberta.py#L1217
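A minimal sketch of the label side of that change, assuming the dataset's verdict strings are `SUPPORTS`, `REFUTES` and `NOT ENOUGH INFO`: each example gets one float target, and the model would be loaded with `num_labels=1` (e.g. via `RobertaForSequenceClassification.from_pretrained(..., num_labels=1)`), at which point Hugging Face sets `problem_type` to `"regression"` and trains with mean-squared error. Since that loss does not bound the output, the sketch clips raw predictions to [0, 1] before reading them as a score. `to_target` and `to_score` are hypothetical helper names.

```python
# Assumed verdict strings; check the annotation files for the exact spelling.
SUPPORTED = "SUPPORTS"
NOT_SUPPORTED = ("REFUTES", "NOT ENOUGH INFO")


def to_target(label: str) -> float:
    """Collapse the ternary verdict into one regression target: 1.0 supported, 0.0 not supported."""
    if label == SUPPORTED:
        return 1.0
    if label in NOT_SUPPORTED:
        return 0.0
    raise ValueError(f"unexpected label: {label!r}")


def to_score(raw_output: float) -> float:
    """Clip an unbounded regression output into [0, 1] so it reads as a 'support' score."""
    return min(1.0, max(0.0, raw_output))
```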

Hope that helps a bit!

Raldir commented 11 months ago

Hi @rayhuang11, the dependencies are fixed now and the README has been updated. Can you give the baseline another go after setting up a new environment and ensuring that the name of the wiki file matches the argument used in examples/baseline.py?

rayhuang11 commented 11 months ago

@Raldir this is great, thanks so much for your time.

The dependencies are now working. Also, all the code in the sections on reading annotations and reading Wikipedia works as expected.

I'm currently in the process of running it. I found some typos in the README and code, which I'll submit corrections for later in another issue. I'll keep you posted on my progress.

Thanks again for your help!

Raldir commented 11 months ago

You're welcome. Sure, feel free to open a PR or another issue. Much appreciated, thanks!