allenai / SciREX

Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122
Apache License 2.0

Problem generating predictions #12

Closed: Yichabod closed this issue 4 years ago

Yichabod commented 4 years ago

Hi, thanks for resolving the previous issues. However, I'm still having problems generating predictions despite following the instructions in the README exactly.

After successfully installing SciREX, I followed the four training steps:

    1. Extract the dataset files into the folder: tar -xvzf scirex_dataset/release_data.tar.gz --directory scirex_dataset
    2. Export the path to SciBERT: export BERT_BASE_FOLDER=<path-to-scibert> (this path should contain at least two files, vocab.txt and weights.tar.gz). Download the file from https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/pytorch_models/scibert_scivocab_uncased.tar and untar it.
    3. Run CUDA_DEVICE=<cuda-device-num> bash scirex/commands/train_scirex_model.sh main to train the main SciREX model.
    4. Run CUDA_DEVICE=<cuda-device-num> bash scirex/commands/train_pairwise_coreference.sh main to train the secondary coreference model.

However, when I ran the README command to generate the predictions:

scirex_archive=outputs/pwc_outputs/experiment_scirex_full/main \
scirex_coreference_archive=outputs/pwc_outputs/experiment_pairwise_coreference/main \
cuda_device=<cuda-device-num> \
bash scirex/commands/predict_scirex_model.sh

I got an error saying that pwc_outputs/experiment_pairwise_coreference/main doesn't exist. I checked the directory and found that, instead of experiment_pairwise_coreference, there is an experiment_coreference folder, so I changed the second line of the prediction command to scirex_coreference_archive=outputs/pwc_outputs/experiment_coreference/main and got the following errors:

ModuleNotFoundError: No module named 'scirex'
Predicting Salient Clustering
Traceback (most recent call last):
  File "scirex/predictors/predict_salient_clusters.py", line 25, in <module>
    predict(sys.argv[1], sys.argv[2], sys.argv[3])
  File "scirex/predictors/predict_salient_clusters.py", line 5, in predict
    clusters = [json.loads(line) for line in open(clusters_file)]
FileNotFoundError: [Errno 2] No such file or directory: 'test_outputs//cluster_predictions.jsonl'
Predicitng Salient Clusters using gold clusters as filter
Traceback (most recent call last):
  File "scirex/predictors/predict_salient_clusters_using_gold.py", line 11, in <module>
    from scirex.predictors.utils import *
ModuleNotFoundError: No module named 'scirex'
Predicting Relations End-to-End
Traceback (most recent call last):
  File "scirex/predictors/predict_n_ary_relations.py", line 16, in <module>
    from scirex.predictors.utils import merge_method_subrelations
ModuleNotFoundError: No module named 'scirex'
Predicting relations End-to-End with gold cluster filtering
Traceback (most recent call last):
  File "scirex/predictors/predict_n_ary_relations.py", line 16, in <module>
    from scirex.predictors.utils import merge_method_subrelations
ModuleNotFoundError: No module named 'scirex'
Predicting Relations on gold clusters
Traceback (most recent call last):
  File "scirex/predictors/predict_n_ary_relations.py", line 16, in <module>
    from scirex.predictors.utils import merge_method_subrelations
ModuleNotFoundError: No module named 'scirex'
Evaluating on all Predicted steps
Traceback (most recent call last):
  File "scirex/evaluation_scripts/scirex_relation_evaluate.py", line 7, in <module>
    from scirex.metrics.clustering_metrics import match_predicted_clusters_to_gold
ModuleNotFoundError: No module named 'scirex'
Evaluating on all predicted steps with filtering using gold salient clusters
Traceback (most recent call last):
  File "scirex/evaluation_scripts/scirex_relation_evaluate.py", line 7, in <module>
    from scirex.metrics.clustering_metrics import match_predicted_clusters_to_gold
ModuleNotFoundError: No module named 'scirex'

Any idea how I can fix this?

viswavi commented 4 years ago

I also had this problem. My solution was to add the root of the SciREX repo to PYTHONPATH. This way, scripts that are run from subdirectories can still import packages defined at the top level of SciREX.

e.g. export PYTHONPATH=$PYTHONPATH:<path to SciREX root>
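A minimal sketch of that fix, assuming it is run from the SciREX root with python3 as the interpreter. The ${PYTHONPATH:+...} form is an addition not taken from this thread: it only inserts the old value and a ":" separator when PYTHONPATH was already non-empty. The prediction call is the README command quoted in this thread and only runs if the script is present; CUDA_DEVICE_NUM is a hypothetical variable standing in for <cuda-device-num>.

```shell
# Assumption: the current directory is the SciREX repository root.
scirex_root="$(pwd)"

# Append the root to PYTHONPATH without creating an empty leading entry
# when PYTHONPATH was previously unset or empty.
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$scirex_root"
echo "PYTHONPATH=$PYTHONPATH"

# Run the README prediction command only when the script is actually here.
# CUDA_DEVICE_NUM is hypothetical; it stands in for <cuda-device-num>.
if [ -f scirex/commands/predict_scirex_model.sh ]; then
    scirex_archive=outputs/pwc_outputs/experiment_scirex_full/main \
    scirex_coreference_archive=outputs/pwc_outputs/experiment_coreference/main \
    cuda_device="${CUDA_DEVICE_NUM:-0}" \
    bash scirex/commands/predict_scirex_model.sh
fi
```

Because the variable is exported, every python process launched by predict_scirex_model.sh inherits it, which is what lets the predictor scripts resolve "from scirex...." imports.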

successar commented 4 years ago

Hi both, thanks for bringing this to my attention; I will add it to the README. In general, you can set PYTHONPATH. I normally run all my code from the SciREX root directory, so I never got this error.

@Yichabod, please let me know if setting PYTHONPATH works for you.

viswavi commented 4 years ago

Pretty sure that running the code from the SciREX root directory does not solve this problem.

I ran

scirex_archive=outputs/pwc_outputs/experiment_scirex_full/main \
scirex_coreference_archive=outputs/pwc_outputs/experiment_coreference/main \
cuda_device=<cuda-device-num> \
bash scirex/commands/predict_scirex_model.sh

(as written in the README) from the root of the SciREX directory and still ran into this problem.
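Why the working directory does not help, as far as standard Python behaviour goes: when Python is started as "python path/to/script.py", it puts the directory containing the script, not the directory it was launched from, at sys.path[0]. Judging by the relative paths in the tracebacks (e.g. scirex/predictors/predict_n_ary_relations.py), the predictors are launched that way, so sys.path[0] is scirex/predictors/ and "import scirex" finds nothing unless the root is added, for example via PYTHONPATH. The self-contained check below uses a throwaway directory; the names pkg/sub/show_path.py are purely illustrative.

```shell
# Throwaway layout; pkg/sub/show_path.py plays the role of a script such as
# scirex/predictors/predict_salient_clusters.py (names are illustrative).
tmp="$(mktemp -d)"
mkdir -p "$tmp/pkg/sub"
printf 'import sys\nprint(sys.path[0])\n' > "$tmp/pkg/sub/show_path.py"

# Launch from the top-level directory, the way the prediction script does.
first_entry="$(cd "$tmp" && python3 pkg/sub/show_path.py)"

# pwd -P resolves symlinks (e.g. macOS /var -> /private/var) so the
# comparison below is not thrown off by them.
script_dir="$(cd "$tmp/pkg/sub" && pwd -P)"
launch_dir="$(cd "$tmp" && pwd -P)"

echo "sys.path[0]:      $first_entry"
echo "script directory: $script_dir"
echo "launch directory: $launch_dir"
```

The first line printed matches the script directory rather than the launch directory, which is why running from the SciREX root alone is not enough.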

Yichabod commented 4 years ago

@successar My outputs directory seemed to be missing some information, so I am retraining the models and will then check whether modifying PYTHONPATH works.

Yichabod commented 4 years ago

@successar Now when I run the prediction, it doesn't generate anything. Instead, after a long series of INFO messages, I get the following errors:

  Predicting NER
  instances = [instance for instance in Tqdm.tqdm(instances)]
  File "/opt/conda/envs/scirex/lib/python3.7/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 134, in <listcomp>
    instances = [instance for instance in Tqdm.tqdm(instances)]
  File "/opt/conda/envs/scirex/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1005, in __iter__
    for obj in iterable:
  File "/home/ml1955/task-and-method-annotation/SciREX/scirex/data/dataset_readers/scirex_full_reader.py", line 141, in _read
    with open(file_path, "r") as g:
FileNotFoundError: [Errno 2] No such file or directory: 'test_outputs//ner_predictions.jsonl'
...
 File "scirex/predictors/predict_clusters.py", line 22, in predict
    documents = [json.loads(line) for line in tqdm.tqdm(open(coreference_scores_file))]
FileNotFoundError: [Errno 2] No such file or directory: 'test_outputs//coreference_predictions.jsonl'
...
Predicting Relations on gold clusters
Traceback (most recent call last):
  File "scirex/predictors/predict_n_ary_relations.py", line 109, in <module>
    predict(argv[1], argv[2], argv[3], argv[4], int(argv[5]))
  File "scirex/predictors/predict_n_ary_relations.py", line 48, in predict
    relation_threshold = json.load(open(archive_folder + '/metrics.json'))['best_validation__n_ary_rel_global_threshold']
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/pwc_outputs/experiment_scirex_full/everything//metrics.json'

Note that I was unable to train the coreference model: it would freeze after the INFO messages, even after I changed batch_size in the libsonnet file to 5 so that it would run on my GPU. However, I don't think the coreference model is needed for NER, and I only care about getting the NER part working.

Any ideas?

successar commented 4 years ago

Hi,

Is this the full error message? Is the problem occurring after the first command in this file? https://github.com/allenai/SciREX/blob/master/scirex/commands/predict_scirex_model.sh

Can you try creating a directory called "test_outputs" before running the prediction command?
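A hedged sketch of a pre-flight check along those lines. preflight is a hypothetical helper, not part of SciREX. It creates test_outputs in the current directory (the relative paths in the tracebacks, such as test_outputs//ner_predictions.jsonl, suggest that is where the script expects it) and flags archive folders that are missing or lack metrics.json, which predict_n_ary_relations.py reads according to the traceback earlier in this thread. It assumes, as AllenNLP training normally does, that a completed run leaves metrics.json in its serialization directory.

```shell
# Hypothetical helper, not part of SciREX: check archive folders before
# running scirex/commands/predict_scirex_model.sh from the repo root.
preflight() {
    local status=0 dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing archive directory: $dir" >&2
            status=1
        elif [ ! -f "$dir/metrics.json" ]; then
            # Assumption: a finished AllenNLP training run writes metrics.json.
            echo "no metrics.json (training may not have finished): $dir" >&2
            status=1
        fi
    done
    # Intermediate predictions are written under test_outputs/ (relative path).
    mkdir -p test_outputs
    return "$status"
}
```

Called as "preflight outputs/pwc_outputs/experiment_scirex_full/main outputs/pwc_outputs/experiment_pairwise_coreference/main", it would have reported the experiment_pairwise_coreference folder as missing before any prediction step ran.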

successar commented 4 years ago

Can you confirm the problem is occurring before line 10 in the above file?
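One way to answer that without reading the log by hand is bash's -e and -x options: -e stops at the first command that exits non-zero, and -x prints each command, prefixed with "+", before running it. The toy script below is hypothetical and only mimics the failure pattern in this thread, where later steps report missing files because an earlier step failed. Caveat: -e does not catch every failure (e.g. a non-final command in a pipeline), and the real predict_scirex_model.sh may not be written with -e in mind.

```shell
# Hypothetical three-step script: step 2 fails, step 3 would depend on it.
demo="$(mktemp -d)"
cat > "$demo/steps.sh" <<'EOF'
echo "step 1: predict NER"
cat /nonexistent-demo-dir/ner_predictions.jsonl
echo "step 3: predict clusters"
EOF

# Plain bash: the failure is reported, but execution continues to step 3.
plain_run="$(bash "$demo/steps.sh" 2>&1)"

# bash -e stops at the failing command; -x traces each command on stderr.
# "|| true" keeps the non-zero exit status from aborting this outer shell.
traced_run="$(bash -ex "$demo/steps.sh" 2>&1 || true)"
printf '%s\n' "$traced_run"
```

Applied here, that would mean running the README command with "bash -ex scirex/commands/predict_scirex_model.sh" in place of "bash scirex/commands/predict_scirex_model.sh"; the last "+" line before the first error names the command that failed.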

Yichabod commented 4 years ago

OK, I created a test_output directory and it seemed to work.