serenalotreck opened this issue 7 months ago
Hi, thanks for your question.
We have updated the codebase, so please update your git clone and follow the instructions in the updated README file; see here.
Basically, first you need to run the `get_model.sh` script to download the original model and the best model weights into a folder on your cluster/GPU machine, and then provide that folder via the `--model_folder_path` argument.
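For illustration, a minimal sketch of this first step (the model folder path is a placeholder and the trailing `...` stands in for the pipeline's other arguments, which are not spelled out here):

```bash
# Download the original model and the best model weights
# into a folder on your machine (see the README for details).
bash get_model.sh

# Point the pipeline at the folder holding the downloaded weights
# (hypothetical path; other required arguments omitted):
python run_ls_pipeline.py --model_folder_path /path/to/downloaded_models ...
```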
Second, the large-scale relation extraction program does not include an NER component, i.e., it will not work on plain text. You need to run an NER system to detect Protein entities in the texts first; once you have done that, you can run the relation extraction system (for example, see here). A sketch of the expected annotations follows below.
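To make the expected NER output concrete, here is a hypothetical BRAT standoff pair (the document text, entity names, and offsets are made up for the example; offsets are character positions in the `.txt` file, with an exclusive end):

```bash
# A made-up document and its Protein annotations in BRAT standoff format.
cat > doc1.txt <<'EOF'
BRCA1 interacts with the TP53 protein.
EOF

# Each .ann line has tab-separated fields: ID, "Type Start End", surface text.
cat > doc1.ann <<'EOF'
T1	Protein 0 5	BRCA1
T2	Protein 25 29	TP53
EOF
```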
Third, the `--input_folder_path` that you provide should have a specific format. Inside, there can be different sub-folders, each containing one or more `.tar.gz` files; each archive holds documents in BRAT standoff format (each document consists of a `.txt` file with the text and a `.ann` file with its Protein entities). Check here; a sketch of the expected layout follows below.
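A sketch of that layout, with illustrative folder and file names (not names the pipeline requires), continuing from the `doc1` example above:

```bash
# Hypothetical input layout:
#
# my_input_folder/
# └── subset_01/
#     └── batch_001.tar.gz   # contains doc1.txt, doc1.ann, doc2.txt, doc2.ann, ...
mkdir -p my_input_folder/subset_01

# Package the .txt/.ann pairs into a .tar.gz inside a sub-folder
# (assumes the document files exist in the current directory):
tar -czf my_input_folder/subset_01/batch_001.tar.gz doc1.txt doc1.ann
```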
If you follow the aforementioned steps and still see problems, please let us know. Thanks!
I'd like to apply your pretrained model to a set of `.txt` files I have. I've cloned the repo and downloaded the model using the instructions in the `LargeScaleRelationExtractionPipeline` README. However, it's unclear to me how I would go about running the model on unseen data. I tried just running the script with the following:
where `/mnt/scratch/lotrecks/drought_and_des_1000_subset_15Apr2024/` is a folder of `.txt` files. However, I'm getting a model path error that looks like it's because there's a hardcoded path somewhere, and I'm having trouble determining where it's coming from. I haven't found a hardcoded path in any of `large_scale_prediction_pipeline_tf.py`, `ComplexTome_configs.json`, or `run_ls_pipeline.py`. Pointers appreciated!