Open PetrosStav opened 1 year ago
Hello, you are right, the file is required to run the pre-trained model. It was an oversight that it had not yet been added to the repository; I am glad you caught it.
For now I added the encoding under ./data/encoding.json. (We will also add the encoding file to Zenodo as soon as possible.)
The corresponding parameter in the configuration file should be general/checkpoint/save_dir.
I performed a quick test and it should work to directly set "save_dir"="./data/encoding.json" in the config.
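For anyone applying this change by script, the following is a minimal sketch. It assumes the configuration is a JSON file in which the slash-separated parameter path general/checkpoint/save_dir corresponds to nested objects. The helper names set_nested and point_config_at_encoding are hypothetical and not part of SoMeNLP.

```python
import json
from pathlib import Path


def set_nested(config: dict, key_path: str, value) -> dict:
    """Set a value at a slash-separated key path, e.g. 'a/b/c' -> config['a']['b']['c']."""
    keys = key_path.split("/")
    node = config
    for key in keys[:-1]:
        # Create intermediate objects if the config does not have them yet.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


def point_config_at_encoding(config_file: str, encoding_file: str) -> dict:
    """Rewrite the config so general/checkpoint/save_dir points at the encoding file.

    Assumption: the config layout is nested JSON as described in the lead-in.
    """
    path = Path(config_file)
    config = json.loads(path.read_text(encoding="utf-8"))
    set_nested(config, "general/checkpoint/save_dir", encoding_file)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

A call such as point_config_at_encoding("pred_multi_opt2_SciBERT.json", "./data/encoding.json") would then update the file in place; the config path here is only an example and should be adjusted to the local setup.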
Thank you very much @dave-s477 for the quick response, everything is working now!
Another quick question: in encoding.json, as well as in the predictions from the system, I see that you have the software and soft_type predictions along with their tags.
The soft_type and mention_type seem straightforward, so my question is why "B-Application" and "I-Application" appear both in software and in soft_type.
My goal here is to map them to the "Software Type", "Mention Type" and "Additional Information" that are outlined in the SoMeSci paper.
Thank you again for your help! :-)
"software": {
    "O": 0,
    "B-Application": 1,
    "B-Version": 2,
    "B-Citation": 3,
    "B-Developer": 4,
    "I-Developer": 5,
    "I-Version": 6,
    "B-Release": 7,
    "I-Application": 8,
    "B-Extension": 9,
    "B-Abbreviation": 10,
    "B-URL": 11,
    "I-Release": 12,
    "I-URL": 13,
    "B-AlternativeName": 14,
    "I-AlternativeName": 15,
    "I-Extension": 16,
    "I-Citation": 17,
    "B-License": 18,
    "I-License": 19,
    "I-Abbreviation": 20
},
"soft_type": {
    "O": 0,
    "B-Application": 1,
    "B-PlugIn": 2,
    "I-Application": 3,
    "B-ProgrammingEnvironment": 4,
    "I-PlugIn": 5,
    "B-OperatingSystem": 6,
    "I-OperatingSystem": 7,
    "I-ProgrammingEnvironment": 8,
    "B-SoftwareCoreference": 9,
    "I-SoftwareCoreference": 10
},
"mention_type": {
    "O": 0,
    "B-Usage": 1,
    "I-Usage": 2,
    "B-Mention": 3,
    "B-Creation": 4,
    "I-Creation": 5,
    "I-Mention": 6,
    "B-Deposition": 7,
    "I-Deposition": 8
}
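As a possible starting point for the mapping described above, here is a minimal sketch of how the label-to-id tables could be inverted to turn predicted ids back into tag names, and how the BIO prefix could be separated from the type name. It assumes predictions arrive as integer ids per task, uses only an excerpt of the encoding shown above, and does not settle why "Application" appears under both software and soft_type. The function names are hypothetical and not part of SoMeNLP.

```python
from typing import Dict, List, Optional, Tuple

# Excerpt of the encoding shown above; the full file contains more tags per task.
ENCODING: Dict[str, Dict[str, int]] = {
    "software": {"O": 0, "B-Application": 1, "B-Version": 2, "I-Application": 8},
    "soft_type": {"O": 0, "B-Application": 1, "B-PlugIn": 2, "I-Application": 3},
    "mention_type": {"O": 0, "B-Usage": 1, "I-Usage": 2},
}


def invert_encoding(encoding: Dict[str, Dict[str, int]]) -> Dict[str, Dict[int, str]]:
    """Build an id -> tag lookup for every task in the encoding."""
    return {
        task: {idx: tag for tag, idx in tags.items()}
        for task, tags in encoding.items()
    }


def split_bio(tag: str) -> Tuple[str, Optional[str]]:
    """Split a BIO tag into (prefix, type name); 'O' carries no type name."""
    if tag == "O":
        return "O", None
    prefix, _, name = tag.partition("-")
    return prefix, name


def decode(task: str, ids: List[int], id_to_tag: Dict[str, Dict[int, str]]) -> List[str]:
    """Translate a sequence of predicted ids for one task into tag strings."""
    return [id_to_tag[task][i] for i in ids]


ID_TO_TAG = invert_encoding(ENCODING)
```

Note that the same tag name can have different ids per task ("I-Application" is 8 under software but 3 under soft_type), so ids should always be decoded with the table of the task they came from.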
Hello @dave-s477,
I'm trying to run the bin/predict function as suggested by you in this issue https://github.com/dave-s477/SoMeNLP/issues/4 using the pretrained checkpoint provided here: https://zenodo.org/record/7400022/files/M_SB_sw_info_opt.pth?download=1
I have edited pred_multi_opt2_SciBERT.json to include the correct paths to the checkpoint and the SciBERT tokenizer. However, when I try to run it I get the following error message:
Looking at the code, when a checkpoint is specified it searches for this encoding.json file, which as far as I can tell is not provided.
Here is the actual code snippet in NER/data_hander.py:
and in NER/output_hander.py:
The output in the terminal until that point is:
Is there something else I'm not getting right, or is it just that this encoding.json file is missing? If so, can you please provide it so that I can run the predict function with the pretrained checkpoint?
Thanks in advance for your help! :-)