allenai / S2AND

Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite

Running pre-trained model #42

Open ArtemiyFirsov opened 1 year ago

ArtemiyFirsov commented 1 year ago

Hi!

I only have author-name data, and I want to run it against the pre-trained model.

Is this something the module allows?

sergeyf commented 11 months ago

Hello,

First, you have to install the branch that is tied to the published paper: https://github.com/allenai/S2AND/tree/s2and_paper. The main branch is constantly updated for our production use case at Semantic Scholar, so it may not match the paper.

Then, download the data and model as the README describes: aws s3 sync --no-sign-request s3://ai2-s2-research-public/s2and-release data/

Then, YOUR data has to be in the appropriate format: signatures, papers, and SPECTER embeddings. An example that uses the downloaded datasets is here: https://github.com/allenai/S2AND/tree/s2and_paper#how-to-use-s2and-for-loading-data-and-training-a-model
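To make "signatures" and "papers" a bit more concrete, here is a minimal sketch of writing your own records to JSON. The field names (`signature_id`, `paper_id`, `author_info`, `block`, `authors`, `author_name`, and so on), the blocking rule, and the output file names are assumptions for illustration, not a confirmed schema. Compare them against the downloaded datasets and the README before relying on them. The helpers `make_signature`, `make_paper`, and `write_dataset` are hypothetical and not part of S2AND.

```python
import json
from pathlib import Path


def make_signature(signature_id, paper_id, first, last, position, middle=None):
    """Build one signature: a single author mention on a single paper."""
    return {
        "signature_id": signature_id,
        "paper_id": paper_id,
        "author_info": {
            "first": first,
            "middle": middle,
            "last": last,
            "position": position,  # index of this author in the paper's author list
            # Assumed blocking key: first initial plus last name, lowercased.
            "block": f"{first[:1]} {last}".lower(),
        },
    }


def make_paper(paper_id, title, abstract, author_names, year=None, venue=None):
    """Build one paper record with its ordered author list."""
    return {
        "paper_id": paper_id,
        "title": title,
        "abstract": abstract,
        "year": year,
        "venue": venue,
        "authors": [
            {"position": i, "author_name": name}
            for i, name in enumerate(author_names)
        ],
    }


def write_dataset(out_dir, signatures, papers):
    """Write both collections as JSON objects keyed by their string ids."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    sig_path = out / "signatures.json"
    paper_path = out / "papers.json"
    sig_path.write_text(json.dumps({s["signature_id"]: s for s in signatures}, indent=2))
    paper_path.write_text(json.dumps({str(p["paper_id"]): p for p in papers}, indent=2))
    return sig_path, paper_path
```

Note that every signature points at a paper, so author names alone are not enough: each name needs the paper it appeared on.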

Get familiar with the data format and then format your own data in a similar way. You may need to get SPECTERv1 embeddings for your papers' titles and abstracts from our SPECTER API (it's free).
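Fetching embeddings in batches could look roughly like the sketch below. The endpoint URL, the batch limit of 16, the request fields (`paper_id`, `title`, `abstract`), and the `preds` response shape are assumptions based on how the public SPECTER API has been documented; please verify them against the SPECTER README before use. The `post` parameter is a hypothetical hook so that the HTTP call can be swapped out or tested offline.

```python
import json
import urllib.request

# Assumptions: endpoint and batch limit as publicly documented for SPECTER v1.
SPECTER_URL = "https://model-apis.semanticscholar.org/specter/v1/invoke"
MAX_BATCH_SIZE = 16


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_papers(papers, post=post_json, batch_size=MAX_BATCH_SIZE):
    """Return {paper_id: embedding} for dicts with paper_id, title, abstract."""
    embeddings = {}
    for batch in chunks(papers, batch_size):
        body = post(SPECTER_URL, batch)
        # Assumed response shape: {"preds": [{"paper_id": ..., "embedding": [...]}]}
        for pred in body["preds"]:
            embeddings[pred["paper_id"]] = pred["embedding"]
    return embeddings
```

The resulting dictionary still has to be saved in the same layout as the embeddings files that ship with the downloaded datasets.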

Finally, read this section of the README carefully: https://github.com/allenai/S2AND/tree/s2and_paper#how-to-use-s2and-for-predicting-with-a-saved-model. It tells you how to get predictions using (a) the model that you downloaded and (b) your data, properly formatted with all of its components.
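As a rough outline of that prediction step, assuming the s2and_paper branch is installed: the sketch below follows the README's pattern as I understand it, so treat the `"clusterer"` pickle key, the `ANDData` keyword arguments, and the `predict` call as assumptions to check against the linked section. The function names `load_clusterer`, `build_inference_dataset`, and `predict_clusters` are hypothetical wrappers, and the model path is whatever file you downloaded into data/.

```python
import pickle


def load_clusterer(model_path):
    """Load the saved model; assumes the pickle is a dict with a 'clusterer' key.

    Unpickling the real model requires s2and to be importable.
    """
    with open(model_path, "rb") as handle:
        return pickle.load(handle)["clusterer"]


def build_inference_dataset(signatures, papers, specter_embeddings, name="my_dataset"):
    """Wrap your formatted data in an ANDData object for inference."""
    # Imported lazily so the other helpers can be used without s2and installed.
    from s2and.data import ANDData

    return ANDData(
        signatures=signatures,
        papers=papers,
        specter_embeddings=specter_embeddings,
        name=name,
        mode="inference",
        block_type="s2",
    )


def predict_clusters(clusterer, dataset):
    """Return the predicted clusters for every block in the dataset."""
    pred_clusters, _distance_matrices = clusterer.predict(dataset.get_blocks(), dataset)
    return pred_clusters
```

Typical use would be to call `load_clusterer` on the downloaded model file, build the dataset from your signatures, papers, and embeddings, and then pass both to `predict_clusters`.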

It's all kind of complex, sorry!