RNAcentral / rnacentral-references

Annotate papers with source organism #10

Open blakesweeney opened 2 years ago

blakesweeney commented 2 years ago

We can annotate every paper with the organism it is about. This will be key to integrating this work into miRBase, since they rely on organism annotations in their current approach (https://academic.oup.com/nar/article/47/D1/D155/5179337?login=false). The annotation is needed because some miRNA names are not specific to a species, so simple text mining may well match a mention from a different species than the intended one. To get around this, they use the organism database (https://organisms.jensenlab.org/Search), which contains a mapping of taxid -> paper. We should import this mapping and use it to annotate our papers. The organism could then also be provided as a facet in the widget.
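
To make the idea concrete, here is a minimal sketch of how a taxid -> paper mapping could be used to drop text-mining hits from the wrong species. Everything in it is an assumption for illustration: the function names, the shape of a "hit", and the toy PMIDs are made up, and this is not how miRBase or the organism database actually implement it.

```python
from collections import defaultdict


def build_pmid_to_taxids(rows):
    """Invert (taxid, pmid) pairs into a pmid -> {taxid, ...} lookup."""
    index = defaultdict(set)
    for taxid, pmid in rows:
        index[pmid].add(taxid)
    return index


def filter_hits(hits, pmid_to_taxids, expected_taxid):
    """Keep only hits whose paper is annotated with the expected organism."""
    return [
        hit
        for hit in hits
        if expected_taxid in pmid_to_taxids.get(hit["pmid"], set())
    ]


# Toy data (hypothetical PMIDs): 9606 = human, 10090 = mouse.
organism_rows = [(9606, "111"), (10090, "222"), (9606, "333"), (10090, "333")]

# The same miRNA name found in four papers by plain text mining.
hits = [
    {"name": "mir-21", "pmid": "111"},
    {"name": "mir-21", "pmid": "222"},
    {"name": "mir-21", "pmid": "333"},
    {"name": "mir-21", "pmid": "444"},  # paper with no organism annotation
]

index = build_pmid_to_taxids(organism_rows)
human_hits = filter_hits(hits, index, expected_taxid=9606)
print([hit["pmid"] for hit in human_hits])
```

In this sketch a paper with no organism annotation is dropped. Whether such papers should be kept, dropped, or flagged is an open design choice.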

carlosribas commented 1 year ago

I have started this task. For now, I am loading the data from the CSV file into a database table (_litscan_loadorganism). Once that is done, for each article LitScan found that has a PMID, I check which organisms were annotated for it. The results are saved in the _litscanorganism table.
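
Roughly, the two steps described above could look like the sketch below. It uses an in-memory SQLite database purely so that it runs on its own. The table names (`load_organism`, `article`, `article_organism`), the column names, the CSV layout, and the toy rows are all assumptions for illustration and are not the actual LitScan schema or data.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout: one (taxid, pmid) pair per row.
CSV_TEXT = "taxid,pmid\n9606,111\n10090,222\n9606,333\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE load_organism (taxid INTEGER, pmid TEXT)")
conn.execute("CREATE TABLE article (pmcid TEXT PRIMARY KEY, pmid TEXT)")
conn.execute("CREATE TABLE article_organism (pmcid TEXT, taxid INTEGER)")

# Step 1: load the CSV into the staging table.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
conn.executemany(
    "INSERT INTO load_organism (taxid, pmid) VALUES (?, ?)",
    [(int(row["taxid"]), row["pmid"]) for row in reader],
)

# Toy articles; one has no PMID and one has a PMID missing from the CSV.
conn.executemany(
    "INSERT INTO article (pmcid, pmid) VALUES (?, ?)",
    [("PMC1", "111"), ("PMC2", None), ("PMC3", "333"), ("PMC4", "999")],
)

# Step 2: for articles that have a PMID, record the annotated organisms.
conn.execute(
    """
    INSERT INTO article_organism (pmcid, taxid)
    SELECT a.pmcid, o.taxid
    FROM article AS a
    JOIN load_organism AS o ON o.pmid = a.pmid
    WHERE a.pmid IS NOT NULL
    """
)

annotated = conn.execute(
    "SELECT pmcid, taxid FROM article_organism ORDER BY pmcid"
).fetchall()
print(annotated)
```

Articles without a PMID, or with a PMID that does not appear in the CSV, simply get no rows in the result table.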

I'm going to review this process and think about the best way to automate it, but I imagine it will be similar to what we already do with the rest of the RNAcentral data.