cannin / enhance_nlp_interaction_network_gsoc2020


Extracting MeSH Terms for Articles #1

Open cannin opened 4 years ago

cannin commented 4 years ago

Task

For each MeSH term, save 20 articles for comparison. The results page has a Save button; saving in PMID format gives the list of article IDs. I think all returned results will have MeSH terms. The XML files also contain abstracts that can be run through the MTI tool. You can skip the manual "On Demand" tool for now.

MeSH Terms

Neurosciences: D009488
Neoplasms: D009369
Communicable Diseases: D003141
Aging: D000375
Computational Biology: D019295

Search Query

"neurosciences"[MeSH Terms] AND "journal article"[Publication Type] AND hasabstract

Search Page

https://pubmed.ncbi.nlm.nih.gov/

Search Results

https://pubmed.ncbi.nlm.nih.gov/?term=%22neurosciences%22%5BMeSH+Terms%5D+AND+%22journal+article%22%5BPublication+Type%5D+AND+hasabstract&size=20

Miscellaneous

You can grab the PubMed IDs programmatically if you want, using the ESearch command. ESearch is part of the NCBI E-utilities (EUtils), alongside EFetch, which retrieves the records themselves. Google it; it shouldn't be too hard to do all this with requests and etree in Python.

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed
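
A minimal sketch of the programmatic route, assuming the standard ESearch parameters `db`, `term`, and `retmax` and the usual XML response layout (`IdList/Id`). It uses only the standard library (`urllib` instead of requests) so it stays self-contained; the function names are illustrative, not part of any existing script.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(mesh_term, retmax=20):
    """Build an ESearch URL using the same query as the search above."""
    term = (
        f'"{mesh_term}"[MeSH Terms] '
        'AND "journal article"[Publication Type] AND hasabstract'
    )
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(xml_text):
    """Extract the PubMed IDs from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


def fetch_pmids(mesh_term, retmax=20):
    """Query ESearch over the network and return the list of PMIDs."""
    with urllib.request.urlopen(build_esearch_url(mesh_term, retmax)) as resp:
        return parse_pmids(resp.read().decode("utf-8"))
```

Calling `fetch_pmids("neurosciences")` should return up to 20 IDs for the first topic; the other four topics only change the `mesh_term` argument.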

PritiShaw commented 4 years ago

Hi Mentor, please find my initial work at this link: https://gist.github.com/PritiShaw/4a4410535fb150c1cd9734d5961a0dd5

cannin commented 4 years ago

Thanks, looks interesting! Do you have them ready for the other disease types?

cannin commented 4 years ago

For the first MESH term,

28303019|*Optogenetics|C3494301|613216|MH|RtM via: Optogenetics;Forced Leaf Node Lookup:optogenetics|TI;AB|MM;RC|711^11^0;363^12^0;120^12^0;36^12^0;520^12^0

What is "711^11^0;363^12^0;120^12^0;36^12^0;520^12^0" at the end?

I can't find it here:

https://ii.nlm.nih.gov/resource/MTI_output_help_info.html

Also, is the score field missing?

PritiShaw commented 4 years ago

> Also, is the score field missing?

| PMID | Term | CUI | Score | Type | Misc | Location | Path(s) |
|---|---|---|---|---|---|---|---|
| 28303019 | Rod Opsins | C0069580 | 990 | MH | RtM via: Rod Opsins | AB | MM |
| 28303019 | Brain | C0006104 | 5439 | MH | RtM via: Brain | AB | MM;RC |
| 28303019 | Opsin | C2355587 | 41470 | ET | Entry Term Replacement for "Opsins";RtM via: Opsins | AB | MM;RC |

The above table shows a few examples for your reference.

> For the first MESH term,
>
> 28303019|*Optogenetics|C3494301|613216|MH|RtM via: Optogenetics;Forced Leaf Node Lookup:optogenetics|TI;AB|MM;RC|711^11^0;363^12^0;120^12^0;36^12^0;520^12^0
>
> What is "711^11^0;363^12^0;120^12^0;36^12^0;520^12^0" at the end?
>
> I can't find it here:
>
> https://ii.nlm.nih.gov/resource/MTI_output_help_info.html

I checked in the Interactive Medical Text Indexer: if you select "Show MTI Explanation Information" under Debug options, similar numbers appear in the result. I guess they carry information about all of the recommendations made by MTI. The information is similar to the Expanded Detail output format.
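
For reference, a pipe-delimited MTI line like the one quoted above can be split into the columns from the table header (PMID, Term, CUI, Score, Type, Misc, Location, Path(s)). This is only a sketch: the field names are taken from that header, and the trailing `711^11^0;...` field is kept as an uninterpreted `extra` value because its meaning is not confirmed here.

```python
# Column names follow the table header posted in this thread.
FIELDS = ("pmid", "term", "cui", "score", "type", "misc", "location", "paths")


def parse_mti_line(line):
    """Split one pipe-delimited MTI output line into a dict of fields."""
    parts = line.rstrip("\n").split("|")
    record = dict(zip(FIELDS, parts))
    record["score"] = int(record["score"])
    # Location and Path(s) hold semicolon-separated codes such as TI;AB.
    record["location"] = record["location"].split(";")
    record["paths"] = record["paths"].split(";")
    # Anything after the eighth field is kept verbatim, uninterpreted.
    record["extra"] = parts[len(FIELDS):]
    return record


example = (
    "28303019|*Optogenetics|C3494301|613216|MH|"
    "RtM via: Optogenetics;Forced Leaf Node Lookup:optogenetics|"
    "TI;AB|MM;RC|711^11^0;363^12^0;120^12^0;36^12^0;520^12^0"
)
record = parse_mti_line(example)
```

Read this way, the fourth field (613216) is the score, so the score is present in the raw line.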

cannin commented 4 years ago

Thanks, this table for all 100 articles would be good. But you also need to include, as you did before, the rows for the terms that come from PubMed. For those rows, all these additional columns would be NA (missing).
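
One way to lay this out, as a rough sketch: take the union of PubMed and MTI terms per article, align identical terms on the same row, and leave the MTI-only columns empty (`None`, standing in for NA) where a term comes only from PubMed. The function name and the `in_pubmed`/`in_mti` columns are hypothetical, and the PubMed term list in the example is made up for illustration; only the MTI values come from the table above.

```python
MTI_COLUMNS = ("cui", "score", "type")


def merge_terms(pubmed_terms, mti_rows):
    """Outer-join PubMed MeSH terms with MTI rows on (PMID, term)."""
    # Match terms case-insensitively so "Brain" and "brain" share a row.
    mti_by_key = {(r["pmid"], r["term"].lower()): r for r in mti_rows}
    pubmed_by_key = {
        (pmid, term.lower()): term
        for pmid, terms in pubmed_terms.items()
        for term in terms
    }
    merged = []
    for key in sorted(set(mti_by_key) | set(pubmed_by_key)):
        mti = mti_by_key.get(key)
        row = {
            "pmid": key[0],
            "term": pubmed_by_key.get(key) or mti["term"],
            "in_pubmed": key in pubmed_by_key,
            "in_mti": mti is not None,
        }
        # MTI-only columns stay None (NA) for terms found only in PubMed.
        for col in MTI_COLUMNS:
            row[col] = mti[col] if mti else None
        merged.append(row)
    return merged


# Hypothetical PubMed indexing for one article, for illustration only.
pubmed_terms = {"28303019": ["Brain", "Humans"]}
mti_rows = [
    {"pmid": "28303019", "term": "Brain", "cui": "C0006104", "score": 5439, "type": "MH"},
    {"pmid": "28303019", "term": "Rod Opsins", "cui": "C0069580", "score": 990, "type": "MH"},
]
rows = merge_terms(pubmed_terms, mti_rows)
```

Sorting by term rather than by score keeps matching terms from the two sources on one line, which makes agreement and disagreement easy to scan.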

PritiShaw commented 4 years ago

I have compared the MeSH terms for the other topics as well. You can find them here: https://github.com/PritiShaw/Analyze-MESH

cannin commented 4 years ago

@PritiShaw thanks. I'm unsure where the results are. Also, I think you did not commit one of the files in your list.

PritiShaw commented 4 years ago

Sorry for the mess; I was updating the Readme.md file. Links to the outputs are:
Details_of_MESH_term
Comparison_between _MESH

cannin commented 4 years ago

Nice. So the Comparison has the 100 papers? If so, can you add one more column with the topic term (e.g., Neurosciences, Aging)?

PritiShaw commented 4 years ago

I have added the Topic Term column to compare.tsv. You can find it here: Comparison_between _MESH

cannin commented 4 years ago

Thanks. It doesn't look like you mapped the two sets of MeSH terms. For example, Humans, Metformin, Aging in the image. Can you fix?

[Image: Screen Shot 2020-06-01 at 5 44 51 PM]

PritiShaw commented 4 years ago

The rows are currently sorted by score, so I left the order as it was. I will make the changes and let you know.

PritiShaw commented 4 years ago

I have made the changes you asked for. You can find it here Comparison_between _MESH

cannin commented 4 years ago

@PritiShaw thanks. Can you try out MTI Batch mode?

PritiShaw commented 4 years ago

Hi Mentor, please find the table of time taken for the Batch processing.

| Number of abstracts in an input text file | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average time taken | Time taken per abstract | Abstracts processed in 24 hrs |
|---|---|---|---|---|---|---|---|---|
| 1 | 40 | 45 | 40 | 46 | 45 | 43.2 | 43.2 | 2,000 |
| 3 | 43 | 45 | 46 | 44 | 40 | 43.6 | 14.53 | 5,946 |
| 5 | 39 | 41 | 41 | 44 | 45 | 42 | 8.4 | 10,285 |
| 10 | 45 | 45 | 42 | 40 | 43 | 43 | 4.3 | 20,093 |
| 50 | 40 | 41 | 40 | 40 | 42 | 40.6 | 0.81 | 106,666 |
| 100 | 41 | 41 | 43 | 41 | 40 | 41.2 | 0.41 | 210,731 |
| 200 | 72 | 73 | 72 | 72 | 73 | 72.4 | 0.36 | 240,000 |
| 500 | 132 | 139 | 144 | 136 | 135 | 137.2 | 0.27 | 320,000 |
| 1000 | 232 | 243 | 228 | 229 | 235 | 233.4 | 0.23 | 375,652 |
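
The derived columns can be recomputed from the five run times. The sketch below assumes the times are in seconds, that the per-abstract time is rounded to two decimals, and that the 24-hour figure is 86,400 divided by that rounded value and truncated; the rounding and truncation are inferred from the posted numbers, not stated anywhere. `Decimal` is used so the truncation is not thrown off by floating-point error.

```python
from decimal import Decimal

SECONDS_PER_DAY = 86_400


def batch_throughput(n_abstracts, run_times):
    """Return (average time, time per abstract, abstracts per 24 hours)."""
    average = Decimal(sum(run_times)) / len(run_times)
    # Round to two decimals, then derive the daily figure from the rounded
    # value, which is what the posted numbers appear to do.
    per_abstract = (average / n_abstracts).quantize(Decimal("0.01"))
    per_day = int(Decimal(SECONDS_PER_DAY) / per_abstract)
    return average, per_abstract, per_day


average, per_abstract, per_day = batch_throughput(1000, [232, 243, 228, 229, 235])
```

For the 1000-abstract batch this gives an average of 233.4, a per-abstract time of 0.23, and 375,652 abstracts per day, matching the last row. Going from 1 to 100 abstracts per file barely changes the total run time, so most of the per-batch cost looks like fixed overhead.
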
cannin commented 4 years ago

@PritiShaw just to double-check: what is the unit of time? Seconds, correct?

PritiShaw commented 4 years ago

> @PritiShaw just to double-check. what is the unit of time? seconds correct?

Yes, it is in seconds (s).