ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

What is the purpose of the file fulltext_html_urls.txt #53

anusharanganathan closed 8 years ago

anusharanganathan commented 9 years ago

What is the purpose of the file fulltext_html_urls.txt, which is available as part of the output?

Purpose: Search open access papers in Europe PMC (eupmc) for the query 'dinosaurs' and download the fulltext XMLs, supplementary files, and fulltext PDFs where available

Query used

$ getpapers -q 'dinosaurs' -x -s -p -o dinosaursOutput2 >> dinosaursOutput2.log

This generated a fulltext_html_urls.txt file with 22 URLs

Not all PMIDs listed in fulltext_html_urls.txt had a corresponding fulltext.xml or fulltext.html file downloaded. Of the 22 URLs with PMCIDs listed in the file, the breakdown of what I found was as follows:
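A check of this kind, comparing the IDs in the URL list against the files actually downloaded, could be scripted roughly as below. This is a minimal sketch and not part of getpapers. The function name check_fulltext is hypothetical, and the sketch assumes that the URL list holds one URL per line, that each URL contains an ID of the form PMC followed by digits, and that each paper is saved as fulltext.xml in a folder named after its PMCID inside the output directory. None of these assumptions is confirmed in this thread.

```shell
#!/bin/sh
# Sketch: report which PMCIDs in a URL list have a downloaded fulltext.xml.
# Assumptions (not confirmed by this thread): the URL list holds one URL per
# line, each URL contains an ID of the form PMC<digits>, and each paper is
# saved as <output dir>/<PMCID>/fulltext.xml.

check_fulltext() {
    url_list=$1   # path to the URL list file
    out_dir=$2    # getpapers output directory (the -o argument)

    # Extract the unique PMCIDs from the URL list, then test for each file.
    grep -oE 'PMC[0-9]+' "$url_list" | sort -u | while read -r pmcid; do
        if [ -f "$out_dir/$pmcid/fulltext.xml" ]; then
            echo "$pmcid present"
        else
            echo "$pmcid missing"
        fi
    done
}
```

After sourcing the function, it would be called with the path to the URL list and the output directory, for example check_fulltext path/to/url_list.txt dinosaursOutput2, where the first path is a placeholder for wherever the list was written. Piping the result through grep -c missing gives a count of papers without a fulltext.xml.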

blahah commented 8 years ago

The fulltext HTML file is just a list of the fulltext HTML URLs that were available. I'm moving it to an --html option so that users can request that the HTML be downloaded, and there will no longer be a fulltext_html_urls.txt file.

blahah commented 8 years ago

done in 0.4.1