anusharanganathan closed this issue 8 years ago
The fulltext HTML file is just a list of the fulltext HTML URLs that were available. I'm moving it to an --html
option so that users can request that the HTML be downloaded, and there will no longer be a fulltext_html_urls.txt
file.
Done in 0.4.1.
What is the purpose of the file _fulltext_htmlurls.txt available as a part of the output?
Purpose: Search open access papers in eupmc for the query dinosaurs and download fulltext XMLs, supplementary files and fulltext PDFs if available
Query used
This generated a _fulltext_htmlurls.txt file with 22 URLs.
Not all PMIDs listed in _fulltext_htmlurls.txt had a corresponding fulltext.xml or fulltext.html file downloaded. Of the 22 URLs with PMCIDs listed in the file, the breakdown of what I found was as follows:
warn: Article with pmcid "PMC3381548" had no fulltext PDF url
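A breakdown like the one described above can be produced with a short script rather than by hand. The following is only a minimal sketch, not part of getpapers. It assumes that the URL file has one URL per line with a PMCID somewhere in it, and that the output directory holds one subdirectory per PMCID containing files named fulltext.xml, fulltext.html and fulltext.pdf. The function names and that directory layout are assumptions made for illustration.

```python
import re
from pathlib import Path

# A PMCID is "PMC" followed by digits, e.g. PMC3381548.
PMCID_RE = re.compile(r"PMC\d+")


def pmcids_from_url_file(url_file):
    """Return the PMCIDs found in the URL list, in file order, without duplicates."""
    found = []
    for line in Path(url_file).read_text().splitlines():
        match = PMCID_RE.search(line)
        if match and match.group(0) not in found:
            found.append(match.group(0))
    return found


def fulltext_breakdown(url_file, outdir,
                       names=("fulltext.xml", "fulltext.html", "fulltext.pdf")):
    """Map each PMCID in the URL list to which fulltext files exist for it.

    Assumes (hypothetically) a layout of <outdir>/<PMCID>/<name>.
    """
    outdir = Path(outdir)
    report = {}
    for pmcid in pmcids_from_url_file(url_file):
        report[pmcid] = {name: (outdir / pmcid / name).is_file() for name in names}
    return report


def missing(report, name):
    """List the PMCIDs for which the given file was not downloaded."""
    return [pmcid for pmcid, files in report.items() if not files[name]]
```

Calling fulltext_breakdown with the path to the URL list and the output directory returns one entry per PMCID, and missing(report, "fulltext.xml") then lists the articles that have a URL in the list but no downloaded XML.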