This change extracts the [PDF] href value from the right-hand side of a Google Scholar article entry. It records the URL as url_pdf if the article's url_pdf has not already been filled and Google Scholar labels the link as a PDF (i.e. the element's text is [PDF]).
Pre-change: 0/4 PDF links extracted
Post-change: 4/4 PDF links extracted
As far as I am aware, Google Scholar's [PDF] label is the best easily available indicator of whether the (optional) right-hand-side anchor refers to a PDF file.
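
The rule described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual code in scholar.py: the names `PdfLinkParser` and `fill_url_pdf`, the dict-based article, and the sample markup are all hypothetical, and it uses the standard library's `html.parser` only so that the example runs on its own.

```python
from html.parser import HTMLParser


class PdfLinkParser(HTMLParser):
    """Collects (href, text) pairs for every anchor in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the anchor currently being read
        self._text = []     # text pieces seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def fill_url_pdf(article, side_html):
    """Set article['url_pdf'] from the first anchor labelled [PDF].

    `article` is a plain dict here (an assumption for this sketch);
    `side_html` stands in for the markup of the right-hand side block.
    """
    if article.get("url_pdf"):
        # Already filled: leave the existing value untouched.
        return article
    parser = PdfLinkParser()
    parser.feed(side_html)
    for href, text in parser.anchors:
        if text == "[PDF]":
            article["url_pdf"] = href
            break
    return article
```

For example, `fill_url_pdf({"url_pdf": None}, '<a href="http://example.org/p.pdf">[PDF]</a>')` fills in the link, while an anchor labelled `[HTML]` leaves url_pdf unset.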
Test:
scholar.py -c 10 --txt --author "einstein" --phrase "quantum"