aaccomazzi closed this issue 6 years ago
Will we call all author-submitted pdfs AUTHOR_PDF even if they point to a journal website? Or should we try to separate them?
-Carolyn
On 9/22/17 5:26 PM, Alberto Accomazzi wrote:
We want to create a new field, called esources, in SOLR to capture all the different sources of full-text available for a paper. This will be an array of strings from the following set:
- PUB_PDF - paper has a link to publisher PDF fulltext
- PUB_HTML - paper has a link to publisher HTML fulltext
- ADS_PDF - paper has a link to ADS fulltext
- ADS_SCAN - paper has a link to ADS scan
- EPRINT_PDF - paper has a link to an eprint PDF (currently only arXiv, but possibly others in the future)
- EPRINT_HTML - paper has a link to an eprint HTML (currently only arXiv)
- AUTHOR_PDF - paper has a link to an author copy in PDF format
- AUTHOR_HTML - paper has a link to an author copy in HTML format
The corresponding links data should be created from the corresponding non-bib tables.
The solr field should be multivalued strings, indexed and stored.
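As a rough illustration of the field described above, here is a minimal Python sketch that normalizes a paper's link types into an `esources` list restricted to the eight values listed. The function names, the `bibcode` key, and the placeholder identifier are assumptions made for the example; they are not taken from the actual pipeline code.

```python
from typing import Dict, Iterable, List

# Allowed values, copied from the list in this issue.
ESOURCE_VALUES = (
    "PUB_PDF", "PUB_HTML", "ADS_PDF", "ADS_SCAN",
    "EPRINT_PDF", "EPRINT_HTML", "AUTHOR_PDF", "AUTHOR_HTML",
)


def build_esources(link_types: Iterable[str]) -> List[str]:
    """Return the de-duplicated esources values for one paper.

    Values are upper-cased, checked against the allowed set, and
    returned in the order used in the list above.
    """
    seen = set()
    for raw in link_types:
        value = raw.strip().upper()
        if value not in ESOURCE_VALUES:
            raise ValueError(f"unknown esource value: {raw!r}")
        seen.add(value)
    return [v for v in ESOURCE_VALUES if v in seen]


def to_solr_doc(bibcode: str, link_types: Iterable[str]) -> Dict[str, object]:
    """Build a hypothetical document fragment carrying the multivalued field."""
    return {"bibcode": bibcode, "esources": build_esources(link_types)}


if __name__ == "__main__":
    # "example-bibcode" is a placeholder, not a real record.
    print(to_solr_doc("example-bibcode", ["eprint_pdf", "PUB_PDF", "PUB_PDF"]))
```

Rejecting unknown values early, rather than silently dropping them, would make a typo in one of the links tables visible before it reaches the index; whether the real pipeline should fail or just log in that case is a separate decision.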
The intent is to have author_pdf point to author-hosted content, which will usually be the author copy of a paper.
I have created the proper directories under /proj/ads/abstracts/config/links to hold all of these tables and, in the process, have tried to separate URLs to author-managed articles and PDFs from publisher-supplied ones. The README files should explain what we're trying to accomplish.
Added to Solr. Verified that the data pipeline is sending the values inside the 'esource' field, although I can't see any real values now (because of the bug in the pipeline delivery).