adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

Extend author type support #230

Open spacemansteve opened 5 years ago

spacemansteve commented 5 years ago

Edwin noticed some bibcodes are missing author information (e.g., 1988Sci...240..668D, 1988Sci...240..668M and 1986Sci...232..778D). Alberto suggested it could be related to these being book reviews.
For the author field, when the ADSRecord doesn't specify a type, it defaults to 'regular'. 1988Sci...240..668D lists two authors, one of type book and one of type review. In the import pipeline, solr_adapter.py only accepts authors whose type is in a list of valid author types, and that list is set to AUTHOR_TYPES = set(['regular', 'collaboration']). It needs to be expanded by at least adding review. The _book_author getter function probably needs to look for authors of type book.
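A minimal sketch of the behavior described above, not the actual solr_adapter.py code. Only AUTHOR_TYPES and its current value come from the pipeline; the record layout (a list of dicts with `name` and an optional `type`), the helper `select_authors`, and the placeholder names are assumptions made for illustration.

```python
# Value currently used in solr_adapter.py, per the description above.
AUTHOR_TYPES = set(['regular', 'collaboration'])


def select_authors(authors, valid_types=AUTHOR_TYPES):
    """Hypothetical helper: keep the names whose type is in valid_types.

    An author without an explicit type is treated as 'regular'.
    """
    return [a['name'] for a in authors
            if a.get('type', 'regular') in valid_types]


# Assumed shape of a book-review record: one book author, one review author.
# The names are placeholders, not data from ADS.
authors = [
    {'name': 'Writer, Example', 'type': 'book'},
    {'name': 'Reviewer, Example', 'type': 'review'},
]

# With the current set, neither author qualifies, so the author field is empty.
current = select_authors(authors)

# Adding 'review' lets the review author through.
extended = select_authors(authors, AUTHOR_TYPES | set(['review']))
```

Under these assumptions, `current` is empty while `extended` contains only the review author, which would explain the missing author information on book reviews.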

aaccomazzi commented 5 years ago

Makes sense. Just one additional piece of information: we should add the type 'review' to the AUTHOR_TYPES set so that review authors end up populating the author field. And yes, the book_author field will need to pull authors of type book.

golnazads commented 5 years ago

List of author types from adspy: https://github.com/adsabs/adspy/blob/master/ADSCachedExports.py#L856

aaccomazzi commented 5 years ago

Right. So the mapping from author types to solr fields is as follows:

regular  -> author
review   -> author
book     -> book_author
editor   -> editor

golnazads commented 5 years ago

As per discussion with @aaccomazzi, I am modifying this line https://github.com/adsabs/adspy/blob/master/ADSCachedExports.py#L936 to return book_author to the pipeline, so that it matches the field in solr. I just realized that on the ADSImportPipeline side we only need to add review to the author type list: https://github.com/adsabs/ADSImportPipeline/blob/master/aip/classic/solr_adapter.py#L15
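The type-to-field mapping listed earlier in this thread can be sketched as a lookup table. This is only an illustration: `TYPE_TO_FIELD`, `route_authors`, the record layout, and the placeholder names are hypothetical and do not come from adspy or solr_adapter.py. The 'collaboration' type is not part of the listed mapping, so the sketch leaves it out.

```python
# Author type -> solr field, as listed earlier in this thread.
TYPE_TO_FIELD = {
    'regular': 'author',
    'review': 'author',
    'book': 'book_author',
    'editor': 'editor',
}


def route_authors(authors):
    """Hypothetical helper: group author names by their target solr field."""
    fields = {}
    for a in authors:
        # An author without an explicit type is treated as 'regular'.
        field = TYPE_TO_FIELD.get(a.get('type', 'regular'))
        if field is None:
            continue  # types outside the listed mapping are skipped here
        fields.setdefault(field, []).append(a['name'])
    return fields


# Placeholder record: a book author, a review author, and an untyped author.
routed = route_authors([
    {'name': 'Writer, Example', 'type': 'book'},
    {'name': 'Reviewer, Example', 'type': 'review'},
    {'name': 'Untyped, Example'},
])
```

Here `routed` places the review and untyped authors under `author` and the book author under `book_author`.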

golnazads commented 5 years ago

Steve, would you please let me know when I should check the above 3 bibcodes, just to be sure that we got them right? Thank you so much.

golnazads commented 5 years ago

Steve, the three bibcodes you ran made it to solr and everything seems fine.

[
      {
        "first_author":"Miller, Richard H.",
        "bibcode":"1988Sci...240..668D",
        "first_author_norm":"Miller, R",
        "author":["Miller, Richard H."],
        "book_author":["De Zeeuw, Tim"],
        "author_norm":["Miller, R"],
        "indexstamp":"2019-06-04T02:03:09.391Z"},
      {
        "first_author":"Miller, Richard H.",
        "bibcode":"1988Sci...240..668M",
        "first_author_norm":"Miller, R",
        "author":["Miller, Richard H."],
        "book_author":["De Zeeuw, Tim"],
        "author_norm":["Miller, R"],
        "indexstamp":"2019-06-04T02:03:09.391Z"},
      {
        "first_author":"De Zeeuw, Tim",
        "bibcode":"1986Sci...232..778D",
        "first_author_norm":"De Zeeuw, T",
        "author":["De Zeeuw, Tim"],
        "book_author":["Saslaw, William C."],
        "author_norm":["De Zeeuw, T"],
        "indexstamp":"2019-06-04T02:03:09.391Z"}]