adsabs / AdsDataSqlSync

For non-bibliographic ADS data
GNU General Public License v3.0

difference in data between old/new pipeline #60

Closed romanchyla closed 6 years ago

romanchyla commented 6 years ago

for 'data' field, old solr index has 'CDS', the new has 'SIMBAD'

this is what i see in the nonbib table; @golnazads or @aaccomazzi can you tell if it is ok, or we might be missing something?

data_pipeline=> select * from nonbib.datalinks where bibcode = '2015ApJ...815L...1T';
       bibcode       | link_type | link_sub_type |                          url                           |         title          | item_count 
---------------------+-----------+---------------+--------------------------------------------------------+------------------------+------------
 2015ApJ...815L...1T | ARTICLE   | EPRINT_HTML   | {http://arxiv.org/abs/1511.04460}                      | {}                     |          0
 2015ApJ...815L...1T | ARTICLE   | EPRINT_PDF    | {http://arxiv.org/pdf/1511.04460}                      | {}                     |          0
 2015ApJ...815L...1T | ARTICLE   | PUB_HTML      | {http://dx.doi.org/10.1088%2F2041-8205%2F815%2F1%2FL1} | {}                     |          0
 2015ApJ...815L...1T | ARTICLE   | PUB_PDF       | {http://stacks.iop.org/2041-8205/815/L1/pdf}           | {}                     |          0
 2015ApJ...815L...1T | DATA      | SIMBAD        | {http://$SIMBAD$/simbo.pl?bibcode=2015ApJ...815L...1T} | {"SIMBAD Objects (1)"} |          1
(5 rows)

golnazads commented 6 years ago

The new data also has CDS, here is the list of all data sub types

data sub types

data = [ 'ARI', 'SIMBAD', 'NED', 'CDS', 'Vizier', 'GCPD', 'Author', 'PDG', 'MAST', 'HEASARC', 'INES', 'IBVS', 'Astroverse', 'ESA', 'NExScI', 'PDS', 'AcA', 'ISO', 'ESO', 'CXO', 'NOAO', 'XMM', 'Spitzer', 'PASA', 'ATNF', 'KOA', 'Herschel', 'GTC', 'BICEP2', 'ALMA', 'CADC', 'Zenodo', 'TNS', '' ]

golnaz


romanchyla commented 6 years ago

i'm not sure where that data comes from @golnazads - Steve ran nonbib pipeline yesterday and there were no changes in the code in the meantime; so the record in question - should it have SIMBAD?

golnazads commented 6 years ago

Yes, I think so. That is what I also get in my local db that I created a while back. I think it should be SIMBAD.

data_pipeline=# select * from nonbib.datalinks where bibcode='2015ApJ...815L...1T';
       bibcode       | link_type | link_sub_type |                          url                           |         title          | item_count 
---------------------+-----------+---------------+--------------------------------------------------------+------------------------+------------
 2015ApJ...815L...1T | ARTICLE   | EPRINT_HTML   | {http://arxiv.org/abs/1511.04460}                      | {}                     |          0
 2015ApJ...815L...1T | ARTICLE   | EPRINT_PDF    | {http://arxiv.org/pdf/1511.04460}                      | {}                     |          0
 2015ApJ...815L...1T | ARTICLE   | PUB_HTML      | {http://dx.doi.org/10.1088%2F2041-8205%2F815%2F1%2FL1} | {}                     |          0
 2015ApJ...815L...1T | ARTICLE   | PUB_PDF       | {http://stacks.iop.org/2041-8205/815/L1/pdf}           | {}                     |          0
 2015ApJ...815L...1T | DATA      | SIMBAD        | {http://$SIMBAD$/simbo.pl?bibcode=2015ApJ...815L...1T} | {"SIMBAD Objects (1)"} |          1
(5 rows)

Do you think the last record used to be CDS in nonbib?

I just checked solr for links_data, and here is what it contains:

"links_data":["{\"title\":\"\", \"type\":\"simbad\", \"instances\":\"1\", \"access\":\"\"}", "{\"title\":\"\", \"type\":\"pdf\", \"instances\":\"\", \"access\":\"open\"}", "{\"title\":\"\", \"type\":\"preprint\", \"instances\":\"\", \"access\":\"open\"}", "{\"title\":\"\", \"type\":\"electr\", \"instances\":\"\", \"access\":\"open\"}"],
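Each links_data entry is itself a JSON-encoded string, so the link types can be pulled out by decoding each element. A minimal sketch, assuming the array above has already been loaded into a Python list (the helper name link_types is hypothetical and not part of the pipeline):

```python
import json

# The four links_data entries shown above, as they would appear
# after decoding the outer solr JSON response.
links_data = [
    '{"title":"", "type":"simbad", "instances":"1", "access":""}',
    '{"title":"", "type":"pdf", "instances":"", "access":"open"}',
    '{"title":"", "type":"preprint", "instances":"", "access":"open"}',
    '{"title":"", "type":"electr", "instances":"", "access":"open"}',
]


def link_types(entries):
    """Decode each JSON-encoded entry and return its "type" value (hypothetical helper)."""
    return [json.loads(entry)["type"] for entry in entries]


print(link_types(links_data))
```

For this record the only data-related type is simbad, which matches the DATA / SIMBAD row in nonbib.datalinks.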


aaccomazzi commented 6 years ago

I checked the source file; SIMBAD is correct. Over the past month we changed the way we structure links, so we now consistently create the same labels for the "data" solr field and the datalinks table. Some changes in "data" are therefore to be expected.
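To see which records are affected by such label changes, the "data" values from the old and new indexes can be compared per bibcode. A rough sketch, assuming both have been exported as dictionaries mapping bibcode to a list of labels (the function name and the dictionary layout are assumptions for illustration, not part of the pipeline; the sample values are the ones reported in this issue):

```python
def diff_data_labels(old, new):
    """Return {bibcode: (only_in_old, only_in_new)} for records whose labels differ."""
    changes = {}
    for bibcode in sorted(set(old) | set(new)):
        old_labels = set(old.get(bibcode, ()))
        new_labels = set(new.get(bibcode, ()))
        if old_labels != new_labels:
            changes[bibcode] = (
                sorted(old_labels - new_labels),
                sorted(new_labels - old_labels),
            )
    return changes


# Values reported in this issue: the old index has CDS, the new one SIMBAD.
old_index = {"2015ApJ...815L...1T": ["CDS"]}
new_index = {"2015ApJ...815L...1T": ["SIMBAD"]}

print(diff_data_labels(old_index, new_index))
```

Records that appear in the result only because of the relabeling described above are expected differences rather than missing data.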