adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

page_count has long? #175

Closed romanchyla closed 6 years ago

romanchyla commented 6 years ago
     args=[{u'modtime': u'2017-10-27T03:25:49.157987Z', u'bibcode': u'2004PSSCR...1.1316K', u'text': {u'acknowledgement': []}, u'JSON_fingerprint': u'{"abs":[{"p":"/proj/ads/abstracts/phy/text/J10/J10-59816.abs","primary":1,"t":"1286568870"}],"links":{"electr":[{"u":"http://www3.interscience.wiley.com/cgi-bin/abstract/107637797/ABSTRACT"},{"u":"http://www3.interscience.wiley.com/cgi-bin/abstract/107642629/ABSTRACT"}],"pdf":[{"u":"http://www3.interscience.wiley.com/cgi-bin/fulltext/107637797/PDFSTART"},{"u":"http://www3.interscience.wiley.com/cgi-bin/fulltext/107642629/PDFSTART"}]},"prop":["refereed"]}', u'entry_date': u'2006-10-10', u'metadata': [{u'comment': [], u'doi': [{u'origin': u'WEB', u'content': u'10.1002/pssc.200304318'}], u'publication': {u'origin': u'WEB', u'dates': [{u'content': u'2004-04-00', u'type': u'date-published'}, {u'content': u'2004', u'type': u'publication_year'}], u'name': {u'raw': u'Physica Status Solidi (C), Applied Research, vol. 1, Issue 6, pp.32004-50102519628102519628100244416107642628484891600033416132515592004444331325', u'canonical': u'Physica Status Solidi C Current Topics'}, u'page_count': u'50102519628102519628100244416107642628484891600033416132515592004444299322', u'page_last': u'50102519628102519628100244416107642628484891600033416132515592004444331325', u'volume': u'1', u'page_range': u'32004-50102519628102519628100244416107642628484891600033416132515592004444331325', u'electronic_id': None, u'altbibcode': u'2004PSSCR...1.1316K', u'issue': u'6', u'page': u'32004'}, u'language': u'', u'tempdata': {u'origin': u'WEB', u'modtime': u'2010-10-08T20:14:30Z', u'type': u'general', u'primary': True, u'alternate_journal': False}, u'issns': [], u'conf_metadata': {u'origin': u'WEB', u'content': None}, u'titles': [{u'lang': u'en', u'text': u'Special Issue: International Conference on Physics of Light-Matter Coupling in Nanostructures III'}], u'isbns': [], u'authors': [{u'name': {u'western': u'Kavokin, Alexey', u'normalized': u'Kavokin, A', 
u'native': None}, u'number': u'1', u'affiliations': [], u'orcid': None, u'type': u'regular', u'emails': []}, {u'name': {u'western': u'Laussy, Fabrice P.', u'normalized': u'Laussy, F', u'native': None}, u'number': u'2', u'affiliations': [], u'orcid': None, u'type': u'regular', u'emails': []}], u'keywords': [{u'origin': u'WEB', u'type': u'', u'original': u'81.05.Gc', u'channel': u'', u'normalized': None}, {u'origin': u'WEB', u'type': u'', u'original': u'85.30.Pq', u'channel': u'', u'normalized': None}, {u'origin': u'WEB', u'type': u'', u'original': u'85.40.Hp', u'channel': u'', u'normalized': None}, {u'origin': u'WEB', u'type': u'', u'original': u'87.59.Hp%R 2004PSSCR...132004K', u'channel': u'', u'normalized': None}], u'arxivcategories': [], u'pubnote': [], u'copyright': [], u'abstracts': [{u'lang': u'en', u'text': u'The 3rd International Conference on Physics of Light-Matter Coupling in Nanostructures (PLMCN3) took place in Acireale, Sicily, Italy from 1-4 October 2003. The aim of this conference was to review the fundamental background for realization of a new generation of opto-electronic devices such as polariton lasers, new optical switches and emitters based on microcavities. 
The idea was to combine the experience of spectroscopists and theorists with that of specialists in crystals growth of wide-band semiconductors (GaN, CdTe, ZnSe, ZnO) and organic materials.', u'origin': u'WEB'}]}, {u'refereed': True, u'openaccess': False, u'eprint_openaccess': False, u'data_sources': [], u'pub_openaccess': False, u'tempdata': {u'origin': u'ADS metadata', u'modtime': None, u'type': u'properties', u'primary': False, u'alternate_journal': False}, u'doctype': {u'origin': u'ADS metadata', u'content': u'article'}, u'private': False, u'ocrabstract': False, u'associates': [], u'ads_openaccess': False, u'databases': [{u'origin': u'ADS metadata', u'content': u'PHY'}], u'vizier_tables': [], u'bibgroups': []}, {u'tempdata': {u'modtime': None, u'origin': u'ADS metadata', u'type': u'relations', u'primary': False, u'alternate_journal': False}, u'links': [{u'origin': None, u'count': None, u'title': None, u'url': u'http://www3.interscience.wiley.com/cgi-bin/fulltext/107637797/PDFSTART', u'access': None, u'type': u'pdf'}, {u'origin': None, u'count': None, u'title': None, u'url': u'http://www3.interscience.wiley.com/cgi-bin/fulltext/107642629/PDFSTART', u'access': None, u'type': u'pdf'}, {u'origin': None, u'count': None, u'title': None, u'url': u'http://www3.interscience.wiley.com/cgi-bin/abstract/107637797/ABSTRACT', u'access': None, u'type': u'electr'}, {u'origin': None, u'count': None, u'title': None, u'url': u'http://www3.interscience.wiley.com/cgi-bin/abstract/107642629/ABSTRACT', u'access': None, u'type': u'electr'}], u'preprints': [], u'alternates': [{u'origin': None, u'content': u'2004PSSCR...1.1316S', u'type': u'deleted'}]}]}]
     kwargs={}
     trace=Traceback (most recent call last):
       File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 374, in trace_task
         R = retval = fun(*args, **kwargs)
       File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 629, in __protected_call__
         return self.run(*args, **kwargs)
       File "/app/aip/tasks.py", line 93, in task_merge_metadata
         solr_adapter.SolrAdapter.validate(r)  # Raises AssertionError if not validated
       File "/app/aip/libs/solr_adapter.py", line 570, in validate
         assert isinstance(v, type(SCHEMA[k])), '{0}: has an unexpected type ({1}!={2}): {3}'.format(k, type(v), SCHEMA[k], v)
     AssertionError: page_count: has an unexpected type (<type 'long'>!=0): 50102519628102519628100244416107642628484891600033416132515592004444299322
romanchyla commented 6 years ago

@golnazads can you investigate please? This error appears only once, but it is coming from the nonbib pipeline and it is a very curious issue.

golnazads commented 6 years ago

I looked at the record in Classic, and I am guessing that this is what is coming into the parser:

http://adsabs.harvard.edu/abs/2004PSSCR...1.1316K

Look at the publication

Physica Status Solidi (C), Applied Research, vol. 1, Issue 6, pp.32004-50102519628102519628100244416107642628484891600033416132515592004444331325

I think we need to fix the record, but I could also build some more intelligence into the parser. Right now, if both page and last page are numeric, the page count is computed. In this case both are numeric, so a page count was computed from the corrupted last page.
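One possible form of that extra intelligence is a plausibility check on the computed value. The sketch below is illustrative only and is not the pipeline's actual code: the function name `compute_page_count` and the bound `MAX_PLAUSIBLE_PAGES` are assumptions. The formula `last - first + 1` matches the values in the failing record, where the reported `page_count` equals `page_last - page + 1`.

```python
import re

# Assumed upper bound on a believable page count; hypothetical, tune as needed.
MAX_PLAUSIBLE_PAGES = 100000

# Only plain ASCII digit strings are treated as numeric pages.
_DIGITS = re.compile(r"[0-9]+")


def compute_page_count(page, page_last, max_pages=MAX_PLAUSIBLE_PAGES):
    """Return last - first + 1, or None if the range is missing,
    non-numeric, reversed, or implausibly large."""
    if page is None or page_last is None:
        return None
    page = str(page).strip()
    page_last = str(page_last).strip()
    if not (_DIGITS.fullmatch(page) and _DIGITS.fullmatch(page_last)):
        return None
    count = int(page_last) - int(page) + 1
    # Reject reversed ranges and values too large to be a real page count.
    if count < 1 or count > max_pages:
        return None
    return count


print(compute_page_count("1316", "1320"))  # 5
```

With a check like this, the corrupted last page in this record would yield no page count at all rather than a value that fails schema validation downstream.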


csgrant00 commented 6 years ago

On 10/27/17 7:25 PM, golnazads wrote:

> Physica Status Solidi (C), Applied Research, vol. 1, Issue 6, pp.32004-50102519628102519628100244416107642628484891600033416132515592004444331325

1.5 * pi?

I'll fix.
