blekhmanlab / rxivist

API providing access to papers and authors scraped from biorxiv.org
https://rxivist.org
GNU Affero General Public License v3.0

Bug with recording detailed authors that are duplicated in list #168

Closed by rabdill 6 years ago

rabdill commented 6 years ago

Stack trace:

Refreshing article 20253
Determining publication status for DOI 10.1101/330126.
No data found
Determining posting date.
No older version detected; using date from current page: 2018-05-24
Determined 'posted on' date: 2018-05-24
Recorded 5 stats for ID 20253
Author has ORCiD; determining whether they exist in DB.
Recorded detailed author Lucas  Gruimarães Almeida with ID 9064
Recording email lucasalmeida.juaz@gmail.com for author
Author has ORCiD; determining whether they exist in DB.
Recorded detailed author Eduardo  Seiji Numata Filho with ID 9065
Recording email dunumata07@gmail.com for author
Recorded detailed author Geovani A dos Santos  Alves dos Santos with ID 9066
Recording email geovani.ufrb@gmail.com for author
Recorded detailed author José Tadeu  Carneiro Cardoso with ID 9067
Recording email mestrecamisa@globo.com for author
Author has ORCiD; determining whether they exist in DB.
*ORCiD: Author Sergio Moreira exists with ID 9064
Recording email serginhocapo@gmail.com for author
Error associating detailed authors to paper: duplicate key value violates unique constraint "article_detailed_authors_article_author_key"
DETAIL:  Key (article, author)=(20253, 9064) already exists.

Recording article associations one at a time.
Traceback (most recent call last):
  File "spider.py", line 427, in _record_detailed_authors
    except Exception as e:
psycopg2.IntegrityError: duplicate key value violates unique constraint "article_detailed_authors_article_author_key"
DETAIL:  Key (article, author)=(20253, 9064) already exists.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "spider.py", line 436, in _record_detailed_authors
    cursor.execute("INSERT INTO article_detailed_authors (article, author) VALUES (%s, %s);", (article_id, x))
NameError: name 'connection' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "spider.py", line 905, in <module>
    elif sys.argv[1] == "rankings":
  File "spider.py", line 820, in full_run
    else:
  File "spider.py", line 240, in refresh_article_stats
    updated += 1
  File "spider.py", line 439, in _record_detailed_authors
    pass
AttributeError: 'Spider' object has no attribute 'id'

rabdill commented 6 years ago

The secondary and tertiary exceptions (the NameError and the AttributeError raised while handling the original IntegrityError) are fixed by https://github.com/rabdill/rxivist/commit/c8c76d08e60a588ff93b08425b52547c226f630f
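
For anyone following along, here is a rough sketch of the "record associations one at a time" fallback with the error handling kept self-contained, so that a failure in the batch insert cannot trigger further exceptions inside the handler. This is not the code from the commit above. The function name `associate_authors` is hypothetical, and the standard-library `sqlite3` module stands in for psycopg2 so the example runs without a PostgreSQL server (note the `?` placeholders instead of `%s`). Only the table and column names are taken from the log.

```python
import sqlite3


def associate_authors(conn, article_id, author_ids):
    """Link authors to an article, falling back to row-by-row inserts.

    Hypothetical helper; sqlite3 is used here as a stand-in for psycopg2.
    Returns the number of associations actually recorded.
    """
    sql = "INSERT INTO article_detailed_authors (article, author) VALUES (?, ?);"
    try:
        # The connection context manager commits on success and rolls back
        # the whole batch if any row violates the unique constraint.
        with conn:
            conn.executemany(sql, [(article_id, a) for a in author_ids])
        return len(author_ids)
    except sqlite3.IntegrityError:
        pass  # fall through to the one-at-a-time path below

    recorded = 0
    for author_id in author_ids:
        try:
            with conn:
                conn.execute(sql, (article_id, author_id))
            recorded += 1
        except sqlite3.IntegrityError:
            # Duplicate (article, author) pair: skip it and keep going.
            continue
    return recorded
```

With psycopg2 the same shape would apply, catching `psycopg2.IntegrityError` and rolling back the connection before retrying, since a failed statement leaves the PostgreSQL transaction in an aborted state.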

rabdill commented 6 years ago

This actually looks like it's fully fixed.
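
For reference, one way to avoid the original constraint violation is to drop repeated author IDs before inserting. The helper below is a minimal sketch with a hypothetical name (`unique_in_order`); the thread does not say whether the actual fix works this way.

```python
def unique_in_order(author_ids):
    """Drop repeated IDs while keeping first-seen order.

    Hypothetical helper: dict keys are unique and preserve insertion order.
    """
    return list(dict.fromkeys(author_ids))


# Author IDs in the order the log above recorded them for article 20253;
# the final entry repeats 9064.
author_ids = unique_in_order([9064, 9065, 9066, 9067, 9064])
```

Note that in the log two differently named authors both resolve to ID 9064, so deduplicating would prevent the duplicate key error but would not explain why the ORCiD lookup returned an ID that had just been assigned.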