WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

CEURWS scraper raises an error in models for paper_to_q when scraping the proceedings #2386

Closed: fnielsen closed this issue 7 months ago

fnielsen commented 7 months ago

Describe the bug: The CEURWS scraper raises an error in paper_to_q (a JSONDecodeError from requests' models.py) when scraping the proceedings.

To Reproduce: Steps to reproduce the behavior:

  1. Run python -m scholia.scrape.ceurws proceedings-url-to-quickstatements https://ceur-ws.org/Vol-3559/

Traceback (most recent call last):
  File "../python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "../python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "../scholia/scrape/ceurws.py", line 404, in <module>
    main()
  File "../scholia/scrape/ceurws.py", line 396, in main
    qs = proceedings_url_to_quickstatements(url, iso639=iso639)
  File "../scholia/scrape/ceurws.py", line 193, in proceedings_url_to_quickstatements
    q = paper_to_q(proceedings)
  File "../scholia/scrape/ceurws.py", line 270, in paper_to_q
    data = response.json()['results']['bindings']
  File "../python3.10/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Expected behavior: QuickStatements should be generated, or an appropriate error should be reported.
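The traceback shows paper_to_q calling response.json() directly. "Expecting value: line 1 column 1 (char 0)" is what Python's JSON decoder reports when the body does not start with JSON, for example an empty body or an HTML error page, so the SPARQL endpoint most likely did not return a result set. A minimal sketch of the "appropriate error" option, assuming the query goes to a SPARQL endpoint that answers with application/sparql-results+json; parse_sparql_bindings and SPARQLResponseError are hypothetical names, not part of scholia.

```python
import json


class SPARQLResponseError(RuntimeError):
    """Hypothetical error type: the endpoint did not return SPARQL JSON results."""


def parse_sparql_bindings(status_code, content_type, body):
    """Return the result bindings, or raise a descriptive error.

    Hypothetical helper. The arguments correspond to requests'
    response.status_code, response.headers.get("Content-Type") and response.text.
    """
    # A non-200 status (e.g. rate limiting or a timeout) usually comes with an HTML body.
    if status_code != 200:
        raise SPARQLResponseError(
            f"SPARQL endpoint returned HTTP {status_code}: {body[:200]!r}")
    # Wikidata's endpoint uses application/sparql-results+json for JSON results.
    if "json" not in (content_type or ""):
        raise SPARQLResponseError(
            f"Expected a JSON response, got {content_type!r}: {body[:200]!r}")
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise SPARQLResponseError(
            f"Invalid JSON from SPARQL endpoint: {exc}") from exc
    return data["results"]["bindings"]
```

Inside paper_to_q this could replace the direct response.json() call with parse_sparql_bindings(response.status_code, response.headers.get("Content-Type"), response.text), so the command-line tool reports the HTTP status and the start of the body instead of a bare decode error.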

WolfgangFahl commented 7 months ago

See also the API at http://ceurspt.wikidata.dbis.rwth-aachen.de/docs and the QuickStatements for a single paper at http://ceurspt.wikidata.dbis.rwth-aachen.de/Vol-3559/paper-1.qs:

# created by /home/wf/.local/lib/python3.10/site-packages/ceurspt/ceurws.py
CREATE
# P31  :instance of  Q13442814:scholarly article
LAST|P31|Q13442814
# P1433: published in 
LAST|P1433|None
# english label
LAST|Len|"Location Query Answering Using Box Embeddings"
# english description
LAST|Den|"scientific paper published in CEUR-WS Volume 3559"
# P1476:title
LAST|P1476|en:"Location Query Answering Using Box Embeddings"
# P407 :language of work or name  Q1860:English
LAST|P407|Q1860
# P953 :full work available at URL
LAST|P953|"https://ceur-ws.org/Vol-3559/paper-1.pdf"
# P577 :publication date
LAST|P577|+2023-11-21T00:00:00Z/11
# P2093: author name string, P1545: series ordinal
LAST|P2093|"Eleni Tsalapati"|P1545|"1"
# P2093: author name string, P1545: series ordinal
LAST|P2093|"Markos Iliakis"|P1545|"2"
# P2093: author name string, P1545: series ordinal
LAST|P2093|"Manolis Koubarakis"|P1545|"3"
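The output above follows a fixed pattern: one CREATE, then LAST|property|value lines, with each P2093 author name string qualified by a P1545 series ordinal. P1433 is written as None, which is not a valid item value, presumably because the venue's Q-identifier was not known. A minimal sketch of a generator mirroring this format, assuming a simple paper record; paper_to_quickstatements and its parameters are hypothetical, it omits the P1433 line when no venue item is known instead of writing None, and it does not escape double quotes inside titles.

```python
def paper_to_quickstatements(title, pdf_url, volume, date, authors, venue_q=None):
    """Build QuickStatements lines in the format shown above (hypothetical helper).

    date is an ISO date string such as "2023-11-21"; venue_q is the
    Wikidata item of the proceedings, or None if it is not known.
    """
    lines = ["CREATE", "LAST|P31|Q13442814"]  # instance of: scholarly article
    if venue_q:  # skip P1433 (published in) rather than emitting "None"
        lines.append(f"LAST|P1433|{venue_q}")
    lines += [
        f'LAST|Len|"{title}"',
        f'LAST|Den|"scientific paper published in CEUR-WS Volume {volume}"',
        f'LAST|P1476|en:"{title}"',
        "LAST|P407|Q1860",  # language of work: English
        f'LAST|P953|"{pdf_url}"',
        f"LAST|P577|+{date}T00:00:00Z/11",  # precision 11 = day
    ]
    # Author name strings with their position in the author list.
    for ordinal, name in enumerate(authors, start=1):
        lines.append(f'LAST|P2093|"{name}"|P1545|"{ordinal}"')
    return "\n".join(lines)
```

Called with the title, PDF URL, volume 3559, date "2023-11-21" and the three author names above, it reproduces the listed statements minus the P1433 line.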

This shows how we hope to add papers. Indeed, I'd prefer to use the wikibase-cli JSON approach: http://ceurspt.wikidata.dbis.rwth-aachen.de/Vol-3559/paper-1.wbjson

{
  "labels": {
    "en": "Location Query Answering Using Box Embeddings"
  },
  "descriptions": {
    "en": "scientific paper published in CEUR-WS Volume 3559"
  },
  "claims": {
    "P31": "Q13442814",
    "P1433": null,
    "P1476": {
      "text": "Location Query Answering Using Box Embeddings",
      "language": "en"
    },
    "P407": "Q1860",
    "P953": "https://ceur-ws.org/Vol-3559/paper-1.pdf",
    "P50": [],
    "P2093": [{
      "value": "Eleni Tsalapati",
      "qualifiers": {
        "P1545": "1"
      }
    }, {
      "value": "Markos Iliakis",
      "qualifiers": {
        "P1545": "2"
      }
    }, {
      "value": "Manolis Koubarakis",
      "qualifiers": {
        "P1545": "3"
      }
    }]
  }
}
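The wikibase-cli JSON above keeps "P1433": null and an empty "P50" list for the same paper. A minimal sketch that builds this structure from a paper record and drops claims without a value before the entity is handed to wikibase-cli; paper_to_wbjson is a hypothetical name, and how wikibase-cli itself treats null or empty claims is not verified here.

```python
import json


def paper_to_wbjson(title, pdf_url, volume, authors, venue_q=None):
    """Build a wikibase-cli style entity dict like the one above (hypothetical helper)."""
    claims = {
        "P31": "Q13442814",  # instance of: scholarly article
        "P1433": venue_q,    # published in: None if the proceedings item is unknown
        "P1476": {"text": title, "language": "en"},
        "P407": "Q1860",     # language of work: English
        "P953": pdf_url,
        "P50": [],           # author items: none resolved yet
        "P2093": [{"value": name, "qualifiers": {"P1545": str(ordinal)}}
                  for ordinal, name in enumerate(authors, start=1)],
    }
    # Drop claims with no value (None or an empty list) so no placeholder statements remain.
    claims = {prop: value for prop, value in claims.items() if value not in (None, [])}
    return {
        "labels": {"en": title},
        "descriptions": {"en": f"scientific paper published in CEUR-WS Volume {volume}"},
        "claims": claims,
    }


if __name__ == "__main__":
    entity = paper_to_wbjson(
        "Location Query Answering Using Box Embeddings",
        "https://ceur-ws.org/Vol-3559/paper-1.pdf",
        3559,
        ["Eleni Tsalapati", "Markos Iliakis", "Manolis Koubarakis"],
    )
    print(json.dumps(entity, indent=2))
```

Run as a script, it prints the JSON above without the "P1433" and "P50" entries.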