WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

Fix empty title bug in CEURWS scraper #2397

Closed faresh9 closed 5 months ago

faresh9 commented 6 months ago

Pull Request: Fixes #2395

Description

This pull request addresses issue #2395, where the CEURWS scraper fails on an empty page name. For the URL https://ceur-ws.org/Vol-3592/paper9.pdf, the scraper generated the statement 'LAST P304 " "', that is, a page(s) statement (P304) whose value is blank. The expected behavior is that no page statement is generated in this case.
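The intended behavior can be sketched as follows. This is a minimal illustration, not the actual code in scholia/scrape/ceurws.py: the function names page_statement and paper_to_quickstatements and the 'pages' dictionary key are hypothetical, and the tab-separated statement layout is an assumption about the QuickStatements output.

```python
def page_statement(pages):
    """Return a QuickStatements P304 line, or None when there is no page value.

    Hypothetical helper for illustration; not part of Scholia's API.
    """
    if pages is None:
        return None
    pages = pages.strip()
    if not pages:
        # Empty or whitespace-only page string: emit no statement at all,
        # instead of 'LAST P304 " "'.
        return None
    return 'LAST\tP304\t"{}"'.format(pages)


def paper_to_quickstatements(paper):
    """Build QuickStatements lines for a paper dict (assumed structure)."""
    lines = ['CREATE']
    statement = page_statement(paper.get('pages'))
    if statement is not None:
        lines.append(statement)
    return '\n'.join(lines)


if __name__ == '__main__':
    print(paper_to_quickstatements({'pages': ' '}))
    print(paper_to_quickstatements({'pages': '1-10'}))
```

Stripping the value before testing it means that both a missing page field and one containing only whitespace are treated the same way.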

Testing

I tested the changes using the following steps:

  1. Ran the command python3 -m scholia.scrape.ceurws proceedings-url-to-quickstatements https://ceur-ws.org/Vol-3592/paper9.pdf
  2. Verified that the scraper now correctly handles the case of an empty page name
  3. Checked the generated QuickStatements output to ensure no page number is incorrectly generated
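The manual check in step 3 could also be automated. The sketch below is an assumption-laden example, not part of Scholia: find_empty_statements is a hypothetical helper that scans QuickStatements text for statements whose quoted value is empty or whitespace-only, assuming whitespace-separated fields of the form shown in the description.

```python
import re

# Matches lines such as 'LAST P304 " "' or 'LAST<TAB>P1476<TAB>en:""':
# an item field, a property, then a quoted value containing only whitespace.
EMPTY_VALUE = re.compile(r'^\S+\s+P\d+\s+(?:[a-z-]+:)?"\s*"\s*$')


def find_empty_statements(quickstatements):
    """Return the lines whose quoted value is empty or blank.

    Hypothetical helper for illustration only.
    """
    return [
        line for line in quickstatements.splitlines()
        if EMPTY_VALUE.match(line)
    ]


if __name__ == '__main__':
    sample = 'CREATE\nLAST\tP304\t" "\nLAST\tP304\t"1-10"'
    for line in find_empty_statements(sample):
        print('Empty value:', repr(line))
```

Applied to the output captured from the command in step 1, an empty result list would indicate that no blank page statement was produced.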

fnielsen commented 5 months ago

Could you rebase the branch and also fix the styling errors?

scholia/scrape/ceurws.py:133:1: W293 blank line contains whitespace
scholia/scrape/ceurws.py:135:80: E501 line too long (80 > 79 characters)
scholia/scrape/ceurws.py:177:1: E303 too many blank lines (3)
scholia/scrape/ceurws.py:385:1: E303 too many blank lines (3)

(reported by running flake8 scholia)