Closed faresh9 closed 5 months ago
Could you rebase the branch and also fix the style errors?
scholia/scrape/ceurws.py:133:1: W293 blank line contains whitespace
scholia/scrape/ceurws.py:135:80: E501 line too long (80 > 79 characters)
scholia/scrape/ceurws.py:177:1: E303 too many blank lines (3)
scholia/scrape/ceurws.py:385:1: E303 too many blank lines (3)
(reported by flake8 scholia)
Pull Request: Fixes #2395
Description
This pull request addresses issue #2395, where the CEUR-WS scraper fails on an empty page name. For one specific URL (https://ceur-ws.org/Vol-3592/paper9.pdf), the scraper generated the statement 'LAST P304 " "', i.e. a page-number statement whose value is only whitespace. The expected behavior is that no page-number statement is generated in this case.
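The intended behavior can be sketched roughly as follows. This is an illustration only, not the actual code in scholia/scrape/ceurws.py: the function name `page_statement` is hypothetical, and the space-separated statement layout is simply copied from the example in the description above.

```python
def page_statement(pages):
    """Return a P304 (page(s)) statement line, or None if no pages are known.

    Hypothetical helper for illustration; the real scraper may be
    structured differently.
    """
    if pages is None:
        return None
    # Treat whitespace-only values the same as missing values.
    pages = pages.strip()
    if not pages:
        return None
    # Layout mirrors the 'LAST P304 " "' example from the issue.
    return 'LAST P304 "{}"'.format(pages)


if __name__ == "__main__":
    for value in [" ", "", None, "1-10"]:
        print(repr(value), "->", page_statement(value))
```

With a guard like this, the caller would emit the line only when the helper returns something other than None, so an empty page field produces no P304 statement at all.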
Testing
I tested the changes by running the following command:
python3 -m scholia.scrape.ceurws proceedings-url-to-quickstatements https://ceur-ws.org/Vol-3592/paper9.pdf
Checklist