Added new xpath to handle 'summary_title' class in ojs.py

faresh9 commented 8 months ago

Title

Fixes #2392 and #2366, better PR than #2418

Description

Updated the issue_url_to_paper_urls function to fix the OJS scraper, which was not working for the provided URLs (https://tidsskrift.dk/sygdomogsamfund/issue/view/10555 and https://tidsskrift.dk/sygdomogsamfund/issue/view/3537). The function now uses a more robust XPath expression to extract paper URLs from the issue webpage.
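To illustrate the idea, here is a minimal sketch of matching paper links across two page layouts. It is not the code in ojs.py: the function name extract_paper_urls, the sample markup, and the 'title' layout are assumptions made for illustration, and only the 'summary_title' class name comes from this PR. It uses the limited XPath subset of the standard library's xml.etree.ElementTree, which needs well-formed markup, so a real scraper would likely rely on an HTML-tolerant parser instead.

```python
import xml.etree.ElementTree as ET

# Assumed layouts: a link inside <div class="title"> and a link that
# carries class="summary_title" itself. Both shapes are hypothetical.
PAPER_LINK_PATHS = (
    ".//div[@class='title']/a",
    ".//a[@class='summary_title']",
)


def extract_paper_urls(markup):
    """Return the unique hrefs matched by any path in PAPER_LINK_PATHS."""
    root = ET.fromstring(markup)
    urls = []
    for path in PAPER_LINK_PATHS:
        for anchor in root.findall(path):
            href = anchor.get("href")
            # Skip anchors without a target and avoid duplicates.
            if href and href not in urls:
                urls.append(href)
    return urls


# Hypothetical, well-formed sample pages (not taken from tidsskrift.dk).
OLD_LAYOUT = (
    "<html><body><div class='title'>"
    "<a href='https://example.org/article/view/1'>First</a>"
    "</div></body></html>"
)
NEW_LAYOUT = (
    "<html><body><h3>"
    "<a class='summary_title' href='https://example.org/article/view/2'>"
    "Second</a></h3></body></html>"
)

if __name__ == "__main__":
    print(extract_paper_urls(OLD_LAYOUT))
    print(extract_paper_urls(NEW_LAYOUT))
```

Trying each layout in turn keeps the existing behaviour for pages that already worked while adding coverage for pages that only expose the 'summary_title' class.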

Caveats

No breaking changes introduced. No new dependencies added.

Testing

Tested the updated function with the URLs provided in both issues (https://tidsskrift.dk/sygdomogsamfund/issue/view/10555) and (https://tidsskrift.dk/sygdomogsamfund/issue/view/3537) and verified that paper URLs are successfully extracted.

Checklist

[x] I have commented my code, particularly in hard-to-understand areas
[x] My changes generate no new warnings
[x] I have not used code from external sources without attribution
[x] I have considered accessibility in my implementation
[x] There are no remaining debug statements (print, console.log, ...)

fnielsen commented 8 months ago

There are two small styling issues:

scholia/scrape/ojs.py:96:80: E501 line too long (86 > 79 characters)
scholia/scrape/ojs.py:105:1: E303 too many blank lines (3)
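As a hedged sketch of how warnings of this kind are commonly resolved (the expression and the helper below are hypothetical, not the lines from ojs.py): a long string literal can be split with implicit concatenation inside parentheses to stay within 79 characters, and top-level definitions are separated by exactly two blank lines.

```python
# Hypothetical XPath expression, split across lines with implicit string
# concatenation so that no source line exceeds 79 characters (E501).
PAPER_LINK_XPATH = (
    "//div[@class='title']/a/@href"
    " | //a[@class='summary_title']/@href"
)


def alternatives(xpath):
    """Split an XPath union into its alternatives (illustrative helper)."""
    # Two blank lines above this function, not three, avoids E303.
    return [part.strip() for part in xpath.split("|")]
```

The split literal evaluates to the same string as the single-line version, so the change is purely cosmetic.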

fnielsen commented 8 months ago

Thanks!

WDscholia / scholia