Open shyousefi opened 2 days ago
Whether abstracts appear on the website or not depends on the metadata the workshop organizers supplied us; we don’t scrape PDFs, for example, to get the abstracts. The inconsistency between the volume page and the individual paper pages is something that ideally shouldn’t happen, though.
However, I would really not recommend scraping the web pages at all: you can extract all information directly from our XML files or access them through our Python library.
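As a rough illustration of the XML route suggested above, the sketch below reads abstracts from a small, made-up XML string using only the Python standard library. The element names (`collection`, `volume`, `paper`, `abstract`) and the way the ID is assembled are assumptions about the file layout and are not stated in this thread, and `abstracts_by_id` is a hypothetical helper, not part of the Anthology's own library. Please check both against the actual files before relying on it.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the layout is an assumption about the Anthology XML files.
SAMPLE_XML = """<collection id="2024.example">
  <volume id="1">
    <paper id="1">
      <title>A Paper With an Abstract</title>
      <abstract>We study <i>sign language</i> corpora.</abstract>
    </paper>
    <paper id="2">
      <title>A Paper Without an Abstract</title>
    </paper>
  </volume>
</collection>"""


def abstracts_by_id(xml_text):
    """Map each paper ID to its abstract text, or None if no abstract is present."""
    root = ET.fromstring(xml_text)
    collection_id = root.get("id")
    result = {}
    for volume in root.iter("volume"):
        for paper in volume.iter("paper"):
            # Assumed ID scheme: <collection id>-<volume id>.<paper id>
            anthology_id = f"{collection_id}-{volume.get('id')}.{paper.get('id')}"
            node = paper.find("abstract")
            if node is None:
                result[anthology_id] = None
            else:
                # itertext() also collects text nested inside inline markup such as <i>
                result[anthology_id] = " ".join("".join(node.itertext()).split())
    return result
```

Calling `abstracts_by_id(SAMPLE_XML)` returns a dictionary in which a paper without an `abstract` element maps to `None`. That makes it easy to tell a missing abstract in the metadata apart from a display problem on the website.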
Thank you very much for your reply.
On Fri, Nov 15, 2024 at 4:13 PM Marcel Bollmann @.***> wrote:
Whether abstracts appear on the website or not depends on the metadata the workshop organizers supplied us; we don’t scrape PDFs, for example, to get the abstracts. The inconsistency between the volume page and the individual paper pages is something that ideally shouldn’t happen, though.
However, I would really not recommend scraping the web pages at all: you can extract all information directly from our XML files https://github.com/acl-org/acl-anthology/tree/master/data/xml or access them through our Python library https://acl-anthology-py.readthedocs.io/en/stable/.
Confirm that this is a metadata correction
Anthology ID
2024.signlang-1.3
Type of Paper Metadata Correction
Correction to Paper Title
No response
Correction to Paper Abstract
The page format varies across articles on this volume page (and on others). For some articles, the abstract is not available on the volume page, so the abstract field comes back empty when the page is scraped. To obtain the abstract, you must navigate to the individual article page. In such cases, extracting the abstract from the PDF becomes necessary.
Correction to Author Name(s)
No response