Closed ViktorWeissenborn closed 2 months ago
Hi Viktor,
Unfortunately, there is no easy way to implement an Abstract class comprehensively, because the structures of raw documents from each publisher differ a lot.
But for Wiley, Springer Nature, and Elsevier (maybe more), they have their own API for retrieving abstracts and metadata of their papers. So if you want the abstract, you can grab the DOIs using chemdataextractor and use the API of the publishers to retrieve the abstracts.
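As a rough sketch of the API route for Elsevier only: the snippet below builds (but does not send) a request for one DOI. The endpoint path, the `X-ELS-APIKey` header, and the helper name `build_elsevier_request` are assumptions on my side based on Elsevier's public developer documentation, not something ChemDataExtractor provides, so please check them against the current API docs. The DOI is a made-up placeholder.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL of Elsevier's article retrieval endpoint (verify in their docs).
ELSEVIER_ARTICLE_URL = "https://api.elsevier.com/content/article/doi/"


def build_elsevier_request(doi, api_key):
    """Hypothetical helper: prepare a request for one DOI without sending it."""
    # Keep the "/" inside the DOI, percent-encode anything unusual.
    url = ELSEVIER_ARTICLE_URL + quote(doi, safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # assumed header name for the API key
        "Accept": "text/xml",     # ask for XML so the abstract tags are available
    }
    return Request(url, headers=headers)


# Placeholder DOI and key, for illustration only.
request = build_elsevier_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the actual download; that step needs a valid key and is left out here.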
Dingyun
Ah okay, makes sense. But let's say I only want to implement an Abstract class for Elsevier documents. Would this still be a problem?
Greetings, Viktor
For Elsevier, yes, that is doable! Elsevier XMLs have a distinct XML tag for abstracts and graphical abstracts. The files you'd want to change are reader.elsevier and scrape.pub.elsevier.
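To illustrate the idea outside of ChemDataExtractor, here is a minimal standard-library sketch that pulls the abstract text out of an Elsevier-style XML string. The `ce:abstract` and `ce:simple-para` tag names, the namespace URI, and the `class` attribute are assumptions about Elsevier's schema, and the sample document is invented, so treat this as a starting point and not as the actual reader code.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for Elsevier's "ce" (common elements) prefix.
CE_NS = "http://www.elsevier.com/xml/common/dtd"


def extract_abstracts(xml_text):
    """Return (class attribute, text) pairs for each ce:abstract element."""
    root = ET.fromstring(xml_text)
    abstracts = []
    for node in root.iter("{%s}abstract" % CE_NS):
        # Collect only the paragraph text, skipping the "Abstract" section title.
        paras = []
        for para in node.iter("{%s}simple-para" % CE_NS):
            paras.append(" ".join("".join(para.itertext()).split()))
        abstracts.append((node.get("class"), " ".join(paras)))
    return abstracts


# Invented sample document, for illustration only.
SAMPLE = """<article xmlns:ce="http://www.elsevier.com/xml/common/dtd">
  <ce:abstract class="author">
    <ce:section-title>Abstract</ce:section-title>
    <ce:abstract-sec>
      <ce:simple-para>We report a new perovskite.</ce:simple-para>
    </ce:abstract-sec>
  </ce:abstract>
  <ce:para>Body text that is not part of the abstract.</ce:para>
</article>"""

print(extract_abstracts(SAMPLE))
```

The same tag lookup is presumably what a dedicated Abstract element in the Elsevier reader would key on, with the `class` attribute separating author abstracts from graphical ones.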
Hello (:
In ChemDataExtractor there are different document classes like the Title class, Heading class, Paragraph class and so on.
For me it would be very handy to also have an "Abstract" class that gives me the abstract of an article as easily as the Heading class gives me the heading and the Title class gives me the title. Currently the abstract of an article is included in a Paragraph object and is therefore hard to identify as the abstract. It is often unclear whether the text of a Paragraph object under doc.elements belongs to the abstract or to a normal paragraph from another part of the document. In Elsevier XML documents, for example, the abstract is clearly marked with its own XML tags.
Would there be an "easy" or "quick" way to implement an Abstract class into ChemDataExtractor?
If so, let me know, I would be happy to take care of it myself, but I am not really sure where to start and how many dependent classes, functions and variables need to be changed...
Kind regards, Viktor