adsabs / ADSIngestParser

Curation parser library
MIT License

All parsers including JATS should populate the fulltext item depending on whether the publisher supplied file has a "<body>" tag or equivalent. #79

Open seasidesparrow opened 1 year ago

seasidesparrow commented 1 year ago

Describe the bug
Publisher-supplied metadata may or may not include the body of the fulltext. When the publisher does supply the fulltext, we need to populate the fulltext element of the Document. fulltext is a dict with the keys language and body. In JATS content, for example, the fulltext is enclosed in the <body></body> tag. We need to extract it and write it to fulltext.body.
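A minimal sketch of the requested behavior, not the library's actual code: it uses only the standard library's xml.etree.ElementTree, and the function name extract_fulltext, the default_language parameter, and its "en" fallback are assumptions made for illustration. Real publisher JATS files may carry DOCTYPE declarations or entities that need a more forgiving parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Attribute name ElementTree uses for xml:lang after namespace expansion.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_fulltext(jats_xml: str, default_language: str = "en") -> Optional[dict]:
    """Hypothetical helper: return {"language", "body"} if a <body> is present.

    Returns None when the publisher file has no <body> tag or the body is
    empty, so the caller can leave Document["fulltext"] unset.
    """
    root = ET.fromstring(jats_xml)
    body = root.find(".//body")
    if body is None:
        return None

    # Concatenate all text beneath <body> and collapse runs of whitespace.
    text = " ".join("".join(body.itertext()).split())
    if not text:
        return None

    # Assumption: take the language from xml:lang on the root element,
    # falling back to default_language when it is not declared.
    language = root.get(XML_LANG, default_language)
    return {"language": language, "body": text}
```

A parser could then set the Document's fulltext only when the helper returns a dict, which keeps records without a publisher-supplied body unchanged.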
