dgarijo closed this issue 8 months ago
I wonder how to represent this information. On the one hand, we have the text we detect as citations. On the other hand, we would generate a structured representation of the citation. Maybe this is relevant for the RDF generation only? Maybe we still include a structured object in the output? (but then we would need to state where it comes from)
After giving this some thought, I think that SOMEF should create structured output from the things that have been recognized. Plain text alone is not very useful for machine actionability, which is the final purpose of the extraction.
Hence, SOMEF will return a number of `Publication` objects in `citation`. Each object will contain everything that could be extracted about that publication. If a BibTeX file and a CFF file are both provided and describe the same publication, only one object will be returned.
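A minimal sketch of how such deduplication could work, assuming publications are matched by DOI first and by normalized title otherwise. The `Publication` dataclass, its field names, the `sources` provenance list, and `merge_publications` are hypothetical and only illustrate the idea; they are not SOMEF's actual schema or code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Publication:
    # Hypothetical structure; field names are illustrative, not SOMEF's schema.
    title: Optional[str] = None
    authors: list[str] = field(default_factory=list)
    doi: Optional[str] = None
    url: Optional[str] = None
    # Where the information came from, e.g. ["bibtex", "CFF"].
    sources: list[str] = field(default_factory=list)


def identity_key(pub: Publication) -> Optional[str]:
    """Prefer the DOI as identity; fall back to a normalized title."""
    if pub.doi:
        return "doi:" + pub.doi.lower().removeprefix("https://doi.org/")
    if pub.title and pub.title.strip():
        return "title:" + " ".join(pub.title.lower().split())
    return None  # nothing to match on


def merge_publications(pubs: list[Publication]) -> list[Publication]:
    """Return one Publication per distinct work, filling gaps from duplicates."""
    result: list[Publication] = []
    by_key: dict[str, Publication] = {}
    for pub in pubs:
        key = identity_key(pub)
        if key is None or key not in by_key:
            # Copy so the caller's objects are not modified.
            kept = Publication(pub.title, list(pub.authors), pub.doi,
                               pub.url, list(pub.sources))
            result.append(kept)
            if key is not None:
                by_key[key] = kept
            continue
        kept = by_key[key]
        # Keep the first value seen; only fill in what is still missing.
        kept.title = kept.title or pub.title
        kept.authors = kept.authors or list(pub.authors)
        kept.doi = kept.doi or pub.doi
        kept.url = kept.url or pub.url
        kept.sources += [s for s in pub.sources if s not in kept.sources]
    return result
```

Keeping a `sources` list on the merged object is one way to state where each record comes from, which was the concern raised at the top of this thread.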
Here, the `arxivLinks` field should be merged in to provide additional information.
Also, link to OpenAlex
SOMEF now returns the title, authors, DOI and URL. For now I will close this issue. Still missing: mapping this to a more complex citation object.
BibTeX is a highly structured format that is easy to transform to JSON/RDF.
We should leverage it to create better representations. Right now the BibTeX entry is taken as is.
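As a rough illustration of that transformation, here is a simplified sketch that parses a single BibTeX entry with the standard library only and maps it to the fields mentioned above (title, authors, DOI, URL). The function names `parse_bibtex_entry` and `to_publication` are hypothetical, the sample entry is made up, and the parser ignores `@string` macros, `#` concatenation, and LaTeX escapes; a dedicated BibTeX library would be a safer choice in practice.

```python
import json
import re

FIELD_START = re.compile(r"\s*,?\s*(\w+)\s*=\s*")
BARE_VALUE = re.compile(r"[^,}\s]+")


def parse_bibtex_entry(entry: str) -> dict:
    """Parse one well-formed BibTeX entry into a flat dict (simplified)."""
    head = re.match(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,", entry)
    if head is None:
        raise ValueError("not a BibTeX entry")
    record = {"type": head.group(1).lower(), "key": head.group(2)}
    body, pos = entry[head.end():], 0
    while True:
        m = FIELD_START.match(body, pos)
        if m is None:
            break  # no more "name = value" pairs
        name, pos = m.group(1).lower(), m.end()
        if body[pos] == "{":
            # Brace-delimited value: scan to the matching closing brace.
            depth, start = 0, pos
            while True:
                if body[pos] == "{":
                    depth += 1
                elif body[pos] == "}":
                    depth -= 1
                pos += 1
                if depth == 0:
                    break
            value = body[start + 1:pos - 1]
        elif body[pos] == '"':
            # Quote-delimited value (nested quotes are not handled).
            end = body.index('"', pos + 1)
            value, pos = body[pos + 1:end], end + 1
        else:
            # Bare value such as a year.
            bare = BARE_VALUE.match(body, pos)
            if bare is None:
                break
            value, pos = bare.group(0), bare.end()
        # Drop protective braces and collapse whitespace.
        record[name] = " ".join(value.replace("{", "").replace("}", "").split())
    return record


def to_publication(record: dict) -> dict:
    """Map a parsed entry to title, authors, DOI and URL."""
    authors = [a.strip()
               for a in re.split(r"\s+and\s+", record.get("author", ""))
               if a.strip()]
    return {"title": record.get("title"), "authors": authors,
            "doi": record.get("doi"), "url": record.get("url")}


# Made-up entry, for illustration only.
sample = """@article{doe2020,
  title = {An {Example} Paper},
  author = "Doe, Jane and Roe, Richard",
  year = 2020,
  doi = {10.1234/example}
}"""
print(json.dumps(to_publication(parse_bibtex_entry(sample)), indent=2))
```

Splitting `author` on `and` gives one string per author, which is usually the first step toward richer person objects in JSON or RDF.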