KnowledgeCaptureAndDiscovery / somef

SOftware Metadata Extraction Framework: A tool for automatically extracting relevant software information from README files
MIT License

Improve representation of citations #207

Closed: dgarijo closed this issue 8 months ago

dgarijo commented 3 years ago

BibTeX is a highly structured format that is easy to transform into JSON/RDF.

We should leverage BibTeX to create better representations. Right now, BibTeX entries are returned as is.
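To illustrate how structured BibTeX already is, here is a minimal sketch (not SOMEF's actual code) that turns a single simple entry into a JSON-serializable dict. It assumes values delimited by braces, quotes, or bare digits, with no nested braces; `parse_bibtex_entry` and `bibtex_to_json` are hypothetical helpers, and a real implementation would more likely rely on a dedicated library such as bibtexparser.

```python
import json
import re

# Entry header, e.g. "@article{example2024,"
HEADER = re.compile(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A field assignment: {braced}, "quoted", or bare digits (nested braces unsupported)
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\d+))')


def parse_bibtex_entry(text: str) -> dict:
    """Parse one simple BibTeX entry into a flat dict (naive sketch)."""
    header = HEADER.match(text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    entry = {"type": header.group(1).lower(), "key": header.group(2)}
    for name, braced, quoted, bare in FIELD.findall(text[header.end():]):
        entry[name.lower()] = (braced or quoted or bare).strip()
    # BibTeX separates authors with " and "; split them into a list.
    if "author" in entry:
        entry["author"] = [a.strip() for a in entry["author"].split(" and ")]
    return entry


def bibtex_to_json(text: str) -> str:
    """Serialize the parsed entry as JSON."""
    return json.dumps(parse_bibtex_entry(text), indent=2)
```

The resulting dict maps directly onto JSON keys, which makes a later RDF mapping straightforward.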

dgarijo commented 2 years ago

I wonder how to represent this information. On the one hand, we have the text we detect as citations. On the other hand, we would generate a structured representation of the citation. Is this relevant only for the RDF generation? Or should we still include a structured object in the output? (If so, we would need to state where it comes from.)

dgarijo commented 1 year ago

After giving this some thought, I think SOMEF should produce structured output with the elements it has recognized. Plain text is not very useful for machine actionability, which is the ultimate purpose of the extraction. Hence, the citation field in SOMEF's output will contain a list of Publication objects, each with everything that could be extracted about that publication. If a BibTeX entry and a CFF file describe the same publication, only one object will be returned.
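A minimal sketch of this idea, assuming a hypothetical `Publication` dataclass with the fields SOMEF eventually reports (title, authors, DOI, URL) plus a `sources` list that records provenance. Two records are treated as the same publication when both have DOIs and the normalized DOIs match, or otherwise when their normalized titles match; this matching rule and the `merge_publications` helper are assumptions, not SOMEF's actual logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Publication:
    title: str
    authors: list[str] = field(default_factory=list)
    doi: Optional[str] = None
    url: Optional[str] = None
    sources: list[str] = field(default_factory=list)  # provenance, e.g. "bibtex", "cff"


def _norm_doi(doi: str) -> str:
    # DOIs are case-insensitive; strip the resolver prefix if present.
    return doi.lower().removeprefix("https://doi.org/")


def _norm_title(title: str) -> str:
    return " ".join(title.lower().split())


def _same(a: Publication, b: Publication) -> bool:
    if a.doi and b.doi:
        return _norm_doi(a.doi) == _norm_doi(b.doi)
    return _norm_title(a.title) == _norm_title(b.title)


def merge_publications(pubs: list[Publication]) -> list[Publication]:
    """Return one Publication per distinct work, filling gaps from duplicates."""
    merged: list[Publication] = []
    for pub in pubs:
        match = next((m for m in merged if _same(m, pub)), None)
        if match is None:
            # Copy so the caller's objects are not mutated.
            merged.append(Publication(pub.title, list(pub.authors), pub.doi,
                                      pub.url, list(pub.sources)))
            continue
        match.authors = match.authors or list(pub.authors)
        match.doi = match.doi or pub.doi
        match.url = match.url or pub.url
        match.sources += [s for s in pub.sources if s not in match.sources]
    return merged
```

Keeping `sources` on each object also answers the earlier question of how to state where a structured citation comes from.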

dgarijo commented 1 year ago

Here, the arxivLinks field should be merged in to provide additional information.
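One possible way to merge `arxivLinks` into the citation objects, sketched under several assumptions: publications are plain dicts, a link is attached when its arXiv identifier appears in the publication's URL or matches an arXiv DataCite DOI (`10.48550/arXiv.<id>`), and the `arxiv_links` key and `merge_arxiv_links` helper are hypothetical. Only new-style identifiers such as `2101.00001` are recognized; links that match nothing are returned so they are not silently lost.

```python
import re

# New-style arXiv IDs in abs/pdf URLs, ignoring any version suffix like "v2".
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def merge_arxiv_links(publications: list[dict], arxiv_links: list[str]) -> list[str]:
    """Attach arXiv links to matching publications; return the unmatched links."""
    unmatched = []
    for link in arxiv_links:
        m = ARXIV_ID.search(link)
        if m is None:
            unmatched.append(link)
            continue
        arxiv_id = m.group(1)
        for pub in publications:
            doi = (pub.get("doi") or "").lower()
            url = pub.get("url") or ""
            if doi.endswith("arxiv." + arxiv_id) or arxiv_id in url:
                pub.setdefault("arxiv_links", []).append(link)
                break
        else:  # no publication matched this link
            unmatched.append(link)
    return unmatched
```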

dgarijo commented 1 year ago

Also, link to OpenAlex.
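A sketch of the OpenAlex link, assuming the citation already carries a DOI: OpenAlex supports looking up a work by its DOI at `https://api.openalex.org/works/https://doi.org/<doi>`. The `openalex_lookup_url` helper is hypothetical, and it only builds the URL; it does not call the API.

```python
from typing import Optional
from urllib.parse import quote

OPENALEX_WORKS = "https://api.openalex.org/works/"


def openalex_lookup_url(doi: Optional[str]) -> Optional[str]:
    """Return an OpenAlex API URL that looks up a work by DOI, or None if no DOI."""
    if not doi:
        return None
    # Normalize: DOIs are case-insensitive; drop resolver or "doi:" prefixes.
    bare = doi.strip().lower().removeprefix("https://doi.org/").removeprefix("doi:")
    return OPENALEX_WORKS + "https://doi.org/" + quote(bare, safe="/")
```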

dgarijo commented 8 months ago

SOMEF now returns the title, authors, DOI, and URL of each citation. I will close this issue for now. Still missing: mapping this to a more complex citation object.
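As a possible next step toward a richer citation object, the flat fields could be mapped to an existing vocabulary. The sketch below assumes schema.org's `ScholarlyArticle` as the target type and uses its `name`, `author`, `identifier`, and `url` properties; the choice of vocabulary and the `to_schema_org` helper are assumptions, not what SOMEF does.

```python
def to_schema_org(pub: dict) -> dict:
    """Map a flat citation (title, authors, doi, url) to schema.org JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": pub["title"],
        # Authors are plain name strings here, so each becomes a minimal Person.
        "author": [{"@type": "Person", "name": a} for a in pub.get("authors", [])],
    }
    if pub.get("doi"):
        doc["identifier"] = "https://doi.org/" + pub["doi"]
    if pub.get("url"):
        doc["url"] = pub["url"]
    return doc
```

Because the output is JSON-LD, it can also serve as a starting point for the RDF generation mentioned earlier in the thread.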