Christof93 / SciKGTeX

SciKGTeX is a LuaTeX package which introduces commands to mark research contributions in scientific documents. SciKGTeX will enrich the document by adding your contributions to PDF metadata in a structured XMP format which can be picked up by Search Engines and Knowledge Graphs.
MIT License

Duplicated authors in PDF metadata #18

Open okarras opened 7 months ago

okarras commented 7 months ago

It seems to be possible to annotate the authors (and probably the title) several times in a paper and every annotation is added to the PDF metadata. I found one paper that has a duplicated list of authors, leading to an issue when importing the data into the ORKG.

Any idea how to prevent this problem?

`<?xpacket begin="?" id="18a3c4f8-ac80-459a-c5a7-0c296941c6"?>

Unveiling Competition Dynamics in Mobile App Markets through User Reviews Quim Motger Xavier Franch Vincenzo Gervasi Jordi Marco Quim Motger Xavier Franch Vincenzo Gervasi Jordi Marco no other proposals exist for leveraging review-based metrics for cross-app and competition analysis within a given app market https://doi.org/10.5281/zenodo.10125307 https://github.com/quim-motger/app-market-analysis to proactively inform app stakeholders about changes in how users perceive a given app within a specific market segment, ultimately to infer the threats and opportunities stemming from these changes exploratory design resulting in insights about the correlation between review-based metrics from a mobile app with respect to potential competitor apps automatic, proactive identification of significant events within a specific market segment and aim to monitor user behaviour and detect feedback trends Evaluation results illustrate that the trigger event of the case study is actually reported by our approach multiple examples of minor events which might not have such a major impact (e.g., new competitors, positive/negative reactions to new releases from competitors) are also proactively detected Evaluation results illustrate several timely and actionable examples regarding market dynamics, including software-level key changes, contextual factors, and the entry of new competitors. `
jonassmedegaard commented 6 months ago

Literally, the authors are indeed duplicated in the presented data, but semantically they are not. They are simply declared twice, which should only waste a little space within the PDF file and should not cause trouble for proper parsers of the semantic data.

It would be a different matter if the authors were declared like this:

    dc:creator [
        rdf:_1 "Ian Valentin Christensen"@da ;
        rdf:_2 "Jonas Smedegaard"@da ;
        a rdf:Seq
    ] ;

When declared as an ordered list, repeated authors would be semantically problematic: not because their number would double, but because it would become unclear in which order they are supposed to appear.
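
To illustrate the distinction, here is a rough Python sketch that models RDF statements as plain tuples in a set. It is only one way to read the point above, not how SciKGTeX or any RDF library works internally. The subject `ex:paper`, the helper names, and the blank-node labels are all made up for the example.

```python
from itertools import count

PAPER = "ex:paper"  # hypothetical subject identifier, not taken from the real metadata

_blank_ids = count(1)


def plain_creators(names):
    """Return unordered dc:creator statements as (subject, predicate, object) tuples."""
    return {(PAPER, "dc:creator", name) for name in names}


def seq_creators(names):
    """Return statements for an rdf:Seq; every call mints a fresh blank node."""
    node = f"_:b{next(_blank_ids)}"
    triples = {(PAPER, "dc:creator", node), (node, "rdf:type", "rdf:Seq")}
    triples |= {(node, f"rdf:_{i}", name) for i, name in enumerate(names, 1)}
    return triples


names = ["Quim Motger", "Xavier Franch", "Vincenzo Gervasi", "Jordi Marco"]

# Declaring the plain statements twice leaves the set of statements unchanged.
plain = plain_creators(names) | plain_creators(names)

# Declaring the ordered list twice yields two separate list nodes for one paper.
ordered = seq_creators(names) | seq_creators(names)
seq_nodes = {o for (s, p, o) in ordered if s == PAPER and p == "dc:creator"}

print(len(plain))      # 4
print(len(seq_nodes))  # 2
```

In the unordered case the repeated statements collapse, so a set-based parser still sees four authors. In the ordered case the paper ends up pointing at two lists, and a consumer has to decide which one to use.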

okarras commented 6 months ago

I further checked the original annotated paper with the authors.

The authors have not been annotated / declared twice. Instead, the command seems to be executed twice while compiling the PDF, which results in the duplicated authors.
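
Until the double execution is tracked down, an importer could guard against it on its side. This is a minimal sketch, assuming the importer receives the author names as a flat list in document order. The function name is hypothetical and not part of SciKGTeX or the ORKG.

```python
def dedupe_preserving_order(values):
    """Drop repeated entries while keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order, so this keeps first occurrences.
    return list(dict.fromkeys(values))


# Author list as it appears in the metadata quoted in this issue.
authors = [
    "Quim Motger", "Xavier Franch", "Vincenzo Gervasi", "Jordi Marco",
    "Quim Motger", "Xavier Franch", "Vincenzo Gervasi", "Jordi Marco",
]

print(dedupe_preserving_order(authors))
# ['Quim Motger', 'Xavier Franch', 'Vincenzo Gervasi', 'Jordi Marco']
```

Note that this would also merge two distinct co-authors who happen to share the same name, so it is a workaround rather than a fix for the underlying problem.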