PathVisio / pathvisio

PathVisio - pathway editor, visualization and analysis software
http://www.pathvisio.org
Apache License 2.0

Citations - duplicate fields #191

Closed AlexanderPico closed 7 months ago

AlexanderPico commented 1 year ago

In a pathway edited in April 2022, each of the publicationXrefs contained duplicated fields (ID, DB, TITLE, SOURCE and YEAR), repeated two or more times within a single element. Only the AUTHOR fields are unique. For example:

11278778 PubMed The yeast ALG11 gene specifies addition of the terminal alpha 1,2-Man to the Man5GlcNAc2-PP-dolichol N-glycosylation intermediate formed on the cytosolic side of the endoplasmic reticulum. J Biol Chem 2001 11278778 PubMed The yeast ALG11 gene specifies addition of the terminal alpha 1,2-Man to the Man5GlcNAc2-PP-dolichol N-glycosylation intermediate formed on the cytosolic side of the endoplasmic reticulum. J Biol Chem 2001 11278778 PubMed The yeast ALG11 gene specifies addition of the terminal alpha 1,2-Man to the Man5GlcNAc2-PP-dolichol N-glycosylation intermediate formed on the cytosolic side of the endoplasmic reticulum. J Biol Chem 2001 Cipollo JF Trimble RB Chi JH Yan Q Dean N

This doesn't appear to break our current processing pipeline for md and tsv files, but it's probably something we should fix.

larsgw commented 10 months ago

Based on the git history, this might have been the case for 10 years. When these properties are set, they're technically just added, and the maxCardinality makes sure the old element is removed. Except, the old elements are only removed from the internal representation of the data (line 65), not from the org.jdom2.Element (line 70):

https://github.com/PathVisio/pathvisio/blob/33d315b4e0fab46768398773f28b87ad4a829187/modules/org.pathvisio.core/src/org/pathvisio/core/biopax/BiopaxNode.java#L61-L70

The fix should be as easy as calling removeProperty() instead of properties.remove(), since removeProperty() also updates wrapped. I'm including a patch in an upcoming PR.
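To illustrate the mechanism described above, here is a minimal, self-contained sketch. It is not the PathVisio source: `NodeSketch`, `setBuggy`, `setFixed` and `countWrapped` are hypothetical names, and a plain list stands in for the wrapped org.jdom2.Element. It only assumes what the comment states, namely that the old value is dropped from the internal list but not from the wrapped element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for BiopaxNode; NOT the actual PathVisio code.
class NodeSketch {
    // Stand-in for the wrapped org.jdom2.Element that ends up in the GPML.
    final List<String[]> wrapped = new ArrayList<>();
    // Internal representation of the same properties.
    final List<String[]> properties = new ArrayList<>();

    // Buggy variant: the old value is removed from `properties` only,
    // so `wrapped` gains one more copy on every call.
    void setBuggy(String name, String value) {
        removeFirst(properties, name);
        add(name, value);
    }

    // Fixed variant: removal goes through removeProperty, which
    // updates both the internal list and the wrapped element.
    void setFixed(String name, String value) {
        removeProperty(name);
        add(name, value);
    }

    void removeProperty(String name) {
        removeFirst(properties, name);
        removeFirst(wrapped, name);
    }

    private void add(String name, String value) {
        String[] property = {name, value};
        properties.add(property);
        wrapped.add(property);
    }

    // Removes the first entry whose name matches, if any.
    private static void removeFirst(List<String[]> list, String name) {
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i)[0].equals(name)) {
                list.remove(i);
                return;
            }
        }
    }

    // Counts how many entries with this name would be serialized.
    long countWrapped(String name) {
        return wrapped.stream().filter(p -> p[0].equals(name)).count();
    }
}
```

In this sketch, calling `setBuggy("TITLE", ...)` repeatedly leaves one entry in `properties` but one copy per call in `wrapped`, which matches the repeated ID/DB/TITLE/SOURCE/YEAR fields in the report. With `setFixed`, both stay at one entry.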

larsgw commented 10 months ago

(but to fix existing GPML, it also needs to delete all occurrences, not just the first.)
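The difference between removing the first occurrence and removing all of them can be sketched with plain Java collections. This is only an illustration under assumptions: `DuplicateCleanup` and its methods are hypothetical, and a list of field names stands in for the children of a publicationXref element. It does not use the JDOM API or the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a list of names stands in for an element's children.
class DuplicateCleanup {
    // List.remove(Object) drops only the first matching entry,
    // so files that already contain extra copies keep some of them.
    static void removeFirst(List<String> childNames, String name) {
        childNames.remove(name);
    }

    // removeIf drops every matching entry, which is what is needed
    // to clean up files that already contain duplicates.
    static void removeAll(List<String> childNames, String name) {
        childNames.removeIf(name::equals);
    }
}
```

Starting from three TITLE entries, `removeFirst` followed by adding the new value still leaves three copies, while `removeAll` followed by adding the new value leaves exactly one.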