Edirom / WeGA-ODD

ODD files for documenting the Digital Edition of the Carl-Maria-von-Weber-Gesamtausgabe

remove duplicates from export formats #67

Open peterstadler opened 3 days ago

peterstadler commented 3 days ago

Duplicates are propagated to the export formats (and thus published to Zenodo) without a proper ID on the root element. Either this should be fixed or, even better, those duplicates should be removed from the export completely.

riedde commented 3 days ago

Could you give a more concrete hint or example?

peterstadler commented 3 days ago

Sure, here is an example: the letter A040001 is a duplicate, i.e. this "letter" was moved to the documents section under the ID A100245 at some point. The file with the old ID is still kept in our Subversion repository, but it redirects to the new location via <ref type="duplicate" target="A100245"/>.

The transformation to tei_all (and every other export format based on it) now converts these duplicates into "proper" TEI files, but without any @xml:id on the <TEI> element; see https://github.com/Edirom/WeGA-ODD/blob/51a662c4dbe65e9e0fd14f0594c48d9ec6e51c00/xsl/to-tei_all.xsl#L240-L277

This struck me during the eXist workshop at the Edirom Summer School that I gave with @martinascholger, where we were using the WeGA-data package from Zenodo.

Right now I'm thinking of removing those duplicates from the export – or can you think of any benefit of keeping them?
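
To make the idea of dropping duplicates concrete, here is a rough sketch in Python of how such stubs could be detected before export. It is not how the WeGA pipeline works (that is XSLT); the function names `duplicate_target` and `filter_export` and the two sample documents are made up for illustration, and the sketch assumes only that a duplicate contains a `ref` element with `type="duplicate"` somewhere in the file, as in the A040001 example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample files; the real WeGA files may be structured differently.
STUB = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="A040001">'
    '<ref type="duplicate" target="A100245"/></TEI>'
)
REAL = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="A100245">'
    '<teiHeader/><text><body><p>...</p></body></text></TEI>'
)


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def duplicate_target(xml_text):
    """Return the redirect target if the file is a duplicate stub, else None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "ref" and elem.get("type") == "duplicate":
            return elem.get("target")
    return None


def filter_export(files):
    """Keep only the IDs whose files are not duplicate stubs.

    `files` maps an ID to the XML text of that file.
    """
    return [fid for fid, text in files.items() if duplicate_target(text) is None]


if __name__ == "__main__":
    print(duplicate_target(STUB))
    print(filter_export({"A040001": STUB, "A100245": REAL}))
```

With a check like this, the export could either skip the stubs entirely or, if they are kept, at least carry the old ID over to the root element.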