VTUL / vtechworks

DSpace at Virginia Tech
http://vtechworks.lib.vt.edu

Improve SWORD crosswalk #720

Closed alawvt closed 2 years ago

alawvt commented 4 years ago

It looks like BioMed Central is still using the sword-mets "SWAP Metadata" packaging, which includes eprints terms (xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/"). MDPI and Hindawi also use this packaging. A typical header is:

<mets ID="sort-mets_mets" OBJID="sword-mets" LABEL="DSpace SWORD Item" PROFILE="DSpace METS SIP Profile 1.0" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/METS/">

The DSpace crosswalk for this is defined in dspace/dspace/config/crosswalks/sword-swap-ingest.xsl. This crosswalk is not mapping

These could also be mapped:

It could also be corrected to map to dc.identifier.doi instead of dc.identifier.uri, although this might map non-DOIs. Investigate other SWORD collections to see what mappings they use and if their mappings can be improved.

SWAP Profile, formerly Eprints profile

alawvt commented 4 years ago

@pyc1, it seems like we could improve SWORD metadata mapping with just this one crosswalk. If there are metadata fields that we want to add for all submissions, we might be able to add them as well. Could you list the changes you have been making to SWORD submissions?

pyc1 commented 4 years ago

Fields we could add to all SWORD submissions:

dc.format.mimetype[en] application/pdf
dc.rights[en] Creative Commons Attribution 4.0 International
dc.rights.uri[en] http://creativecommons.org/licenses/by/4.0/
dc.language.iso[en] en
dc.type[en] Article - Refereed
dc.type.dcmitype[en] Text
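As a rough illustration of how these defaults might be applied, here is a minimal Python sketch. It only models the logic; the real change would presumably live in the XSL crosswalk. The names `SWORD_DEFAULTS` and `apply_defaults` are hypothetical, the `[en]` language qualifier is omitted for brevity, and the choice to leave any value the publisher already supplied untouched is an assumption, not something decided in this thread.

```python
# Hypothetical defaults copied from the list above (language qualifier omitted).
SWORD_DEFAULTS = {
    "dc.format.mimetype": "application/pdf",
    "dc.rights": "Creative Commons Attribution 4.0 International",
    "dc.rights.uri": "http://creativecommons.org/licenses/by/4.0/",
    "dc.language.iso": "en",
    "dc.type": "Article - Refereed",
    "dc.type.dcmitype": "Text",
}


def apply_defaults(metadata, defaults=SWORD_DEFAULTS):
    """Return a copy of metadata with defaults added for absent or empty fields.

    metadata maps a field name to a list of values. Fields that already
    carry a value are left unchanged (an assumption of this sketch).
    """
    merged = dict(metadata)
    for field, value in defaults.items():
        if not merged.get(field):
            merged[field] = [value]
    return merged
```

For example, `apply_defaults({"dc.language.iso": ["de"]})` keeps `de` as the language and fills in the remaining five fields.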

BioMed Central/SpringerOpen notes:

-middle initials are missing the period
-en language code comes in as dc.language.rfc3066
-DOI comes in as dc.identifier.uri

MDPI notes:

-middle initials have 2 spaces before
-DOI comes in as dc.identifier
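The two author-name issues noted so far (a missing period after middle initials from BioMed Central, and a doubled space before them from MDPI) could be cleaned up along these lines. This is a sketch in Python to show the logic only. The function name `normalize_author` is hypothetical, and it assumes names arrive as "Last, First M" strings and that initials are unaccented A-Z capitals.

```python
import re


def normalize_author(name):
    """Collapse repeated whitespace and add a period to bare initials."""
    # "Smith, John  A." -> "Smith, John A."
    name = re.sub(r"\s+", " ", name).strip()
    # A bare initial is a single capital letter standing alone between
    # whitespace or string boundaries. Only A-Z is handled here, so
    # accented initials would need a broader character class.
    return re.sub(r"(?<!\S)([A-Z])(?!\S)", r"\1.", name)
```

Matching only whole, standalone tokens keeps names such as "O'Brien" intact, since the "O" there is followed by an apostrophe rather than whitespace.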

Hindawi notes:

-DOI comes in as dc.identifier; also add https and delete "dx."
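The DOI handling described in these notes (switch to https, drop "dx.", and avoid treating non-DOIs as DOIs) might look roughly like the following Python sketch. It shows the logic only, not the crosswalk itself. The name `normalize_doi` is hypothetical, the pattern is a common heuristic for modern DOIs rather than a full validation, and emitting the `https://doi.org/` URL form rather than the bare DOI is an assumption.

```python
import re

# Heuristic DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver prefixes to strip: http(s)://doi.org/, http(s)://dx.doi.org/, "doi:".
RESOLVER_PREFIX = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE
)


def normalize_doi(value):
    """Return an https://doi.org/ URL, or None if value is not a DOI."""
    bare = RESOLVER_PREFIX.sub("", value.strip())
    if DOI_RE.match(bare):
        return "https://doi.org/" + bare
    return None
```

Values that return `None`, such as handle URLs, could stay in dc.identifier.uri, which addresses the earlier concern about mapping non-DOIs.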

alawvt commented 3 years ago

map DOI correctly from both dc.identifier.url and dc.identifier.doi.

pyc1 commented 3 years ago

I think you mean dc.identifier.uri and dc.identifier. I'm not sure that we get any that go correctly into dc.identifier.doi.

alawvt commented 3 years ago

One publisher is also sending us <epdcx:statement epdcx:propertyURI="http://purl.org/dc/terms/idendifier/doi"> <epdcx:valueString>10.3390/robotics10040109</epdcx:valueString> </epdcx:statement>.
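To check which incoming packages carry a statement like this, something along the lines of the Python sketch below could be run against a package's mets.xml. The function name `find_doi_statements` and the `epdcx:description` wrapper in the sample are assumptions made for illustration. Matching on the trailing `/doi` segment is a deliberate choice here so that the misspelled "idendifier" in the publisher's property URI is still picked up.

```python
import xml.etree.ElementTree as ET

EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"

# Illustrative sample; the wrapper element is an assumption of this sketch.
SAMPLE = """<epdcx:description xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/">
  <epdcx:statement epdcx:propertyURI="http://purl.org/dc/terms/idendifier/doi">
    <epdcx:valueString>10.3390/robotics10040109</epdcx:valueString>
  </epdcx:statement>
</epdcx:description>"""


def find_doi_statements(xml_text):
    """Return valueString texts of statements whose propertyURI ends in /doi."""
    root = ET.fromstring(xml_text)
    values = []
    for stmt in root.iter(f"{{{EPDCX}}}statement"):
        # The attribute is namespace-qualified, so look it up by its full name.
        prop = stmt.get(f"{{{EPDCX}}}propertyURI", "")
        if prop.rstrip("/").endswith("/doi"):
            value = stmt.find(f"{{{EPDCX}}}valueString")
            if value is not None and value.text:
                values.append(value.text.strip())
    return values
```

Calling `find_doi_statements(SAMPLE)` returns a list containing the DOI from the statement quoted above.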

alawvt commented 2 years ago

Tom Gibons from ACM notes that only some of their materials have Creative Commons licenses, so we might not be able to add that for all.