Tom, it looks like what you received was the HTML for the collection page, https://vtechworks.lib.vt.edu/handle/10919/105038.
I am able to load art_4959256494595851995.zip, which contains a PDF and mets.xml, with the following command using SWORD v1:
curl -i --data-binary "@art_4959256494595851995.zip" -H "Content-Disposition: filename=art_4959256494595851995.zip" -H "Content-Type: application/zip" -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" -H "X-No-Op: false" -H "X-Verbose: true" -u alaw@vt.edu:password -X POST https://vtechworks.lib.vt.edu/sword/deposit/10919/105038
However, following SWORD 2.0 Profile - Creating a Resource with a Binary File Deposit, if I use:
curl -i --data-binary "@SV.2021.21548575.zip" -H "Content-Disposition: filename=SV.2021.2154857.zip" -H "Content-Type: application/zip" -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" -H "X-No-Op: false" -H "X-Verbose: true" -u alaw@vt.edu:password -X POST https://vtechworks.lib.vt.edu/swordv2/collection/10919/105038
the file is deposited but the metadata is not parsed. I do not know why.
Of our other SWORD depositors, two use SWORD v1 and deposit a zip containing the PDF and mets.xml.
There is one SWORD v2 depositor who deposits the zip and an extra XML file containing metadata. I believe they are using the Atom Multipart Deposit, SWORD 2.0 Profile - Creating a Resource with a Multipart Deposit.
Since we only receive the documents, I do not know the details of their implementations.
I sent a query to the DSpace tech listserv under the subject "SWORD v2 zip submission fails to parse mets.xml".
I haven't found any difference in the METS between SWORD v1 and v2. You might try changing the packaging header from 'X-Packaging' to 'Packaging'. Apparently -H "Packaging:...." is required for v2. On 6.3 we've had success with:
/usr/bin/curl --basic --user myn...@mit.edu:$mypass -i -T "./PhysRevB.99.075430-mets.zip" -H "Content-Disposition:attachment; filename=PhysRevB.99.075430-mets.zip" -H "Content-Type:application/zip" -H "Packaging:http://purl.org/net/sword/package/METSDSpaceSIP" -H "X-No-Op:false" -vvv -X POST https://dspace.mit.edu/swordv2/collection/1721.1/121131
I also tested v2 on beta dspace 7.* a while back and that worked as well.
Hopefully that helps.
Carl
Carl,
Thank you very much for your help, which resolved my issue. Indeed, -H "Packaging:http://purl.org/net/sword/package/METSDSpaceSIP" seems to be required for SWORD v2, while -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" is required for SWORD v1. Both the header name (Packaging vs. X-Packaging) and the packaging URL differ between the two versions.
So, to summarize:
curl -i --data-binary "@art_4959256494595851995.zip" -H "Content-Disposition: filename=art_4959256494595851995.zip" -H "Content-Type: application/zip" -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" -H "X-No-Op: false" -H "X-Verbose: true" -u email@vt.edu:password -X POST https://vtechworks.lib.vt.edu/sword/deposit/10919/105038
Yields HTTP 202 and correctly parsed metadata.
curl -i --data-binary "@art_4959256494595851995.zip" -H "Content-Disposition: filename=art_4959256494595851995.zip" -H "Content-Type: application/zip" -H "Packaging:http://purl.org/net/sword/package/METSDSpaceSIP" -H "X-No-Op: false" -H "X-Verbose: true" -u email@vt.edu:password -X POST https://vtechworks.lib.vt.edu/swordv2/collection/10919/105038
Yields HTTP 201 and correctly parsed metadata.
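For anyone scripting these deposits, the difference between the two commands above can be captured in a small lookup. This is only a sketch in Python: the helper name deposit_headers is hypothetical, and the header names and packaging URLs are simply the ones that worked in this thread, not values checked against other DSpace versions.

```python
# Packaging header name and URI per SWORD version, as reported in this thread.
PACKAGING = {
    "v1": ("X-Packaging", "http://purl.org/net/sword-types/METSDSpaceSIP"),
    "v2": ("Packaging", "http://purl.org/net/sword/package/METSDSpaceSIP"),
}


def deposit_headers(version, zip_name):
    """Return the HTTP headers for a zip deposit (hypothetical helper)."""
    header_name, package_uri = PACKAGING[version]
    return {
        "Content-Disposition": f"filename={zip_name}",
        "Content-Type": "application/zip",
        header_name: package_uri,  # the only header that differs between v1 and v2
        "X-No-Op": "false",
        "X-Verbose": "true",
    }


if __name__ == "__main__":
    for version in ("v1", "v2"):
        print(version, deposit_headers(version, "art_4959256494595851995.zip"))
```

The resulting dictionary can be passed to whatever HTTP client is in use; the deposit URL (/sword/deposit/... for v1, /swordv2/collection/... for v2) still has to be chosen separately.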
The mets.xml file is parsed by DSpace for metadata for both SWORD and SWORDv2. The extra XML file deposited by our SWORDv2 submitter is not parsed upon upload but is made available, e.g. https://vtechworks.lib.vt.edu/handle/10919/105028.
BioMed Central and MDPI use SWORD v1. Hindawi uses SWORD v2.
We received it but only this parsed:
Democratizing Cellular Access with CellBricks

Field | Value | Language |
---|---|---|
dc.description.provenance | Submitted by ACM SWORD (acmopen@hq.acm.org) on 2021-10-05T20:32:16Z No. of bitstreams: 2 3452296.3473336.pdf: 1842050 bytes, checksum: 959bbeaedfb1e4248b705cfb21b0bf67 (MD5) 3452296.3473336.zip: 1796525 bytes, checksum: 93acadf9ddbfb2116caa2337b3a9f2ec (MD5) | en |
dc.title | Democratizing Cellular Access with CellBricks | |
dc.date.updated | 2021-10-05T20:32:16Z | |
This one is declared as UTF-16 but is saved as UTF-8, so I guess it can work. But almost everything we do and receive is UTF-8, so I recommend that.
I think you may have called it with this UTF encoding issue. I’ve made an adjustment to keep it at UTF-8. I’ve also just submitted a paper successfully. How do things look? - Tom
It seems like we are getting the items consistently now. The latest, 3409118.3475142.zip, is declared as UTF-8 and actually is.
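A mismatch like the one described above (declared UTF-16, saved as UTF-8) can be caught before depositing. The following Python sketch compares the encoding named in the XML declaration with whether the bytes decode as UTF-8. The function names are made up for illustration, and it only handles ASCII-compatible files; a file that really is UTF-16 would not match the declaration pattern.

```python
import re

# Matches the encoding pseudo-attribute in an XML declaration at the start of the file.
_DECLARATION = re.compile(rb"""<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration (upper-cased), or None."""
    if xml_bytes.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        xml_bytes = xml_bytes[3:]
    match = _DECLARATION.match(xml_bytes)
    return match.group(1).decode("ascii").upper() if match else None


def is_valid_utf8(xml_bytes):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        xml_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def encoding_mismatch(xml_bytes):
    """True when the file declares something other than UTF-8 but is stored as UTF-8."""
    declared = declared_encoding(xml_bytes)
    return declared is not None and declared != "UTF-8" and is_valid_utf8(xml_bytes)


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-16"?><mets/>'
    print(declared_encoding(sample), encoding_mismatch(sample))
```

In practice the bytes would come from reading mets.xml in binary mode, for example open("mets.xml", "rb").read().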
Only a few fields in mets.xml are parsed by our crosswalk, sword-swap-ingest.xsl, because only the title, id, and date are in the crosswalk. I suggest using the fields in the crosswalk as much as possible. You can include other fields, and we may modify the crosswalk to use them, but that wouldn't happen immediately. If you develop a tag set that matches the crosswalk, it should work for DSpace repositories in general, since this is the default crosswalk that comes with DSpace. I can give you feedback on the mets.xml files you send. It might also be possible to test against the DSpace 6.3 demo site, https://demo.dspace.org/xmlui/. There, you could deposit to a collection and see the submission yourself. It might be instructive to see it from the DSpace side.
Which fields among those you see in the XML do you want conveyed? I ask because there are a number of datapoints that I’ve tried sending that don’t have representation on the sword-swap-ingest page you linked me to. Data points such as DOI, which eRights form was selected, an array of author information, the paper’s publisher, I could go on.
Is this something you encounter often or am I missing something? - Tom
In general, we want as much metadata as possible. If a metadata value can't be conveyed with the tags in sword-swap-ingest.xsl, it is fine to add other tags, and we'll try to improve the crosswalk to map them later. It would be great if the extra tags matched those sent by other vendors, which are listed in issue #720.
I have attached an annotated mets.xml file for 3409118.3475142.zip with our suggestions for tagging.
Also, you can use the two zip files from the other vendors that I sent you and the third one attached as examples.
When I referred to dc.title, I meant the destination field for the title. The BioMed Central sample attached, art_4959256494595851995.zip, was deposited in VTechWorks at https://vtechworks.lib.vt.edu/handle/10919/78663?show=full.
The mets.xml file references the epdcx (ePrints Dublin Core) metadata schema
xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/"
which defines this field.
The mets.xml file is processed by sword-swap-ingest.xsl which also references
xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/"
All the fields will need to be sent in the form used in the BioMed Central example file. I suggest adding just one field to it and making sure that works first, perhaps
<epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/creator">
<epdcx:valueString>Pancotto, Theresa E</epdcx:valueString>
</epdcx:statement>
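To check which statements a mets.xml actually carries before sending it, something like the following Python sketch may help. It lists every epdcx:statement with its propertyURI and valueString. The function name list_statements is hypothetical, and the sample wraps the statement above in a bare epdcx:description element for brevity, whereas a real file nests it inside the full METS structure.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the mets.xml and crosswalk discussed above.
EPDCX_NS = "http://purl.org/eprint/epdcx/2006-11-16/"


def list_statements(xml_text):
    """Return (propertyURI, valueString) pairs for every epdcx:statement."""
    root = ET.fromstring(xml_text)
    pairs = []
    for statement in root.iter(f"{{{EPDCX_NS}}}statement"):
        uri = statement.get(f"{{{EPDCX_NS}}}propertyURI")
        value = statement.findtext(f"{{{EPDCX_NS}}}valueString")
        pairs.append((uri, value))
    return pairs


# Simplified sample: only the wrapper needed to make the fragment well-formed.
SAMPLE = """<epdcx:description xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/">
  <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/creator">
    <epdcx:valueString>Pancotto, Theresa E</epdcx:valueString>
  </epdcx:statement>
</epdcx:description>"""

if __name__ == "__main__":
    for uri, value in list_statements(SAMPLE):
        print(uri, "->", value)
```

Comparing the listed propertyURI values against the ones matched in sword-swap-ingest.xsl shows which fields the crosswalk will pick up and which will be ignored for now.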
Tom Gibson at ACM is having trouble sending SWORD v2 submissions to us:
-i --data-binary ""@/var/www/cfapi/repositoryManagement/files/3442381.3450060/3442381.3450060.zip"" -H ""Content-Disposition: attachment; filename=3442381.3450060.zip"" -H ""Packaging: http://purl.org/net/sword/package/METSDSpaceSIP"" -u acmopen@hq.acm.org:password -X POST https://vtechworks.lib.vt.edu/handle/10919/105038
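The command above posts to the collection's /handle/ URL, which is the collection page itself (hence the HTML that came back), while the deposits that worked in this thread went to /swordv2/collection/ or /sword/deposit/. A small pre-flight check could catch this; the Python sketch below is illustrative only. The function names are hypothetical, and the path prefixes are the ones seen on vtechworks.lib.vt.edu and dspace.mit.edu, which other installations may configure differently.

```python
from urllib.parse import urlparse


def endpoint_kind(url):
    """Classify a deposit URL by its path prefix (prefixes as seen in this thread)."""
    path = urlparse(url).path
    if path.startswith("/swordv2/collection/"):
        return "sword-v2"
    if path.startswith("/sword/deposit/"):
        return "sword-v1"
    if path.startswith("/handle/"):
        return "handle-page"  # the human-readable page, not a deposit endpoint
    return "unknown"


def to_swordv2_collection(url):
    """Rewrite /handle/<prefix>/<id> to /swordv2/collection/<prefix>/<id>."""
    parts = urlparse(url)
    if not parts.path.startswith("/handle/"):
        return url  # leave anything else untouched
    new_path = "/swordv2/collection/" + parts.path[len("/handle/"):]
    return parts._replace(path=new_path).geturl()


if __name__ == "__main__":
    url = "https://vtechworks.lib.vt.edu/handle/10919/105038"
    print(endpoint_kind(url))
    print(to_swordv2_collection(url))
```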