Ouch. It's insane that we've only gotten around to fixing it now. Many of these are simply a matter of our code writing elements in an arbitrary order where the schema defines an <xs:sequence>, which is not that difficult to fix.
There are, though, a couple of non-trivial things where decisions need to be made (what to do with the bounding boxes, for example).
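Fixing the ordering problem mostly means emitting child elements in the order the schema's xs:sequence lists them, regardless of the order in which our code happens to hold the values. Below is a minimal sketch using the JDK's javax.xml.stream.XMLStreamWriter, assuming the values arrive in an unordered Map. The writer walks an explicit order list, skips absent optional elements, and drops anything not in the list. The class name, writeInSequence() and TITL_STMT_ORDER are hypothetical illustrations, not Dataverse code, and the real order must be copied from the corresponding xs:sequence in codebook.xsd.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class OrderedDdiWriterSketch {

    // Illustrative (hypothetical) child order for a titlStmt-like element;
    // verify it against the actual xs:sequence in codebook.xsd.
    public static final List<String> TITL_STMT_ORDER =
        Arrays.asList("titl", "subTitl", "altTitl", "parTitl", "IDNo");

    /**
     * Writes <parent> with the children present in values, always in sequenceOrder.
     * Keys that are not in sequenceOrder are dropped instead of being written out of place.
     */
    public static String writeInSequence(String parent, List<String> sequenceOrder,
                                         Map<String, String> values) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement(parent);
            for (String name : sequenceOrder) {
                String value = values.get(name);
                if (value == null) {
                    continue; // optional element absent: skip it, never reorder
                }
                xml.writeStartElement(name);
                xml.writeCharacters(value); // escapes markup characters such as < and &
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // HashMap iteration order is arbitrary; the order of the output is not.
        Map<String, String> values = new HashMap<>();
        values.put("IDNo", "placeholder-id");
        values.put("titl", "Example title");
        values.put("notInSequence", "dropped");
        System.out.println(writeInSequence("titlStmt", TITL_STMT_ORDER, values));
    }
}
```

Keeping the order in one list per element also means a field with no DDI counterpart is simply never written.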
Just to clarify a couple of things from an earlier discussion:
sizing:
* We will address the immediate issue of the bad ddi xml exports by looking specifically at what has been reported. ...
* If we find that the validator needs work, we will create a new separate issue when this is complete
"Looking specifically at what has been reported" may not easily apply. This is a very old issue with a lot of back-and-forth (which is very hard to read), and many of the things reported earlier have already been fixed in other PRs. So I assumed that the goal of the PR was "make Dataverse produce valid DDI". That is, if something not explicitly mentioned here was obviously failing validation, it needed to be fixed too. It did not make sense to make a PR that fixed some things but still produced DDI records that fail validation, especially since people have been waiting for a fix since 2017.
The previously discussed automatic validation (adding code to the exporter that validates, in real time, every DDI record it produces, and caches the record only if it passes) does make sense to leave as a separate, sprint-sized task. The validation itself is not hard to add, but we'll need to figure out how to report the errors. I have, however, enabled the validation test in DDIExporterTest.testExportDataset(), so in the meantime, once we merge this PR, any developer working on the DDI exporter will be alerted if they break it by introducing something invalid, because they won't be able to build their branch.
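The validate-then-cache idea could look roughly like the following minimal Java sketch, assuming the standard JDK javax.xml.validation API. It validates a record against an XSD, collects every error rather than stopping at the first one, and caches the record only if no errors were found. The class, cacheIfValid(), the in-memory Map used as a cache and the two-element toy schema (standing in for codebook.xsd) are all hypothetical; printing to stderr is only a placeholder for the still-open question of how to report the errors.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DdiValidationGateSketch {

    // Toy schema standing in for codebook.xsd: an xs:sequence requiring titl before IDNo.
    public static final String TOY_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"titlStmt\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"titl\" type=\"xs:string\"/>"
        + "<xs:element name=\"IDNo\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    public static final String IN_ORDER =
        "<titlStmt><titl>Example</titl><IDNo>placeholder-id</IDNo></titlStmt>";
    public static final String OUT_OF_ORDER =
        "<titlStmt><IDNo>placeholder-id</IDNo><titl>Example</titl></titlStmt>";

    /** Validates xml against xsd and returns every error found; an empty list means valid. */
    public static List<String> validate(String xml, String xsd) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
        List<String> errors = new ArrayList<>();
        Validator validator = schema.newValidator();
        // Record recoverable (schema) errors and keep going instead of stopping at the first.
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: recorded in the catch block below
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    /** Hypothetical exporter hook: cache the record only if it validates, otherwise report why. */
    public static boolean cacheIfValid(String key, String ddiXml, String xsd, Map<String, String> cache) {
        List<String> errors = validate(ddiXml, xsd);
        if (errors.isEmpty()) {
            cache.put(key, ddiXml);
            return true;
        }
        // Placeholder error reporting: just print to stderr.
        errors.forEach(err -> System.err.println("DDI export for " + key + " is invalid: " + err));
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        System.out.println(cacheIfValid("in-order", IN_ORDER, TOY_XSD, cache));
        System.out.println(cacheIfValid("out-of-order", OUT_OF_ORDER, TOY_XSD, cache));
        System.out.println(cache.keySet());
    }
}
```

The same validate() call, pointed at the real codebook.xsd, is essentially what a unit test assertion needs; the exporter-side version differs only in what happens to a record that fails.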
To clarify: in its current state, the exporter in my branch produces valid DDI XML for our control "all fields" dataset, plus all the other datasets used in our tests, and whatever else I could think of to test. It does NOT guarantee that there is no scenario in which it can still output something invalid! So, yes, it is important to add auto-validation. And if and when somebody finds another such scenario, we will treat it as a new issue.
A couple of arbitrary decisions had to be made; I will spell them out in the PR description. My general approach was: if something does not translate 1:1 from our metadata to the DDI format, just drop it and move on. We don't treat preserving all of our metadata as a goal when exporting Dublin Core (DC); it's obvious that only a subset of our metadata block fields can be exported in that format. But full preservation is not possible with DDI either, now that we have multiple metadata blocks and the application is no longer centered around quantitative social science. So there is no need to sweat losing an individual field here and there.
To check compatibility I use the following two validators:
@kaczmirek CESSDA (https://cmv.cessda.eu/#!validation) is my favorite validator tool as well. I made a pull request the other week (#9484, linked to this issue) that fixes the numerous schema violations in our DDI export. I recommend the CESSDA validator under "how to test" there, with the same profile you mentioned ("CESSDA DATA CATALOGUE (CDC) DDI2.5 PROFILE - MONOLINGUAL: 1.0.4").
Forwarded from the ticket: https://help.hmdc.harvard.edu/Ticket/Display.html?id=245607
Hello, I tried to validate two items exported to DDI from dataverse.harvard.edu against codebook.xsd (DDI 2.5) and got the same types of validation errors for both; they are described below for Item 1 (the content below the line should work as a well-formed XML file):
Item 1: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BAMCSI
Item 2: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/P4JTOD
What could be done about it (other than meddling with the schema)?
Best regards,
Joakim Philipson Research Data Analyst, Ph.D., MLIS Stockholm University Library
Stockholm University SE-106 91 Stockholm Sweden
Tel: +46-8-16 29 50 Mobile: +46-72-1464702 E-mail: joakim.philipson@sub.su.se http://orcid.org/0000-0001-5699-994X
<?xml version="1.0" encoding="UTF-8"?>
dataverse_1062_philipsonErrorTypes.txt