Closed cessda-bitbucket-importer closed 1 year ago
Original comment by Taina Jääskeläinen.
Katja tested the same UniData endpoint record by changing it to xml and the only constraints she got were:
'/codeBook/@xsi:schemaLocation' is mandatory
'/codeBook/stdyDscr/citation/titlStmt/IDNo/@agency' is mandatory
@alexander-muehlbauer So is it still so that https://cmv.cessda.eu/#!validation only checks the xml? If so, why is there a URL link?
Original comment by John Shepherdson (GitHub: john-shepherdson).
I used CMV and pasted the above Nesstar URL in (note that it is shown in abbreviated form in the first screenshot).
Original comment by John Shepherdson (GitHub: john-shepherdson).
Same constraint violations as before.
Original comment by John Shepherdson (GitHub: john-shepherdson).
Re: the missing ‘Access data’ button, see also https://github.com/cessda/cessda.metadata.office/issues/22
Original comment by Taina Jääskeläinen.
Trying my best to figure this out.
Attaching the XML file Katja was using for the same UniData record. It seems that if you need to convert an OAI-PMH record to an XML file, you need to 1) remove some stuff from the beginning and the end, and 2) add some stuff to the beginning to make it a valid XML file. Then it can be properly validated, and you get the two violations Katja mentioned.
But this is cumbersome, so I’m asking whether the ‘By URL’ option is working at all for validation in CMV. So there are two issues here:
At least the constraint violations Katja got were correct:
For the other violations, looking at the metadata, I could not figure out why they get these constraints.
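The manual conversion described above (strip the OAI-PMH wrapper, then add an XML declaration) could be scripted roughly as follows. This is only a minimal sketch using Python's standard library: the function name `extract_payload` and the sample response are made up for illustration, and it assumes the record sits as the first child of the OAI-PMH `<metadata>` element. It is not how CMV processes a pasted URL.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 envelope.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Made-up, heavily shortened GetRecord response (not the UniData record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>example-id</identifier></header>
      <metadata>
        <codeBook xmlns="">
          <stdyDscr><citation><titlStmt><titl>Example study</titl></titlStmt></citation></stdyDscr>
        </codeBook>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def extract_payload(oai_response: str) -> str:
    """Return the record inside <metadata> as a standalone XML document."""
    root = ET.fromstring(oai_response)
    metadata = root.find(f".//{{{OAI_NS}}}metadata")
    if metadata is None or len(metadata) == 0:
        raise ValueError("no <metadata> payload found in the OAI-PMH response")
    payload = metadata[0]   # step 1: drop everything around the record
    payload.tail = None     # do not carry over whitespace that followed the record
    body = ET.tostring(payload, encoding="unicode")
    # step 2: add an XML declaration at the beginning
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(extract_payload(SAMPLE_RESPONSE))
```

If the record is in a namespace, ElementTree writes it back with generated prefixes such as `ns0`, so the output is equivalent XML but not byte-identical to the harvested record.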
Original comment by Taina Jääskeläinen.
Original comment by John Shepherdson (GitHub: john-shepherdson).
The point is that you can now validate an OAI-PMH record by using the ‘Paste URL’ option, which is what the screenshots above show. Prior to that, you could convert an OAI-PMH record to an XML file by removing the OAI-PMH envelope etc, as described above by Taina, but that is no longer necessary - use the ‘Paste URL’ option instead.
Original comment by John Shepherdson (GitHub: john-shepherdson).
I don’t think it is worth spending any time on the differences in constraint violations between an 'original' XML file and one that has been converted from a harvested OAI-PMH record. From here on, we will be conducting bulk validation on harvested OAI-PMH records, so we need to make sure that we are getting the expected results from that approach. Given that we have more than 30,000 records to validate (and revalidate), an automated approach is the only feasible option.
Original comment by Taina Jääskeläinen.
Yes, I agree. Just have to make sure the validator gives correct results.
Original comment by John Shepherdson (GitHub: john-shepherdson).
In which case we need to find more examples of false positives regarding these constraint violations.
Then work out if the profile is incorrect, or the implementation of CMV is incorrect.
Original comment by Taina Jääskeläinen.
@alexander-muehlbauer, for your information:
Alex, CESSDA MO is using the current validator to check endpoint records. As the record taken as an example in this issue is an endpoint record, it would be really good if you could check that the constraint violations given by the validator are correct.
Original comment by Taina Jääskeläinen.
UniData is switching to use Kuha2 and this issue may not be relevant anymore.
References UniData's old NESSTAR endpoint which has now been replaced. The example document is no longer available.
Original report on BitBucket by Taina Jääskeläinen.
I tested a UniData document in the validator. It is a Nesstar export. I chose the Basic gate, the 1.2.2 monolingual profile and BY_URL, and pasted this URL:
https://nesstar.unidata.unimib.it/oai-pmh/?verb=GetRecord&identifier=http://10.99.177.210:80/obj/fStudy/SN147&metadataPrefix=oai_ddi
Constraint Violations
'/codeBook/@xml-lang' is mandatory
'/codeBook/@xsi:schemaLocation' is mandatory
'/codeBook/stdyDscr/citation/titlStmt/titl' is mandatory
'/codeBook/stdyDscr/citation/titlStmt/IDNo' is mandatory
'/codeBook/stdyDscr/citation/titlStmt/IDNo/@agency' is mandatory
'/codeBook/stdyDscr/citation/holdings/@URI' is mandatory
'/codeBook/stdyDscr/citation/distStmt/distrbtr' is mandatory
'/codeBook/stdyDscr/stdyInfo/abstract' is mandatory
But there seems to be title, holdings/@URI, distrbtr and abstract in the record. Did I do something wrong or is something else the problem?
@john-shepherdson Just putting John here so he can follow the issue. John: Why is there no ‘Access data’ button for this dataset in CDC since the URI is there? Is this a related issue or something I should note in the CDC issue tracker?
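To double-check by hand whether elements such as titl, holdings/@URI, distrbtr and abstract are really present in a record, a small independent check can help. The sketch below is hypothetical: `path_exists` and the sample record are invented for illustration, the namespace `urn:example:ddi` is a placeholder, and the matching deliberately ignores namespaces and prefixes. It only tests whether a path is present and makes no claim about how CMV evaluates the profile.

```python
import xml.etree.ElementTree as ET

# Made-up record; the namespace is a placeholder, not a real DDI namespace.
SAMPLE_RECORD = """<codeBook xmlns="urn:example:ddi">
  <stdyDscr>
    <citation>
      <titlStmt><titl>Example study</titl><IDNo>SN000</IDNo></titlStmt>
      <holdings URI="https://example.org/study"/>
    </citation>
    <stdyInfo><abstract>Example abstract</abstract></stdyInfo>
  </stdyDscr>
</codeBook>"""

# Some of the paths from the constraint violations listed above.
PATHS = [
    "/codeBook/@xsi:schemaLocation",
    "/codeBook/stdyDscr/citation/titlStmt/titl",
    "/codeBook/stdyDscr/citation/titlStmt/IDNo",
    "/codeBook/stdyDscr/citation/titlStmt/IDNo/@agency",
    "/codeBook/stdyDscr/citation/holdings/@URI",
    "/codeBook/stdyDscr/citation/distStmt/distrbtr",
    "/codeBook/stdyDscr/stdyInfo/abstract",
]


def local(name: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def path_exists(root: ET.Element, path: str) -> bool:
    """Check a simple /a/b/@c path against the tree, comparing local names only."""
    steps = path.strip("/").split("/")
    if local(root.tag) != steps[0]:
        return False
    nodes = [root]
    for step in steps[1:]:
        if step.startswith("@"):
            # Attribute step: compare the part after any prefix, e.g. 'schemaLocation'.
            wanted = step[1:].split(":")[-1]
            return any(local(key) == wanted for node in nodes for key in node.attrib)
        nodes = [child for node in nodes for child in node if local(child.tag) == step]
        if not nodes:
            return False
    return True


if __name__ == "__main__":
    record = ET.fromstring(SAMPLE_RECORD)
    for p in PATHS:
        print("present" if path_exists(record, p) else "MISSING", p)
```

A path that this check reports as present but the validator reports as missing would be a candidate false positive worth attaching to the issue.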