Open sureshhewabi opened 1 week ago
one issue here is that to fully validate the mzIdentML file you also need the peaklists, e.g. https://github.com/PRIDE-Archive/xi-mzidentml-converter/issues/81
what are people's views on how to deal with that? Two alternatives would be:
what if we concentrate on (a) 'Validate files of a given folder (Input will be file path)', and this folder must also contain the peaklist files?
This is easiest to do because it's most like how the converter already works.
Also, if it just stops after the first error, then that's easier.
Thoughts on this?
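To make the folder idea concrete, here is a minimal sketch of a fail-fast check that every peaklist referenced by an mzIdentML file sits in the same folder. It is not the converter's code: `validate_folder`, `referenced_peaklists` and `ValidationError` are hypothetical names, and it assumes the peaklist file name can be taken from the last path segment of `SpectraData/@location`.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


class ValidationError(Exception):
    """Raised on the first problem found; nothing further is checked."""


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix so the check is independent of the schema version.
    return tag.rsplit("}", 1)[-1]


def referenced_peaklists(mzid_path: Path) -> list[str]:
    """Return the file names found in SpectraData/@location (assumption: last path segment)."""
    names = []
    for _, elem in ET.iterparse(str(mzid_path), events=("end",)):
        if local_name(elem.tag) == "SpectraData":
            location = elem.get("location", "")
            # location may be a URI or a Windows path; keep only the file name.
            names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def validate_folder(folder: Path) -> None:
    """Stop at the first missing peaklist by raising ValidationError."""
    mzid_files = sorted(folder.glob("*.mzid"))
    if not mzid_files:
        raise ValidationError(f"no .mzid files in {folder}")
    present = {p.name for p in folder.iterdir() if p.is_file()}
    for mzid in mzid_files:
        for name in referenced_peaklists(mzid):
            if name not in present:
                raise ValidationError(
                    f"{mzid.name}: peaklist '{name}' not found in {folder}"
                )
```

Calling `validate_folder(Path("some/folder"))` either returns silently or raises on the first missing peaklist, which matches the "stop after the first error" behaviour suggested above.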
@sureshhewabi - https://github.com/PRIDE-Archive/xi-mzidentml-converter/pull/82 - you can take a look at what I've done there
that PR gives a command-line validation option.
So, as a first attempt, I think it covers 1(a), (c), (d), (e) to a very limited extent, and (f) above. 1(b) we could live without in the short term. 1(g), as I read it, isn't really validation but summary stats; these could be got by querying the SQLite DB.
For 2. above, info is printed to standard output. I think it currently includes the logging info we usually see from the converter.
It's not extensively tested. It passes the file Diogo provided. It fails the schema-invalid Kojak file.
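As a rough illustration of getting the 1(g) summary stats by querying SQLite: the sketch below runs against a made-up in-memory table. The table `spectrum_match` and its columns are hypothetical stand-ins, not the converter's real schema, so the query would need adapting to whatever tables the converter actually writes.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory DB with a HYPOTHETICAL schema for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE spectrum_match (
            id INTEGER PRIMARY KEY,
            spectrum_id TEXT,
            passes_threshold INTEGER
        );
        INSERT INTO spectrum_match (spectrum_id, passes_threshold)
        VALUES ('scan=1', 1), ('scan=2', 0), ('scan=2', 1);
        """
    )
    return conn


def summary_stats(conn: sqlite3.Connection) -> dict:
    """Collect a few counts in a single query."""
    total, passing, spectra = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(passes_threshold), 0),
               COUNT(DISTINCT spectrum_id)
        FROM spectrum_match
        """
    ).fetchone()
    return {
        "matches": total,
        "matches_passing_threshold": passing,
        "spectra_with_matches": spectra,
    }
```

With the demo data, `summary_stats(build_demo_db())` reports 3 matches, 2 of them passing the threshold, across 2 distinct spectra.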
> what if we concentrate on (a) 'Validate files of a given folder (Input will be file path)', and this folder must also contain the peaklist files?
> This is easiest to do because it's most like how the converter already works.
> Also, if it just stops after the first error, then that's easier.
> Thoughts on this?
Yes I agree with that
I agree with that
good, that's the way it works in that PR
It's not currently rejecting files that don't have the sequences in Seq elements. (That additional requirement of ours.) It means they break later. (Also of no use to PDB-IHM without sequences?) I'll need to change it so it rejects these.
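A sketch of what that extra check could look like, assuming the requirement is "every DBSequence has a non-empty Seq child". The function name `db_sequences_missing_seq` is hypothetical and this is not the code in the PR.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix so the check is independent of the schema version.
    return tag.rsplit("}", 1)[-1]


def db_sequences_missing_seq(mzid_source) -> list:
    """Return ids of DBSequence elements with no Seq child, or an empty one.

    mzid_source is a file path or a binary file object.
    """
    missing = []
    for _, elem in ET.iterparse(mzid_source, events=("end",)):
        if local_name(elem.tag) == "DBSequence":
            seq = next((c for c in elem if local_name(c.tag) == "Seq"), None)
            if seq is None or not (seq.text or "").strip():
                missing.append(elem.get("id"))
            elem.clear()  # free memory; files can hold many DBSequence elements
    return missing
```

A validator that stops at the first error could reject the file as soon as this list is non-empty and report the first id in it.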
Also, I think I've found another requirement specific to our system - that all Modifications have masses given.
hmm, I think we shouldn't add that as a requirement; rather, the spectrum viewer is broken in some cases at the moment. (There are other ways the modification masses could be recovered, like the UNIMOD accessions, I think.)
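To illustrate the fallback idea: a minimal sketch that uses `monoisotopicMassDelta` when it is present and otherwise tries the UNIMOD accession on the Modification's cvParam. The two-entry lookup table and the function name `modification_mass` are assumptions for illustration only; a real implementation would read masses from the full UNIMOD resource.

```python
import xml.etree.ElementTree as ET

# Tiny illustrative lookup (assumption); a real one would come from UNIMOD itself.
UNIMOD_MONO_MASS = {
    "UNIMOD:4": 57.021464,   # Carbamidomethyl
    "UNIMOD:35": 15.994915,  # Oxidation
}


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix so the check is independent of the schema version.
    return tag.rsplit("}", 1)[-1]


def modification_mass(mod_elem: ET.Element):
    """Return the mass delta of a Modification element, or None if it cannot be recovered."""
    mass = mod_elem.get("monoisotopicMassDelta")
    if mass is not None:
        return float(mass)
    # No explicit mass: fall back to the UNIMOD accession, if there is a known one.
    for child in mod_elem:
        if local_name(child.tag) == "cvParam":
            accession = child.get("accession")
            if accession in UNIMOD_MONO_MASS:
                return UNIMOD_MONO_MASS[accession]
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing mass is a validation failure or just something the viewer has to cope with.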
[ ] 1. Validation of Crosslinking MzIdentML (mzID) files.
[ ] 2. Generate validation report. We can generate a simple report consisting of the above information.
This is a simple start; let's use this issue to discuss validation.