IHEC / ihec-ecosystems

This repo is for code and documentation associated with the ihec-ecosystems working group
Apache License 2.0

NOMe-seq experiments #119

Closed quirinmanz closed 2 years ago

quirinmanz commented 3 years ago

Hi,

I am trying to run the validator on NOMe-Seq experiment metadata files from the DEEP project and it fails. I saw some related posts (#25 #26) but unfortunately, they did not help.

A minimal example XML file NOME_example.xml could look like this:

    <STUDY_REF accession="NOME_test" refcenter="NOME_test" />
    <DESIGN>
      <DESIGN_DESCRIPTION />
      <SAMPLE_DESCRIPTOR accession="NOME_test" refcenter="NOME_test" refname="liver male adult (78 years)" />
      <LIBRARY_DESCRIPTOR>
        <LIBRARY_NAME>unspecified</LIBRARY_NAME>
        <LIBRARY_STRATEGY>NOME-seq</LIBRARY_STRATEGY>
        <LIBRARY_SOURCE>GENOMIC</LIBRARY_SOURCE>
        <LIBRARY_SELECTION>unspecified</LIBRARY_SELECTION>
        <LIBRARY_LAYOUT>
          <PAIRED NOMINAL_LENGTH="200"/>
        </LIBRARY_LAYOUT>
        <LIBRARY_CONSTRUCTION_PROTOCOL>NOME-Seq</LIBRARY_CONSTRUCTION_PROTOCOL>
      </LIBRARY_DESCRIPTOR>
    </DESIGN>
    <PLATFORM>
      <ILLUMINA>
        <INSTRUMENT_MODEL>Illumina HiSeq 2500</INSTRUMENT_MODEL>
      </ILLUMINA>
    </PLATFORM>
    </EXPERIMENT>
    </EXPERIMENT_SET>

As expected, this fails with the following message:

    __invalid_xml:../NOME_example.xml
    [{'-jsonlog': '../log.json', '-out': '../NOME_example.versioned.xml', '-config': 'path/to/ihec-ecosystems/version_metadata/config.json'}, ['../NOME_example.xml']]
    xml validates [against:schemas/xml/SRA.experiment.xsd]... False [../experiments/reprocessed/NOME_example.xml]

I have tried to use the library_strategy "OTHER", but this fails with the error: `"invalid experiment_type : prevalidation"`. This is probably because "OTHER" gets returned in all caps from this line (https://github.com/IHEC/ihec-ecosystems/blob/5f0359e53aaa0b0de8c56bab8dbc7e07cd103d99/version_metadata/egautils.py#L33). But from this line (https://github.com/IHEC/ihec-ecosystems/blob/5f0359e53aaa0b0de8c56bab8dbc7e07cd103d99/version_metadata/egautils.py#L25), I conclude that there are/were efforts to include NOMe-seq.

This pull request (#117) includes NOMe-Seq in the SRA.experiment.xsd file, but I think additional modifications to the experiment.json would be necessary. Are you planning to include NOMe-seq or is there a workaround using OTHER?

Thanks in advance and for the great documentation!

Best, Quirin

sitag commented 3 years ago

@quirinmanz I updated https://github.com/IHEC/ihec-ecosystems/blob/master/version_metadata/egautils.py#L33 to return lowercase. It should find `other` in the schema now: https://github.com/IHEC/ihec-ecosystems/blob/master/schemas/json/2.0/experiment.json#L206

Once we have attributes that we require for NOMe-seq, we can give NOMe-seq its own branch in the schema; right now there isn't anything to check.

I have also merged in https://github.com/IHEC/ihec-ecosystems/pull/117

You can pass `-not-sra-xml-but-try` and `-skip-updated-xml-validation` to skip XML validation (one is pre-validation, one is post-validation; you may need both): https://github.com/IHEC/ihec-ecosystems/blob/master/version_metadata/validate_main.py#L25 and https://github.com/IHEC/ihec-ecosystems/blob/master/version_metadata/validate_experiment.py#L74-L75

You can also change what XML you use for validation in https://github.com/IHEC/ihec-ecosystems/blob/master/version_metadata/config.json#L33 (you could also change which `config.json` is passed).

Let me know if this works; if not, I will look into this more.

quirinmanz commented 3 years ago

Thank you very much!

I can now successfully validate a NOMe-Seq experiment using `<LIBRARY_STRATEGY>NOMe-Seq</LIBRARY_STRATEGY>` and `<TAG>EXPERIMENT_TYPE</TAG><VALUE>Other</VALUE>`.

Still, I am not sure what EXPERIMENT_ONTOLOGY_CURIE to use, since NOMe-seq does not seem to be included in the Ontology for Biomedical Investigations but only in the Experimental Factor Ontology (https://www.ebi.ac.uk/ols/ontologies/efo/terms?short_form=EFO_0008830). Can you help me with this or is it one of the reasons why it has not been included yet?

Best, Quirin

sitag commented 3 years ago

@quirinmanz Since there's no spec for NOMe-Seq yet, no matter what you submit, it will not validate against the latest spec (when the NOMe-seq spec lands it will likely be 3.0, so your current metadata won't validate against 3.0, but you will still get 2.0 validated as you have now). In the interest of being precise, I recommend that you use the EFO ontology for now. We will need to figure out how to deal with the missing term in the OBI ontology.
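
For readers landing on this thread, here is a minimal sketch of the attribute block that the working configuration above implies, built with Python's standard `xml.etree.ElementTree`. It is not code from ihec-ecosystems. The helper names (`add_attribute`, `build_nome_attributes`, `experiment_type`) are hypothetical, the `EXPERIMENT_ATTRIBUTES` / `EXPERIMENT_ATTRIBUTE` wrapper elements are assumed from the usual SRA experiment layout, and the CURIE spelling `efo:0008830` is an assumption derived from the EFO_0008830 link in the thread. The last helper lowercases the value before comparison, loosely in the spirit of the lowercase fix described above.

```python
import xml.etree.ElementTree as ET


def add_attribute(parent, tag, value):
    """Append one TAG/VALUE pair as an EXPERIMENT_ATTRIBUTE element (hypothetical helper)."""
    attr = ET.SubElement(parent, "EXPERIMENT_ATTRIBUTE")
    ET.SubElement(attr, "TAG").text = tag
    ET.SubElement(attr, "VALUE").text = value
    return attr


def build_nome_attributes(ontology_curie="efo:0008830"):
    """Build the attribute block for a NOMe-seq experiment.

    The default CURIE spelling is an assumption; the thread only
    recommends using the EFO ontology for now.
    """
    attrs = ET.Element("EXPERIMENT_ATTRIBUTES")
    add_attribute(attrs, "EXPERIMENT_TYPE", "Other")
    add_attribute(attrs, "EXPERIMENT_ONTOLOGY_CURIE", ontology_curie)
    return attrs


def experiment_type(attrs):
    """Return the EXPERIMENT_TYPE value lowercased, or None if it is absent."""
    for attr in attrs.iter("EXPERIMENT_ATTRIBUTE"):
        if attr.findtext("TAG") == "EXPERIMENT_TYPE":
            return (attr.findtext("VALUE") or "").strip().lower()
    return None


if __name__ == "__main__":
    attrs = build_nome_attributes()
    # Serialize the block so it can be pasted inside an <EXPERIMENT> element.
    print(ET.tostring(attrs, encoding="unicode"))
    print(experiment_type(attrs))
```

The generated block would sit inside the `<EXPERIMENT>` element next to `<DESIGN>` and `<PLATFORM>`, with `<LIBRARY_STRATEGY>NOMe-Seq</LIBRARY_STRATEGY>` set as reported above. Whether the validator accepts a given CURIE spelling is not confirmed by the thread, so treat that value as a placeholder.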