ISA-tools / isa-api

ISA tools API
https://isa-tools.org

json2sra conversion produces unpredictable output on BII-S-3 #129

Closed djcomlab closed 7 years ago

djcomlab commented 7 years ago

I've been wrestling with writing appropriate tests for the SRA conversion (`json2sra`), and found that the Java SRA converter does not always output the same XML files for the same input.

I've been running the SRA converter with BII-S-3. On some runs the attribute `existing_study_type` in the `study.xml` file of the SRA output is set to "Other", and on other runs it is set to "Transcriptome Analysis".

This may be a bug in the SRA converter, but might also be in the json2isatab part of the conversion pipeline.
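For reference, a minimal sketch of how the check could look in a test. It assumes the `study.xml` output has a `STUDY_TYPE` element carrying the `existing_study_type` attribute; the sample XML and the helper names `read_study_types` and `is_stable` are made up for illustration and are not real converter output or isa-api functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down study.xml content (assumption for illustration);
# real converter output contains many more elements.
SAMPLE_STUDY_XML = """<STUDY_SET>
  <STUDY alias="BII-S-3">
    <DESCRIPTOR>
      <STUDY_TYPE existing_study_type="Other"/>
    </DESCRIPTOR>
  </STUDY>
</STUDY_SET>"""


def read_study_types(study_xml):
    """Return the existing_study_type value of every STUDY_TYPE element."""
    root = ET.fromstring(study_xml)
    return [el.get("existing_study_type") for el in root.iter("STUDY_TYPE")]


def is_stable(outputs):
    """Return True if every run in `outputs` yielded the same study types."""
    seen = {tuple(read_study_types(xml)) for xml in outputs}
    return len(seen) <= 1
```

Feeding `is_stable` the `study.xml` text from several conversion runs of the same input would flag the flip between "Other" and "Transcriptome Analysis".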

proccaserra commented 7 years ago

I am wondering if this is related to the order in which the converter invokes the SRA converter, starting with either the genomics or the transcriptomics assay (as BII-S-3 has 2 assay types). As the SRA study type accepts only one value (AFAIK), this may be the cause of the problem.

djcomlab commented 7 years ago

Ah yes, that might be it, thanks! I'll have a look at this.

djcomlab commented 7 years ago

So, the SRA converter still writes out XML for samples and files for both assay types anyway, is that correct? It looks like it includes everything. But I guess if there's more than one assay type and the study type only accepts one value, it can't resolve that particular value.

djcomlab commented 7 years ago

Looks like you're right @proccaserra. If I change the other BII-S-3 assay's `measurementType` from "metagenome sequencing" to "transcription profiling", the behaviour stays consistent. So this must be a bug in the Java SRA code.
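A rough sketch of an order-independent way to resolve the value, just to illustrate the idea. The mapping table and the function name `resolve_study_type` are assumptions for this example and do not reflect what the Java converter actually does.

```python
# Hypothetical mapping from ISA measurement type to SRA study type
# (assumption for illustration; the real converter's mapping may differ).
MEASUREMENT_TO_STUDY_TYPE = {
    "transcription profiling": "Transcriptome Analysis",
    "metagenome sequencing": "Metagenomics",
}


def resolve_study_type(measurement_types):
    """Pick a single study type regardless of the order of the assays.

    If all assays map to the same study type, return it; if they disagree
    (or nothing maps), fall back to "Other".
    """
    mapped = {MEASUREMENT_TO_STUDY_TYPE.get(m, "Other") for m in measurement_types}
    if len(mapped) == 1:
        return mapped.pop()
    return "Other"
```

Because the decision is made on the set of mapped values rather than on whichever assay happens to be processed first, the result no longer depends on iteration order.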

proccaserra commented 7 years ago

From the SRA 1.5 study XSD (ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.study.xsd):

`<xs:annotation> <xs:documentation>The STUDY_TYPE presents a controlled vocabulary for expressing the overall purpose of the study.</xs:documentation> </xs:annotation> <xs:complexType>`

#not_ideal! What are users supposed to do when their study has 2 or 3 assay types (RNA-Seq, ChIP-Seq, targeted gene survey)? Submit 3 distinct studies and lose the relation between samples and libraries?

proccaserra commented 7 years ago

We could use 'Other', or `new_study_type` with the value 'multiomics':

`<xs:annotation> <xs:documentation>To propose a new term, select Other and enter a new study type.</xs:documentation> </xs:annotation> </xs:attribute>`
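Something along these lines, as a sketch only. It assumes the `STUDY_TYPE` element takes `existing_study_type` and `new_study_type` as attributes, as the XSD text above suggests; the helper name `make_study_type` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def make_study_type(new_type="multiomics"):
    """Build a STUDY_TYPE element that proposes a new term via "Other".

    Assumption: both values are plain attributes on STUDY_TYPE.
    """
    element = ET.Element("STUDY_TYPE")
    element.set("existing_study_type", "Other")
    element.set("new_study_type", new_type)
    return element


def study_type_as_text(element):
    """Serialise the element to a string for inspection or diffing."""
    return ET.tostring(element, encoding="unicode")
```

Calling `study_type_as_text(make_study_type())` gives the serialised element, which could then be compared against the converter output.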

djcomlab commented 7 years ago

Is it worth asking the SRA guys for guidance on this, as they would be the ones receiving the submissions?

djcomlab commented 7 years ago

To quote info from ENA datasubs "Study type should be ignored. This field is contained in the legacy ERP study".

And actually, `study.xml` should also be replaced with a `project_set.xml`, which does not store any study type anyway.

djcomlab commented 7 years ago

Fix pushed to develop, closing.