bigbio / proteomics-sample-metadata

The Proteomics Experimental Design file format: Standard for experimental design annotation
GNU General Public License v2.0

SDRF file format problem (px-submission-tool vs. sdrf-pipeline) #488

Open ovigy opened 3 years ago

ovigy commented 3 years ago

Hi All,

This is related to the PXD022713 dataset I submitted last week (private for the moment). I encountered a problem with the SDRF file, which I tried to generate and submit for the first time (as suggested by the px-submission-tool v2.5.2).

Could you please help me understand what was wrong with the format? I used the Python "sdrf-pipeline" to validate it before the PRIDE submission.

I used the "sdrf-default.tsv" file to fill out the sample and file annotations. I had to duplicate the "characteristics[organism]" column to define two species (hoping this is the right way to do it; I could not find a related example). I also had to change the instrument annotation to free text, although the ontology URL was accepted by the "sdrf-pipeline".

I attach two files (renamed to .txt so they could be attached): the one validated by the Python script, and the one submitted, "SDRF_ND9.tsv" (as "other" rather than "experimental design", because otherwise an error was still reported in the log):
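A minimal sketch of how a file with a duplicated "characteristics[organism]" column can be read without losing the second value. It uses only Python's standard `csv` module, which keeps repeated header names as they are. The helper `values_by_column` and the sample row are hypothetical and not taken from SDRF_ND9.tsv. Whether duplicating the column is the recommended way to annotate two species is not confirmed in this thread.

```python
import csv
import io
from collections import defaultdict


def values_by_column(sdrf_text):
    """Group the cells of each row by column name, keeping duplicated columns."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    rows = []
    for cells in reader:
        grouped = defaultdict(list)
        for name, value in zip(header, cells):
            # A repeated header name simply collects several values.
            grouped[name].append(value)
        rows.append(dict(grouped))
    return rows


# Hypothetical two-species row (illustration only).
sample = (
    "source name\tcharacteristics[organism]\tcharacteristics[organism]\n"
    "sample 1\tHomo sapiens\tEscherichia coli\n"
)
rows = values_by_column(sample)
print(rows[0]["characteristics[organism]"])
```

Reading the header as a plain list, rather than as dictionary keys, is what keeps both organism values available.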

sdrf-pipeline

parse_sdrf validate-sdrf --sdrf_file SDRF_ND9_toValidate.tsv
Everything seems to be fine. Well done.

px-submission log

2020-11-24 12:43:24,054 INFO [pool-1-thread-8] u.a.e.p.s.v.Main [Main.java:107] ERROR : The number of columns in the SDRF ({}) is smaller than the number of mandatory fields ({})', value='', row=0, column='N/A'
2020-11-24 12:43:24,072 INFO [pool-1-thread-8] u.a.e.p.s.v.Main [Main.java:107] ERROR : Invalid columns present: name, experiment, fraction ', value='', row=0, column=' name, experiment, fraction'
2020-11-24 12:43:24,072 INFO [pool-1-thread-8] u.a.e.p.s.v.Main [Main.java:107] ERROR : The following columns are mandatory and not present in the SDRF: source name, characteristics[organism part], characteristics[disease], characteristics[organism], characteristics[cell type], assay name, comment[fraction identifier], comment[data file]', value='', row=0, column='N/A'
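A quick way to cross-check a header against the mandatory columns named in the log above, sketched in Python with the standard `csv` module. The column list is copied from the log message. The helper `missing_mandatory_columns` and the sample headers are hypothetical, and this is not how jsdrf or sdrf-pipelines implement their validation.

```python
import csv
import io

# Mandatory columns as listed in the px-submission-tool log.
MANDATORY_COLUMNS = [
    "source name",
    "characteristics[organism part]",
    "characteristics[disease]",
    "characteristics[organism]",
    "characteristics[cell type]",
    "assay name",
    "comment[fraction identifier]",
    "comment[data file]",
]


def missing_mandatory_columns(sdrf_text, delimiter="\t"):
    """Return the mandatory columns that are absent from the header line."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter=delimiter)
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return [col for col in MANDATORY_COLUMNS if col not in present]


# Hypothetical headers (illustration only, not taken from SDRF_ND9.tsv).
complete_header = "\t".join(MANDATORY_COLUMNS) + "\n"
partial_header = "source name\tassay name\tcomment[data file]\n"

print(missing_mandatory_columns(complete_header))
print(missing_mandatory_columns(partial_header))
```

Passing a different `delimiter` shows how a tab-separated header looks to a parser that splits on another character: every mandatory column is then reported as missing.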

I also submitted another dataset PXD022725 with the same problem.

SDRF_ND9.txt SDRF_ND9_validatedBySDRF.txt

Thanks for your help. Oana

daichengxin commented 3 years ago

Hi ovigy, the px-submission-tool uses jsdrf to verify SDRF files. I tested the SDRF_ND9.tsv file with sdrf-pipelines and everything seems to be fine, but the jsdrf verification reports a problem. @ovigy

ypriverol commented 3 years ago

@daichengxin, can you explain what the error is?

daichengxin commented 3 years ago

I may need to look at the jsdrf code; right now I can't see which fields are missing in this file. @ypriverol

ovigy commented 3 years ago

Hi! Thanks for your replies. I first tried to use the Java tool, but the user documentation was too light for me; sorry, I'm not familiar with Maven and Java libraries. I will give it another try. It was easier for me with Python. Thanks

ypriverol commented 3 years ago

Thanks @ovigy, we will review the code and get back to you soon.

daichengxin commented 3 years ago

Thanks for your feedback @ovigy. jsdrf has bugs, which will be fixed.

ovigy commented 3 years ago

You're welcome. Thanks in return for the documentation and your help. Have a good day