Trailing tab characters

HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results

https://hupo-psi.github.io/mzTab


Trailing tab characters #181

Closed bernt-matthias closed 4 years ago

bernt-matthias commented 4 years ago

I noticed that lines in the example data https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/gcxgc-ms-example.mztab have trailing tab characters.

There are also lines that only consist of tab characters.

Is this wanted?

sneumann commented 4 years ago

Hi, that was a design decision: the entire file should be a valid TSV file, which means all rows need to have the same number of columns. Which parser are you intending to use? We should collect all successful implementations in our mzTab-M documentation ;-) Yours, Steffen

sneumann commented 4 years ago

Response from @nilshoffmann :

The difference is basically that any manually edited file has trailing tabs, introduced by LibreOffice, Excel, etc. Generated files, e.g. from LDA2 and MS-DIAL, do not have the trailing tabs. It is, however, possible to open both types of files with LibreOffice or Excel and to parse and validate them with the jmzTab-M reference implementation. Best wishes, Nils

sneumann commented 4 years ago

Although it might be that a naive R read.delim() would choke on non-rectangular TSV files.
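For a parser that insists on rectangular input, one workaround is to pad short rows before handing the data on. The following is only a minimal sketch in Python, not part of jmzTab-M or any mzTab tooling; the helper name `pad_rows` and the sample lines are hypothetical.

```python
def pad_rows(lines, sep="\t"):
    """Split each line on `sep` and pad short rows with empty cells.

    Hypothetical helper: makes every row as wide as the widest row,
    which is what a strict "rectangular TSV" reader expects.
    """
    rows = [line.rstrip("\r\n").split(sep) for line in lines]
    width = max((len(row) for row in rows), default=0)
    return [row + [""] * (width - len(row)) for row in rows]


# Made-up sample lines with differing column counts (not taken from the example file).
sample = [
    "MTD\tmzTab-version\t2.0.0-M\n",
    "SMH\tSML_ID\tdatabase_identifier\tchemical_formula\n",
    "\n",
]
padded = pad_rows(sample)
```

Padding appends empty cells, which is equivalent to adding trailing tabs when the rows are written back out.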

bernt-matthias commented 4 years ago

Thanks. I thought so. Note that the number of trailing tabs is inconsistent across lines in the example.

Working on Galaxy data types: https://github.com/galaxyproject/galaxy/pull/8109
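The inconsistency can be checked with a few lines of Python. This is a rough sketch under the assumption that the file is read line by line as text; `tab_report` is a hypothetical helper and the sample lines are invented for illustration, not copied from the example file.

```python
from collections import Counter


def tab_report(lines):
    """Count trailing tabs per line and lines made up only of tabs.

    Returns (trailing, tab_only): `trailing` maps a trailing-tab count to
    the number of lines that end with that many tabs; `tab_only` is the
    number of non-empty lines that contain nothing but tabs.
    """
    trailing = Counter()
    tab_only = 0
    for line in lines:
        body = line.rstrip("\r\n")
        stripped = body.rstrip("\t")
        extra = len(body) - len(stripped)
        if body and not stripped:
            tab_only += 1
        elif extra:
            trailing[extra] += 1
    return trailing, tab_only


# Invented sample: two lines with different trailing-tab counts,
# one tab-only line, and one clean line.
sample = [
    "MTD\tmzTab-version\t2.0.0-M\t\t\n",
    "SMH\tSML_ID\t\n",
    "\t\t\t\n",
    "SML\t1\n",
]
trailing, tab_only = tab_report(sample)
```

If `trailing` ends up with more than one key, the file pads its lines to different widths.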

bernt-matthias commented 4 years ago

One more problem: the example data https://github.com/HUPO-PSI/mzTab/blob/279a32c85125068b02253ce6c6a5476871d73a9f/examples/1_0-Proteomics-Release/faahKO.mzTab#L58 contains SEH and SME lines in an mzTab v1 file.

This should not happen according to the docs: https://github.com/HUPO-PSI/mzTab/tree/master/specification_document-releases/1_0-Proteomics-Release
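A quick way to spot such lines is to compare each line's prefix against the prefixes allowed in version 1.0. The sketch below is an assumption-laden illustration: the `V1_PREFIXES` set reflects my reading of the 1.0 specification and should be checked against the document linked above, and `unexpected_prefixes` is a hypothetical helper.

```python
# Assumed set of line prefixes in mzTab 1.0; verify against the specification.
V1_PREFIXES = {"MTD", "COM", "PRH", "PRT", "PEH", "PEP", "PSH", "PSM", "SMH", "SML"}


def unexpected_prefixes(lines, allowed=V1_PREFIXES):
    """Return (line_number, prefix) for lines whose first column is not allowed.

    Empty lines and tab-only lines have an empty first column and are skipped.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        prefix = line.split("\t", 1)[0].strip()
        if prefix and prefix not in allowed:
            hits.append((number, prefix))
    return hits


# Invented sample mixing v1 prefixes with the SEH/SME prefixes mentioned above.
sample = [
    "MTD\tmzTab-version\t1.0.0\n",
    "SMH\tidentifier\n",
    "SEH\tSME_ID\n",
    "SME\t1\n",
    "\t\t\n",
]
hits = unexpected_prefixes(sample)
```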

andrewrobertjones commented 4 years ago

mzTab 1.0 is deprecated for use in metabolomics, so closing this for now