bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Multi-line study protocol description breaks ISA-Tab export #1683

Open mikkonie opened 1 year ago

mikkonie commented 1 year ago

We noticed that a specific ISA-Tab breaks in a round trip: after importing it into SODAR and exporting it, re-importing the exported file fails.

The error in question is as follows:

ISA-Tab import failed: Missing entries in section STUDY PROTOCOLS; only found: ['Study Protocol Description', 'Study Protocol Name', 'Study Protocol Type', 'Study Protocol Type Term Accession Number', 'Study Protocol Type Term Source REF']

Obviously this has to do with the protocol section. We need to check whether this error is introduced in import or export, and how to reverse it. Since this data is from a public source, I will place the affected sheets (both original and broken) here:

MTBLS691_compressed_files_ORIGINAL.zip MTBLS691_compressed_files_EXPORT.zip

Edit: It appears the reason behind this is a multi-line study protocol description field. Exporting adds a backslash character at the end of each line of the description (broken conversion of a newline at some point?). Also, the double quotes around the field seem to be missing from the export.
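
For illustration, the symptoms described here (backslash before each newline, no double quotes) can be reproduced with Python's standard csv module when a writer is configured with QUOTE_NONE and a backslash escapechar. This is only a sketch of one possible mechanism; it is an assumption and not confirmed to be what SODAR or its ISA-Tab parser actually does. The row content and the helper names write_row and read_rows are hypothetical.

```python
import csv
import io

# Hypothetical row with a multi-line description, as in the affected sheet.
row = ["Study Protocol Description", "Step one.\nStep two."]


def write_row(quoting, **kwargs):
    """Write the example row as a single TSV record and return the text."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter="\t", lineterminator="\n", quoting=quoting, **kwargs
    )
    writer.writerow(row)
    return buf.getvalue()


def read_rows(text):
    """Parse TSV text with a default reader (quotes honored, no escapechar)."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


# Assumed failure mode: no quoting, so the embedded newline is escaped
# with a backslash instead of being wrapped in double quotes.
unquoted = write_row(csv.QUOTE_NONE, escapechar="\\")

# Quoted output: the multi-line field is wrapped in double quotes.
quoted = write_row(csv.QUOTE_MINIMAL)

print(repr(unquoted))
print(repr(quoted))
print(read_rows(unquoted))  # the record is split across two rows
print(read_rows(quoted))    # the record survives the round trip
```

In this sketch the unquoted output is read back as two rows, so the line no longer looks like a complete entry, while the quoted output round-trips unchanged. If the real cause is similar, writing multi-line fields with quoting enabled would be one direction to investigate.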

The good news is, all the relevant data is still intact. The next task is to see if this happens in import or export, and how to fix it.

mikkonie commented 10 months ago

AFAIK, we've only encountered this in one publicly available data set and the import can be fixed by altering the description. Hence, I'm postponing this to the next milestone. However, if this poses a serious problem for someone, please let me know and I'll bump up the prioritization.