Problem with sample parsing in ENA study

ISA-tools / xslt2isa

XSL transformations from various XML formats to ISA-Tab

GNU Lesser General Public License v3.0


Problem with sample parsing in ENA study #3

Open agbeltran opened 9 years ago

agbeltran commented 9 years ago

Splitting comma-separated values is not working.

agbeltran commented 9 years ago

@proccaserra please, remember to provide the SRA accession number to test this. Thanks!

proccaserra commented 9 years ago

ERP005654:

http://www.ebi.ac.uk/ena/data/view/ERP005654&display=xml

samples available from: http://www.ebi.ac.uk/ena/data/view/ERS467753-ERS470752,ERS804444-ERS804467&display=xml (warning: this is large)

experiments available from: http://www.ebi.ac.uk/ena/data/view/ERX1054775-ERX1054904,ERX1054908-ERX1055078,ERX562030-ERX570529,ERX570536-ERX575535,ERX576012-ERX583733,ERX586362-ERX587861,ERX587863-ERX588862,ERX591554-ERX592053,ERX593142-ERX593641,ERX594567-ERX594775,ERX617918-ERX618214&display=xml

The XSLT stops after processing the first set of experiments (ERX1054775-ERX1054904). It requires Saxon 9 PE to run.

proccaserra commented 9 years ago

In https://github.com/ISA-tools/xslt2isa/blob/master/sra/extract-studies-rice.xsl at line 39, the split on the comma does not seem to work, so only ERX1054775-ERX1054904 (the first element in the list) is processed when creating the ISA assays.
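For reference, the behaviour expected from that split can be sketched outside the stylesheet. The snippet below is only an illustration in Python, not the XSLT code itself; the function name `split_accession_blocks` is hypothetical, and it assumes each comma-separated element is either a single accession or a `FIRST-LAST` range, as in the lists above.

```python
def split_accession_blocks(accessions):
    """Split 'A-B,C-D,E' into [(A, B), (C, D), (E, E)].

    Hypothetical helper: assumes every comma-separated element is a single
    accession or a FIRST-LAST range.
    """
    blocks = []
    for token in accessions.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty elements such as a trailing comma
        first, sep, last = token.partition("-")
        # A single accession is treated as a range of length one.
        blocks.append((first, last if sep else first))
    return blocks


# Every block should be visited, not only the first one.
for first, last in split_accession_blocks("SRX000350-SRX000352,SRX001866-SRX001868"):
    print(first, last)
```

If the transform only ever sees the first tuple of such a list, that would match the symptom described here.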

You can run the code from Oxygen using blank.xml as input, or from the command line as detailed in the readme. It is a pain to trace owing to the lack of error messages or meaningful output.

djcomlab commented 9 years ago

Which readme? Do you mean the command.txt inside xslt2isa/sra?

proccaserra commented 9 years ago

Yes. You need Saxon 9, but it is available from Oxygen.

proccaserra commented 9 years ago

The problem may have to do with updates to the assay output file for assays of the same type but obtained from different blocks of records. Processing SRP000198, which references 2 blocks of experiments [SRX000350-SRX000352,SRX001866-SRX001868], pulls the first samples and one of the second block.

We may need to have a closer look at this bit of the transform.

djcomlab commented 8 years ago

I've pushed a fix that isn't quite ideal as it writes out the comma-separated groups of experiments/assays to different a_* files, but at least it grabs and converts the right data.

I already pushed a similar fix to the isa-api project, but there the a_* files are merged in a post-processing step written in Python.
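To make the idea of such a merge step concrete, here is a minimal sketch. It is not the isa-api code: the function name `merge_assay_tables` is hypothetical, and it assumes the a_* files are tab-delimited and share an identical header row. The column names in the example are only illustrative.

```python
import csv
import io


def merge_assay_tables(tables):
    """Concatenate tab-delimited tables that share one header row.

    Hypothetical helper: `tables` is a list of file contents as strings.
    """
    merged_header = None
    merged_rows = []
    for text in tables:
        rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
        if not rows:
            continue  # skip empty files
        header, body = rows[0], rows[1:]
        if merged_header is None:
            merged_header = header
        elif header != merged_header:
            # Assumption: files of the same assay type have the same columns.
            raise ValueError("assay tables have different headers")
        merged_rows.extend(body)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    if merged_header is not None:
        writer.writerow(merged_header)
        writer.writerows(merged_rows)
    return out.getvalue()


# Two per-block outputs of the same assay type, merged into one table.
block_1 = "Sample Name\tAssay Name\nsample1\tSRX000350\n"
block_2 = "Sample Name\tAssay Name\nsample2\tSRX001866\n"
print(merge_assay_tables([block_1, block_2]))
```

Note that `csv.writer` only quotes fields when needed, so if the original files quote every field, the merged output would differ in quoting.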