AdamaJava / adamajava


qprofiler1 / qvisualise - handle new fastq header #336

Closed: holmeso closed this 1 year ago

holmeso commented 1 year ago

fix(qvisualise/qprofiler): deal with new fastq header and make qvis more robust to deal with these

There are two issues here. First, qprofiler was not aware of this new FASTQ header format, so each header became a unique instrument (all 28 million of them). Second, qvisualise would then run out of memory trying to handle them all. I have added code to qprofiler so that it handles the new format properly. Perhaps more importantly, I have also added limits to the size of the collections that qvisualise will handle.

New FASTQ header: @SRR14585604.8 A00805:41:HMJJWDRXX:1:1101:16929:1000 length=101. This looks like the NCBI header format described here.
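For reference, the middle field of this header (A00805:41:HMJJWDRXX:1:1101:16929:1000) follows the usual Illumina read-name layout of instrument:run:flowcell:lane:tile:x:y. The first field is the instrument. The sketch below shows how that field can be pulled out. The class, record and method names are hypothetical and are not qprofiler's actual code. It needs Java 16+ for `record`.

```java
public class IlluminaReadNameSketch {
    // Hypothetical holder for the seven colon-separated fields of an Illumina-style read name
    record ReadName(String instrument, int run, String flowcell, int lane, int tile, int x, int y) {}

    // Splits "instrument:run:flowcell:lane:tile:x:y"; rejects anything without exactly 7 fields
    static ReadName parse(String name) {
        String[] f = name.split(":");
        if (f.length != 7) {
            throw new IllegalArgumentException("Unexpected read name: " + name);
        }
        return new ReadName(f[0], Integer.parseInt(f[1]), f[2], Integer.parseInt(f[3]),
                Integer.parseInt(f[4]), Integer.parseInt(f[5]), Integer.parseInt(f[6]));
    }

    public static void main(String[] args) {
        ReadName r = parse("A00805:41:HMJJWDRXX:1:1101:16929:1000");
        System.out.println(r.instrument()); // prints A00805
        System.out.println(r.flowcell());   // prints HMJJWDRXX
    }
}
```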

For headers like this, qprofiler now counts the spaces in the header. If there are exactly two, only the middle part (A00805:41:HMJJWDRXX:1:1101:16929:1000 in the example above) is sent for analysis. A header with any other number of spaces is handled as before.
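The space-counting rule can be sketched as follows. This is a minimal illustration of the behaviour described above, not the actual qprofiler implementation. The class and method names (`FastqHeaderSketch`, `headerToAnalyse`) are hypothetical.

```java
public class FastqHeaderSketch {
    /**
     * Hypothetical helper: if the header contains exactly two spaces, return the middle field
     * (the original read name in an NCBI-style header); otherwise return the header unchanged
     * so that it is processed as before.
     */
    static String headerToAnalyse(String header) {
        int spaces = 0;
        for (int i = 0; i < header.length(); i++) {
            if (header.charAt(i) == ' ') {
                spaces++;
            }
        }
        if (spaces == 2) {
            int first = header.indexOf(' ');
            int second = header.indexOf(' ', first + 1);
            return header.substring(first + 1, second);
        }
        return header;
    }

    public static void main(String[] args) {
        String ncbi = "@SRR14585604.8 A00805:41:HMJJWDRXX:1:1101:16929:1000 length=101";
        System.out.println(headerToAnalyse(ncbi));           // prints A00805:41:HMJJWDRXX:1:1101:16929:1000
        System.out.println(headerToAnalyse("@read1 extra")); // one space, so printed unchanged
    }
}
```

Falling back to the unchanged header for any other space count keeps the previous behaviour for existing header formats.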

qvisualise now compares the size of the input XML file with the supplied -Xmx (maximum heap) setting and logs a message if it expects a memory problem. It also ignores any XML element that contains more than 100,000 items.
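The two safeguards can be sketched roughly as below. Two assumptions apply. First, the heap limit is read with `Runtime.getRuntime().maxMemory()`, which reflects -Xmx when it is set. Second, an XML document is assumed to need about 10 times its file size in memory. That factor, and the names `MemoryCheckSketch`, `mayRunOutOfMemory` and `shouldSkipElement`, are hypothetical and not taken from qvisualise. Only the 100,000-item limit comes from this PR.

```java
import java.io.File;

public class MemoryCheckSketch {
    // Hypothetical assumption: an in-memory XML document needs roughly this multiple of the file size
    static final long ASSUMED_EXPANSION_FACTOR = 10L;
    // Item limit quoted in the PR description
    static final int MAX_ITEMS_PER_ELEMENT = 100_000;

    // True if the estimated memory need exceeds the heap limit (division avoids long overflow)
    static boolean mayRunOutOfMemory(long xmlFileSizeBytes, long maxHeapBytes) {
        return xmlFileSizeBytes > maxHeapBytes / ASSUMED_EXPANSION_FACTOR;
    }

    // Elements with more items than the limit are skipped rather than loaded
    static boolean shouldSkipElement(int itemCount) {
        return itemCount > MAX_ITEMS_PER_ELEMENT;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // reflects -Xmx when it is set
        if (args.length > 0) {
            long size = new File(args[0]).length(); // 0 if the file does not exist
            if (mayRunOutOfMemory(size, maxHeap)) {
                System.err.println("Warning: " + args[0] + " (" + size
                        + " bytes) may not fit in the available heap (" + maxHeap + " bytes)");
            }
        }
        System.out.println(shouldSkipElement(28_000_000)); // prints true
    }
}
```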


How Has This Been Tested?

Additional unit tests have been added. The code has also been run against the offending FASTQ file and produced satisfactory results.

Are WDL Updates Required?

Nope
