fix(qvisualise/qprofiler): deal with new fastq header and make qvis more robust to deal with these
There are two issues here. qprofiler was not aware of this new fastq header format, so each header was treated as a unique instrument (all 28 million of them). qvisualise would then run out of memory trying to deal with them all. This change adds code to qprofiler to handle the new format properly and, perhaps more importantly, adds limits to the size of the collections that qvisualise will handle.
New fastq header:
@SRR14585604.8 A00805:41:HMJJWDRXX:1:1101:16929:1000 length=101
which looks like the NCBI header as described here
For this format, qprofiler now counts the number of spaces in the header. If there are exactly 2, only the middle part is sent for analysis. For any other number of spaces, the header is treated as it was previously.
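A minimal sketch of the space-counting rule described above, for reviewers who want to see the behaviour in isolation. The class and method names (`FastqHeaderSketch`, `headerToAnalyse`) are hypothetical and are not the names used in qprofiler; the sketch assumes "spaces" means literal single space characters and that the header is otherwise passed through unchanged.

```java
// Illustrative only: not the actual qprofiler code.
public final class FastqHeaderSketch {

    private FastqHeaderSketch() {}

    /**
     * Returns the part of the header that should be analysed.
     * If the header contains exactly two spaces, the text between them is
     * returned. For any other number of spaces the header is returned as is.
     */
    public static String headerToAnalyse(String header) {
        int first = header.indexOf(' ');
        if (first < 0) {
            return header;              // no spaces
        }
        int second = header.indexOf(' ', first + 1);
        if (second < 0) {
            return header;              // exactly one space
        }
        if (header.indexOf(' ', second + 1) >= 0) {
            return header;              // three or more spaces
        }
        return header.substring(first + 1, second);
    }

    public static void main(String[] args) {
        String ncbiStyle =
                "@SRR14585604.8 A00805:41:HMJJWDRXX:1:1101:16929:1000 length=101";
        System.out.println(headerToAnalyse(ncbiStyle));
    }
}
```

With the example header from this PR, the sketch hands on only `A00805:41:HMJJWDRXX:1:1101:16929:1000`, so the instrument is taken from the original read name rather than from the SRR accession.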
qvisualise now checks the size of the input xml file against the supplied Xmx parameter, and logs a message if it thinks a memory issue may be encountered.
It also ignores any xml element that has more than 100000 items in it.
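The two qvisualise guards could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the names (`QvisLimitsSketch`, `mayRunOutOfMemory`, `shouldIgnore`) are hypothetical, and the expansion factor used to compare file size with heap size is an invented placeholder, since the actual heuristic is not described here. Only the 100000 item limit comes from the description above.

```java
// Illustrative only: not the actual qvisualise code.
public final class QvisLimitsSketch {

    /** Elements with more items than this are ignored (limit taken from the PR text). */
    public static final int MAX_ITEMS = 100_000;

    /** Hypothetical guess at how much larger the parsed xml is in memory than on disk. */
    public static final double ASSUMED_EXPANSION = 4.0;

    private QvisLimitsSketch() {}

    /** True if the xml file looks too large for the heap given by -Xmx. */
    public static boolean mayRunOutOfMemory(long xmlBytes, long maxHeapBytes) {
        return xmlBytes * ASSUMED_EXPANSION > maxHeapBytes;
    }

    /** True if an xml element has more than MAX_ITEMS items and should be skipped. */
    public static boolean shouldIgnore(int itemCount) {
        return itemCount > MAX_ITEMS;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reflects the -Xmx setting of the running JVM.
        long maxHeap = Runtime.getRuntime().maxMemory();
        long xmlBytes = 1_500_000_000L; // example file size, not a measured value
        if (mayRunOutOfMemory(xmlBytes, maxHeap)) {
            System.err.println("WARN: input xml may be too large for the current Xmx");
        }
        System.out.println("ignore element with 28000000 items: " + shouldIgnore(28_000_000));
    }
}
```

The check only logs a warning rather than aborting, matching the description that qvisualise will "log if it thinks a memory issue may be encountered".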
Type of change
[X] Bug fix (non-breaking change which fixes an issue)
How Has This Been Tested?
Additional unit tests have been added.
The code has been run against the offending fastq file and produced satisfactory results.
Are WDL Updates Required?
Nope
Checklist:
[X] My code follows the style guidelines of this project
[X] I have performed a self-review of my own code
[X] I have commented my code, particularly in hard-to-understand areas
[X] My changes generate no new warnings
[X] I have added tests that prove my fix is effective or that my feature works
[X] New and existing unit tests pass locally with my changes