Asutu closed this issue 5 years ago.
Hi Pedro,
Don't worry about this warning. I suspect it is simply due to your input read set containing a number of reads that do not have an associated barcode. For reference, I saw this line in a recent run of ARKS:
WARNING:: Your chromium read file has 27759471 read pairs that have barcodes not in the barcode multiplicity file.Cumulative memory usage: 1452348
And there are exactly that many read pairs without an associated barcode:
[lcoombe@hpce705 Tigmint-ARKS]$ gunzip -c chromium.fq.gz |grep "HISEQ" |grep -v "BX:Z:" |wc -l
55518942
[lcoombe@hpce705 Tigmint-ARKS]$ echo $(( 55518942/2 ))
27759471
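The pipeline above counts FASTQ header lines (picked out by the instrument-specific `HISEQ` prefix) that lack a `BX:Z:` tag, then halves the total because each pair contributes two headers. A minimal Python sketch of the same count, assuming an interleaved FASTQ with strict four-line records in which both mates of an untagged pair lack the tag; it selects headers by record position rather than by the `HISEQ` string, and the function names are hypothetical:

```python
import gzip
from typing import Iterable, Iterator


def headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield the header (first) line of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")


def count_pairs_without_barcode(lines: Iterable[str]) -> int:
    """Count read pairs whose headers carry no BX:Z: tag.

    Assumes both mates are present and both lack the tag, so the number
    of untagged headers is halved, as in the `$(( 55518942/2 ))` step.
    """
    untagged = sum(1 for h in headers(lines) if "BX:Z:" not in h)
    return untagged // 2


def count_in_gzipped_fastq(path: str) -> int:
    """Convenience wrapper for a gzipped file such as chromium.fq.gz."""
    with gzip.open(path, "rt") as fh:
        return count_pairs_without_barcode(fh)
```

Calling `count_in_gzipped_fastq("chromium.fq.gz")` should reproduce the shell result, provided the file really is interleaved with four-line records.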
I do agree that the warning itself is a bit cryptic; we could be clearer about whether the barcode is missing from the provided multiplicity file or whether the read pair has no barcode at all.
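The distinction above can be made explicit by sorting each read pair into one of three bins. A minimal sketch, assuming the multiplicity data has already been loaded into a dict mapping barcode to read count; the loading step, `BarcodeStatus`, `barcode_of`, and `classify_pair` are hypothetical and not part of ARKS:

```python
import re
from enum import Enum
from typing import Dict, Optional

# Matches the barcode value after a BX:Z: tag, stopping at whitespace or a tab.
BX_TAG = re.compile(r"BX:Z:(\S+)")


class BarcodeStatus(Enum):
    NO_BARCODE = "read pair has no BX:Z: tag"
    NOT_IN_MULTIPLICITY = "barcode absent from multiplicity table"
    OK = "barcode found in multiplicity table"


def barcode_of(header: str) -> Optional[str]:
    """Return the BX:Z: barcode from a FASTQ header, or None if absent."""
    match = BX_TAG.search(header)
    return match.group(1) if match else None


def classify_pair(header: str, multiplicity: Dict[str, int]) -> BarcodeStatus:
    """Separate 'no barcode at all' from 'barcode not in the table'."""
    barcode = barcode_of(header)
    if barcode is None:
        return BarcodeStatus.NO_BARCODE
    if barcode not in multiplicity:
        return BarcodeStatus.NOT_IN_MULTIPLICITY
    return BarcodeStatus.OK
```

Tallying these statuses over a read file would split the single warning count into its two underlying causes.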
And yes, it is also expected that a good number of reads will be marked as not having a 'good contig'. This can happen for several reasons, including the two reads in a pair not mapping to the same contig, or the Jaccard index of a read pair not exceeding the threshold for any contig.
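The two failure modes listed above can be illustrated with a toy assignment rule. A minimal sketch, assuming each read is scored against a contig as the fraction of the read's k-mers found in that contig's k-mer set (a simplification of the Jaccard-style score mentioned above), and that a pair is kept only when both mates pick the same contig at or above a threshold. This is an illustrative model, not ARKS's actual implementation; `kmers`, `best_contig`, `assign_pair`, and the threshold are hypothetical:

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of seq (no reverse-complement handling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_contig(read: str, index: Dict[str, Set[str]], k: int,
                threshold: float) -> Optional[str]:
    """Return the contig sharing the largest fraction of the read's k-mers,
    or None if no contig reaches the threshold."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_name, best_score = None, 0.0
    for name, contig_kmers in index.items():
        score = len(read_kmers & contig_kmers) / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def assign_pair(r1: str, r2: str, index: Dict[str, Set[str]], k: int,
                threshold: float) -> Optional[str]:
    """Assign a read pair to a contig only if both mates agree on it."""
    c1 = best_contig(r1, index, k, threshold)
    c2 = best_contig(r2, index, k, threshold)
    return c1 if c1 is not None and c1 == c2 else None
```

A pair returns `None` either because its mates disagree or because one mate scores below the threshold everywhere, which mirrors the two reasons given for "no good contig".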
As for your parameters, they look fine to me, except that you could also try a slightly higher k: I haven't run ARKS with a k-mer size below 40. I do find k is a good parameter to sweep, since the optimal k differs depending on the input assembly.
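A sweep over k amounts to running the scaffolding once per candidate k and keeping the best result. A minimal sketch, assuming runs are ranked by scaffold N50 (one common contiguity metric; the thread does not say which metric is used) and that the scaffold lengths for each run have already been collected; `n50` and `best_k` are hypothetical helpers:

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def best_k(runs: Dict[int, List[int]]) -> int:
    """Pick the k whose scaffolds have the highest N50 (ties go to the smaller k)."""
    return max(sorted(runs), key=lambda k: n50(runs[k]))
```

Each entry of `runs` would come from a separate ARKS run at a different k-mer size; other metrics (for example misassembly counts against a reference) could replace N50 in `best_k`.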
Hope that helps! Lauren
Hi Lauren,
Many thanks, that was really helpful. I'm now testing other ranges of k to see if there are improvements.
I'll close this issue now as my questions have been addressed.
I'm glad that was helpful! Just a heads-up: we clarified that warning message in commit 41694a5.
Hi,
I'm running into this warning from ARKS:
WARNING:: Your chromium read file has 13071618 read pairs that have barcodes not in the barcode multiplicity file.Cumulative memory usage: 4621292
However, my understanding was that the barcode multiplicity file is generated from the read file itself. I'm probably misunderstanding something in ARKS, because this warning is a bit cryptic to me. I'm also seeing that a large chunk of reads are being skipped (discarded?) by ARKS, apparently because they don't have a good contig:
Skipped reads pairs without a good contig: 162242712
Is this expected? And would it make sense to tune the parameters to include more reads in the analysis? I'm running ARKS with default parameters, except for specifying a minimum contig length of 1 kb. The full command is:
Thanks, Pedro
arks.log