Closed FerriolCalvet closed 11 months ago
@FerriolCalvet my apologies if I am missing it, but I don't see any sorting message in the logs. Can you show me where that's occurring?
Sorry, I saw that the log had the same format as in a previous case where I provided an unsorted input, and I got confused. I have rechecked, and I now see that it writes the output directly without sorting. Apologies for the confusion, and thank you for your reply!
I am using the fgbio suite of tools to process some duplex sequencing data, and I see that the GroupReadsByUMI step is among the slowest to run, and it cannot be parallelized. I realised that one of the first things the program does, if it detects that the input is not template-coordinate sorted, is to sort it, so I decided to sort it beforehand with samtools so that I can at least parallelize the sorting. But even when I provide the input in this format, it still complains and sorts the entire file again... I wonder whether there is another way I could sort the input file, or whether any parameter influences this. I am attaching some information:
the header of the BAM file provided as input
the command I use to run GroupReadsByUMI
the first lines of the log file
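
For anyone checking the same thing, here is a minimal sketch of how one might inspect the `@HD` line of a BAM header to see whether it declares template-coordinate ordering. It assumes the sort order is recorded in the `SS` (sub-sort) tag defined by the SAM specification, for example `SS:unsorted:template-coordinate`. The function names are hypothetical, and the exact tags GroupReadsByUMI checks should be confirmed against the fgbio documentation.

```python
def parse_hd_line(header_text):
    """Return a tag -> value dict for the first @HD line, or {} if there is none."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            fields = line.split("\t")[1:]
            # Each field looks like "TAG:value"; split only on the first colon
            # because values such as the SS tag contain colons themselves.
            return dict(f.split(":", 1) for f in fields if ":" in f)
    return {}


def declares_template_coordinate(header_text):
    """True if the @HD line's SS tag names template-coordinate as the sub-sort.

    Assumption: the header alone is inspected; the records are not checked.
    """
    sub_sort = parse_hd_line(header_text).get("SS", "")
    # SS is "<sort-order>:<sub-sort>"; keep the part after the first colon.
    return sub_sort.split(":", 1)[-1] == "template-coordinate"


# Illustrative header text (not taken from the attached file).
example_header = (
    "@HD\tVN:1.6\tSO:unsorted\tGO:query\tSS:unsorted:template-coordinate\n"
    "@SQ\tSN:chr1\tLN:1000"
)
print(declares_template_coordinate(example_header))
```

The header text can be obtained with `samtools view -H` on the input BAM and passed to the function as a string.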