NovaSeq runs often produce far more reads than needed. According to my tests, 5-10 million reads per run-replicate (about 96 samples) is enough. Beyond that, the number of variants, the average number of variants per sample and the average number of reads per sample no longer increase after VTAM filtering.
On the other hand, too many reads increase run time and can cause memory issues.
It would be nice to have either a separate command (run after merge) or an option in merge that randomly selects a user-defined number of reads from each output file of merge. The selected reads would then be the input of sortreads.
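To illustrate what I mean, here is a minimal sketch of such a subsampling step as a standalone Python script. It is not part of vtam. It assumes the merged files are plain, uncompressed FASTA, and the function names (`read_fasta`, `reservoir_sample`, `subsample_fasta`) are made up for this example. It uses reservoir sampling, so only the selected reads are held in memory, however large the input file is.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA lines, joining wrapped sequences."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], n: int,
                     seed: Optional[int] = None) -> List[Record]:
    """Return n records chosen uniformly at random in a single pass.

    If the input holds fewer than n records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir


def subsample_fasta(in_path: str, out_path: str, n: int,
                    seed: Optional[int] = None) -> int:
    """Write a random subset of at most n reads; return how many were written."""
    with open(in_path) as fin:
        sample = reservoir_sample(read_fasta(fin), n, seed)
    with open(out_path, "w") as fout:
        for header, seq in sample:
            fout.write(f">{header}\n{seq}\n")
    return len(sample)
```

A call such as `subsample_fasta("merged.fasta", "merged_sub.fasta", 100000, seed=42)` (hypothetical file names) would be run once per merged file. A fixed seed makes the selection reproducible. The output order of the reads is not the input order, which should not matter for sortreads, but that is also an assumption.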