PacificBiosciences / pb-metagenomics-tools

Tools and pipelines tailored to using PacBio HiFi Reads for metagenomics
BSD 3-Clause Clear License

bug fix for large refs #62

Closed dportik closed 1 year ago

dportik commented 1 year ago

When the reference is longer than 4Gb, minimap2 is unable to see all of the sequences and therefore cannot produce a correct SAM header. This throws an error at the sorting step of the MinimapToBam rule. This fix avoids the issue by splitting that rule into two new rules, MinimapIndex and MinimapToBam. MinimapIndex indexes the reference prior to alignment, and MinimapToBam runs minimap2 with the --split-prefix option.
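For illustration, here is a rough sketch of the two minimap2 invocations the new rules correspond to, written as plain Python command builders rather than the actual Snakemake rules. The function names, file names, prefix, and thread count are hypothetical and not taken from the pipeline. Only the minimap2 flags `-d` (write index), `-a` (SAM output), `-t` (threads), and `--split-prefix` are real options.

```python
import shlex
from typing import List


def minimap_index_cmd(reference: str, index: str) -> List[str]:
    """Build the indexing command (the MinimapIndex step).

    `-d` tells minimap2 to write the reference index to a file.
    """
    return ["minimap2", "-d", index, reference]


def minimap_align_cmd(index: str, reads: str, split_prefix: str,
                      threads: int = 4) -> List[str]:
    """Build the alignment command (the MinimapToBam step).

    `--split-prefix` gives minimap2 a prefix for temporary files when the
    index has multiple parts, so the merged SAM output has a full header.
    """
    return [
        "minimap2", "-a",
        "--split-prefix", split_prefix,
        "-t", str(threads),
        index, reads,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(minimap_index_cmd("ref.fasta", "ref.mmi")))
    print(shlex.join(minimap_align_cmd("ref.mmi", "sample.fasta", "sample_tmp")))
```

The commands are only built and printed here, not executed. In the pipeline, the SAM output of the second command would then go on to the sorting step.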

dportik commented 1 year ago

Requires testing on datasets that do not exceed the maximum reference size (and therefore would already produce correct SAM headers).

dportik commented 1 year ago

Passed with test datasets