gbouras13 / hybracter

Automated long-read first bacterial genome assembly tool implemented in Snakemake using Snaketool.
MIT License

Update medaka models to include 4.3.0 models #71

Closed samuelmontgomery closed 2 months ago

samuelmontgomery commented 7 months ago

Hi,

The medaka models used for hybracter only seem to pull v4.2.0 models, whereas my basecalling has been run using v4.3.0 models. I know that I likely don't need medaka (I turn it off) due to my quality scores, but the v4.3.0 models can't seem to be specified with medakaModel. Additionally, updating minimap2 to use the lr:hq preset for Q20 nanopore reads would be nice (but not a massive improvement).

Thanks

gbouras13 commented 7 months ago

Hi @samuelmontgomery ,

Thanks for these comments. With Medaka, I have deliberately chosen to deprecate it for newer models (as polishing is not recommended on v4.3.0 SUP data or later). Therefore, I do not plan to update it past Medaka v1.8.0 inside Hybracter - this is the latest version that hasn't caused major grief with install. As a result, I don't think you will be able to download and specify these models. If you really want to, I would recommend updating the specific Medaka environment inside hybracter to a newer version of Medaka and modifying the Hybracter code here to accept the model you want: https://github.com/gbouras13/hybracter/blob/0acfb01454545116bac91e53db2b162d537f07f6/hybracter/util.py#L217 .
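To give a rough idea of what "modifying the code to accept the model you want" might look like, here is a minimal sketch of an allow-list check that takes extra model names. It is not Hybracter's actual code: `SUPPORTED_MEDAKA_MODELS`, `check_medaka_model` and `extra_models` are hypothetical names, and the model strings are only examples of the usual Medaka naming pattern, so check them against what your installed Medaka version really ships.

```python
# Hypothetical allow-list for illustration only; the real list lives in
# hybracter/util.py and will differ from this.
SUPPORTED_MEDAKA_MODELS = {
    "r1041_e82_400bps_sup_v4.2.0",
    "r1041_e82_400bps_hac_v4.2.0",
}


def check_medaka_model(model, extra_models=()):
    """Return `model` if it is allowed, otherwise raise ValueError.

    `extra_models` is an assumed hook for locally added model names
    (for example v4.3.0 models after upgrading the Medaka environment).
    """
    allowed = SUPPORTED_MEDAKA_MODELS | set(extra_models)
    if model not in allowed:
        raise ValueError(
            f"Medaka model '{model}' is not in the allowed list: "
            f"{', '.join(sorted(allowed))}"
        )
    return model


# Example: accept a v4.3.0 model only when it has been added explicitly.
chosen = check_medaka_model(
    "r1041_e82_400bps_sup_v4.3.0",
    extra_models=["r1041_e82_400bps_sup_v4.3.0"],
)
```

The check only validates the name. The Medaka environment itself would still need to be new enough to provide the model files.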

With minimap2, thanks for that - I would guess this is most useful for Plassembler (the step that uses minimap2 I think?). Maybe inside Flye too.
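For reference, the minimap2 suggestion amounts to swapping the preset passed to `-x`. The sketch below builds the command line in Python; `minimap2_command` and its parameters are hypothetical helpers, not code from Hybracter or Plassembler, and the `lr:hq` preset is only available in recent minimap2 releases (2.27 onwards, as far as I know), so older installs would need `map-ont`.

```python
import shlex


def minimap2_command(reference, reads, high_quality=False, threads=4):
    """Build a minimap2 command as an argument list (nothing is executed).

    high_quality=True selects the lr:hq preset intended for accurate
    (roughly Q20+) long reads; otherwise the classic map-ont preset is used.
    """
    preset = "lr:hq" if high_quality else "map-ont"
    return [
        "minimap2",
        "-a",              # write SAM output
        "-x", preset,      # alignment preset
        "-t", str(threads),
        reference,
        reads,
    ]


cmd = minimap2_command("assembly.fasta", "reads.fastq.gz", high_quality=True)
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run` if you want to execute it, which avoids shell quoting problems with unusual file names.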

George

I would think that someone with v4.3.0 FAST or HAC data could just use v4.2.0 models reasonably well (though of course I'd just say to rebasecall with SUP!)

gbouras13 commented 2 months ago

Closing in favour of #84