SorenKarst / longread_umi

GNU General Public License v3.0

Correct '-q' value for longread_umi nanopore_pipeline #34

Closed: splaisan closed this issue 4 years ago

splaisan commented 4 years ago

The help page of 'longread_umi nanopore_pipeline' says:

-q Medaka model used for polishing. r941_min_high, r10_min_high etc.

However, the value 'r941_min_high' does not seem to exist in medaka consensus:

medaka consensus -h
...
  --model MODEL         Model definition, default is equivalent to
                        r941_min_high_g344. {r941_min_fast_g303,
                        r941_min_high_g303, r941_min_high_g330,
                        r941_min_high_g344, r941_prom_fast_g303,
                        r941_prom_high_g303, r941_prom_high_g344,
                        r941_prom_high_g330, r10_min_high_g303,
                        r10_min_high_g340, r103_min_high_g345,
                        r941_prom_snp_g303, r941_prom_variant_g303,
                        r941_min_high_g340_rle} (default:
                        /opt/biotools/miniconda3/envs/longread_umi/lib/python3.6/site-packages/medaka/data/r941_min_high_g344_model.hdf5)
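The names in this list appear to follow a {pore}_{device}_{caller variant}_{caller version} pattern (for example r941_min_high_g344), so the old short name r941_min_high from the longread_umi help text is not a model on its own but a prefix shared by several current models. The Python sketch below is only an illustration and is not part of longread_umi or medaka: it hard-codes the list from the help output above, and the regular expression and the `parse_model` / `candidates_for` helpers are hypothetical, built on that assumed naming pattern.

```python
import re

# Model names copied from the `medaka consensus -h` output above.
MODELS = [
    "r941_min_fast_g303", "r941_min_high_g303", "r941_min_high_g330",
    "r941_min_high_g344", "r941_prom_fast_g303", "r941_prom_high_g303",
    "r941_prom_high_g344", "r941_prom_high_g330", "r10_min_high_g303",
    "r10_min_high_g340", "r103_min_high_g345", "r941_prom_snp_g303",
    "r941_prom_variant_g303", "r941_min_high_g340_rle",
]

# Assumed naming pattern: pore_device_variant_g<version>[_suffix].
MODEL_RE = re.compile(
    r"^(?P<pore>r\d+)_(?P<device>[a-z]+)_(?P<variant>[a-z]+)"
    r"_g(?P<version>\d+)(?:_(?P<suffix>\w+))?$"
)


def parse_model(name):
    """Split a model name into its parts, or return None if it does not match."""
    m = MODEL_RE.match(name)
    return m.groupdict() if m else None


def candidates_for(short_name, models=MODELS):
    """List every full model name a legacy short name such as 'r941_min_high' could mean."""
    pore, device, variant = short_name.split("_")
    hits = []
    for name in models:
        parts = parse_model(name)
        if parts and (parts["pore"], parts["device"], parts["variant"]) == (pore, device, variant):
            hits.append(name)
    return hits


print(candidates_for("r941_min_high"))
# ['r941_min_high_g303', 'r941_min_high_g330', 'r941_min_high_g344', 'r941_min_high_g340_rle']
```

The short name alone therefore cannot tell medaka which Guppy version the reads came from; that choice still has to be made by hand.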

I used 'r941_min_high_g344', which seems to work with the current version. It seems we cannot omit '-q' to fall back to medaka's default model? Did I pick the right -q value for a Flongle flow cell basecalled in high-accuracy (hac) mode with Guppy 3.2.10 on a GridION?

SorenKarst commented 4 years ago

Hi splaisan,

Sorry for the late reply.

The help text refers to an older version of medaka, in which the models had different names. To be honest, I do not know whether you used the correct model version for your data. As far as I can see, there is no "g321" model in the list. My guess is that "g303" would be the best fit, since new models are only made when the basecaller output changes substantially, which seems to have happened at versions 3.0.3 and 3.3.0.
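Soren's reasoning, that a model trained on one Guppy release stays the best match until the next model appears, can be written as a small rule: among models with the right pore, device and caller variant, take the one with the highest Guppy version that does not exceed the version used for basecalling. This Python sketch is a heuristic reconstructed from the comment above, not an official medaka recommendation. It assumes that the gNNN suffix encodes a Guppy version one digit per component (g303 means 3.0.3), which becomes ambiguous once a component has two digits; that a Flongle run on a GridION counts as the "min" device; and it skips names carrying an extra suffix such as _rle, on the assumption that they target a different kind of basecaller output. `pick_model` and `parse_guppy_suffix` are hypothetical helpers.

```python
# Model names copied from the `medaka consensus -h` output in this thread.
MODELS = [
    "r941_min_fast_g303", "r941_min_high_g303", "r941_min_high_g330",
    "r941_min_high_g344", "r941_prom_fast_g303", "r941_prom_high_g303",
    "r941_prom_high_g344", "r941_prom_high_g330", "r10_min_high_g303",
    "r10_min_high_g340", "r103_min_high_g345", "r941_prom_snp_g303",
    "r941_prom_variant_g303", "r941_min_high_g340_rle",
]


def parse_guppy_suffix(tag):
    """Assumed encoding, one digit per component: 'g303' -> (3, 0, 3)."""
    return tuple(int(d) for d in tag[1:])


def pick_model(pore, device, variant, guppy_version, models=MODELS):
    """Return the newest matching model trained on a Guppy version <= guppy_version.

    guppy_version is a dotted string such as '3.2.10'. Returns None if nothing qualifies.
    """
    target = tuple(int(x) for x in guppy_version.split("."))
    best_name, best_version = None, None
    for name in models:
        parts = name.split("_")
        if len(parts) != 4:
            continue  # skip names with an extra suffix such as '_rle'
        m_pore, m_device, m_variant, tag = parts
        if (m_pore, m_device, m_variant) != (pore, device, variant):
            continue
        version = parse_guppy_suffix(tag)
        if version <= target and (best_version is None or version > best_version):
            best_name, best_version = name, version
    return best_name


# Flongle (assumed to count as 'min'), high-accuracy mode, Guppy 3.2.10.
print(pick_model("r941", "min", "high", "3.2.10"))  # r941_min_high_g303
```

Versions are compared as integer tuples rather than strings, so 3.2.10 correctly sorts after 3.2.9 and before 3.3.0. The rule only formalises the guess; it says nothing about how well the chosen model polishes older basecalls.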

I would recommend re-basecalling your data with the newest version of Guppy and updating medaka in the longread_umi conda environment to the newest version, so you benefit from the higher accuracy that is possible today.

splaisan commented 4 years ago

Thanks @SorenKarst, will do. I wish the choice of model were a little more guided. I'll see if I can post about this on the medaka issue tracker to get their feedback.

They already have some nice circular threads about this topic there, all closed without a real answer (https://github.com/nanoporetech/medaka/issues/156 <=> https://github.com/nanoporetech/medaka/issues/169) :-)

Best