Question about configurations for covid-19 ONT data analysis

qianwang-prenetics commented 2 years ago

Hi Team, I'm very interested in the tool and have tried to use it to analyse COVID-19 ONT amplicon sequencing data. A few questions about the configuration came up during my run:

First, I selected all modules to run the core process and found that the command below chooses a model for the medaka consensus run. Which model should I choose for my run? I'm using MinKNOW in fast basecalling mode; version info is attached below.
medaka consensus --model r941_min_high_g360 --threads 5 --chunk_len 800 --chunk_ovlp 400 --RG 2 barcode01_barcode01.trimmed.rg.sorted.bam barcode01_barcode01.2.hdf 13.429641861468554
The second question is about align_trim. Is this step necessary within the whole pipeline? If not, can I skip it, and how? I also found that the current setting downsamples the reads to 200 (--normalise 200). How can I set this parameter in the config file?
The third question is about the filtering criteria applied when going from merged.vcf to pass.vcf. Can you explain more about this filtering? Looking forward to your reply, thanks!!

emiracherif commented 2 years ago

Hi Qian, Thank you for your interest in our tools, and sorry for the late answer. I was out of the office.

For your first question: if you used the MinION device with fast basecalling, you need to choose r941_min_fast_g4XX. You can get the list of models by running "medaka tools list_models" and choose the right one for you. In the model name, r941 is the flowcell (R9.4.1), min is the MinION device, fast is the fast basecalling mode, and gXXX is the guppy version.

Your second and third questions are more closely linked to the artic pipeline configuration, so I strongly advise you to read the artic documentation. Anyhow, if you want to change the "--normalise 200" parameter, you will need to edit the .smk file directly, for example line 115 of the ontdeCIPHER_Q5.smk file.

I hope this helps, and please feel free to ask if you have other questions. Very best, Emira
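The naming scheme described above can be illustrated with a small parser. This is only a sketch: the function name `parse_model_name` and the field names are made up for illustration, and the pattern is inferred from the model names quoted in this thread, not taken from medaka itself.

```python
import re

# Pattern inferred from names seen in this thread, e.g. r941_min_fast_g303
# or r941_min_high_g340_rle. It is an assumption, not medaka's own grammar.
_MODEL_RE = re.compile(
    r"^(?P<pore>r\d+)_(?P<device>[a-z]+)_(?P<mode>[a-z]+)"
    r"_g(?P<guppy>\d+)(?:_(?P<suffix>\w+))?$"
)


def parse_model_name(name):
    """Split a model name into pore, device, mode, guppy digits and suffix."""
    match = _MODEL_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    # pore = flowcell chemistry, device = min/prom, mode = fast/high/...,
    # guppy = digits of the guppy version the model was trained for.
    print(parse_model_name("r941_min_fast_g303"))
```

The guppy field is kept as a string on purpose: names such as g3210 and g360 do not sort correctly as plain integers.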

mohammadsalma commented 2 years ago

Hi Qian, Sorry for the late answer, and thank you Emira for your answer. We are going to push an update by the end of the week to add the "--normalise" option to the config file. If you have other suggestions, please don't hesitate :)

Best, Mohammad
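As a rough idea of what a config-driven "--normalise" value could look like, here is a minimal sketch. The config key `normalise`, the default constant, and the helper `normalise_args` are hypothetical and do not describe ONTdeCIPHER's actual config schema; only the "--normalise 200" flag itself comes from this thread.

```python
# Default taken from the "--normalise 200" setting mentioned in this thread.
DEFAULT_NORMALISE = 200


def normalise_args(config):
    """Build the --normalise flag from a config mapping.

    The key name "normalise" is an assumption made for this sketch.
    """
    value = int(config.get("normalise", DEFAULT_NORMALISE))
    if value < 0:
        raise ValueError("normalise must not be negative")
    return ["--normalise", str(value)]


if __name__ == "__main__":
    print(normalise_args({}))                  # falls back to the default
    print(normalise_args({"normalise": 500}))  # user override
```

Keeping the value in the config rather than in the .smk file means a run can be reproduced from the config alone.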

qianwang-prenetics commented 2 years ago

Hi Emira,

Thanks for your suggestion! I ran "medaka tools list_models" to check the available models, but there seems to be no r941_min_fast_g4XX. Can you help take a look? Thanks!!

Available: r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r10_min_high_g303, r10_min_high_g340, r941_min_fast_g303, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_prom_fast_g303, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360
Default consensus: r941_min_high_g360
Default snp: r941_prom_snp_g360
Default variant: r941_prom_variant_g360

Best regards, Qian
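To see which of the listed models fit a given flowcell, device and basecalling mode, the list can be filtered by prefix. This is a sketch only: `candidates` is a hypothetical helper, and the model list is copied from the output above (medaka 1.0.3 in this thread); other medaka versions will list different models.

```python
# Model names copied from the "Available:" line in this thread.
AVAILABLE = [
    "r103_min_high_g345", "r103_min_high_g360", "r103_prom_high_g360",
    "r103_prom_snp_g3210", "r103_prom_variant_g3210", "r10_min_high_g303",
    "r10_min_high_g340", "r941_min_fast_g303", "r941_min_high_g303",
    "r941_min_high_g330", "r941_min_high_g340_rle", "r941_min_high_g344",
    "r941_min_high_g351", "r941_min_high_g360", "r941_prom_fast_g303",
    "r941_prom_high_g303", "r941_prom_high_g330", "r941_prom_high_g344",
    "r941_prom_high_g360", "r941_prom_snp_g303", "r941_prom_snp_g322",
    "r941_prom_snp_g360", "r941_prom_variant_g303", "r941_prom_variant_g322",
    "r941_prom_variant_g360",
]


def candidates(models, pore, device, mode):
    """Return the models whose name starts with pore_device_mode_."""
    prefix = f"{pore}_{device}_{mode}_"
    return [name for name in models if name.startswith(prefix)]


if __name__ == "__main__":
    # R9.4.1 flowcell, MinION, fast basecalling
    print(candidates(AVAILABLE, "r941", "min", "fast"))
```

The helper deliberately does not rank candidates by guppy version; choosing among several matches is left to the user.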

emiracherif commented 2 years ago

This may be linked to the version of medaka, so you can use r941_min_fast_g360. But first, try the command "medaka consensus -h" and check the available models. Best, Emira

qianwang-prenetics commented 2 years ago

I don’t have the exact r941_min_fast_g360. Should I use r941_min_fast_g303 instead of the default model r941_min_high_g360?

emiracherif commented 2 years ago

Can you check the medaka version with "medaka --version", and check the available models with "medaka consensus -h"? If you don't find anything, use r941_min_fast_g303. Finally, a piece of advice: if you can, try to redo the basecalling using the high-accuracy model of guppy.
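The version check can also be scripted. The sketch below assumes the version is reported in the "medaka 1.0.3" form quoted in this thread; `parse_version` and `installed_medaka_version` are hypothetical helper names, and the second function returns None when no medaka executable is on the PATH.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract (major, minor, patch) from text such as 'medaka 1.0.3'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_medaka_version():
    """Run 'medaka --version' if medaka is installed, else return None."""
    if shutil.which("medaka") is None:
        return None
    result = subprocess.run(
        ["medaka", "--version"], capture_output=True, text=True, check=True
    )
    # Some tools print the version on stderr, so fall back to it.
    return parse_version(result.stdout or result.stderr)


if __name__ == "__main__":
    print(installed_medaka_version())
```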

qianwang-prenetics commented 2 years ago

My medaka version is medaka 1.0.3. Thanks for your suggestion, I will try to re-run the basecalling step. Another question: if we don’t activate fast mode on the sequencer, will the fastq data be called by high-accuracy guppy? Or is it always better to do the basecalling offline?

mohammadsalma commented 2 years ago

I think if you select a model which is not already installed, Medaka will download it automatically. Could you give it a try, please?

Best,

emiracherif commented 2 years ago

Hi Qian

"Another question: if we don’t activate fast mode on the sequencer, will the fastq data be called by high-accuracy guppy? Or is it always better to do the basecalling offline?"

If you disable basecalling, you will only get the fast5 files. So it's better to run your sequencing with fast basecalling, so that you can immediately do the first checks on your results, and then redo the basecalling in high-accuracy mode. Very best, Emira

qianwang-prenetics commented 2 years ago

Thanks so much for your suggestion! I tried to run medaka with the model matching my guppy version, but medaka failed to run. Here is the error message...

mohammadsalma commented 2 years ago

Hi Qian, Sorry but I can't see the error message.

qianwang-prenetics commented 2 years ago

Can you see the screenshot now?

mohammadsalma commented 2 years ago

Yes, thanks :) Do you have a log file for this step of the pipeline in the Logs directory? I mean a file named like *_artic_medaka.log
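One way to locate such log files and look at their last lines is sketched below. The directory name "Logs" and the pattern *_artic_medaka.log come from the comment above; the helper names `find_medaka_logs` and `tail` are hypothetical, and the exact location of the Logs directory depends on how the pipeline was run.

```python
from pathlib import Path


def find_medaka_logs(log_dir):
    """Return files matching *_artic_medaka.log in log_dir, sorted by name."""
    return sorted(Path(log_dir).glob("*_artic_medaka.log"))


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    if n <= 0:
        return []
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-n:]


if __name__ == "__main__":
    # "Logs" is assumed to be relative to the current working directory.
    for log in find_medaka_logs("Logs"):
        print(f"== {log} ==")
        print("\n".join(tail(log)))
```

The end of the log is usually where an error traceback ends up, which makes the last lines the most useful part to paste into an issue.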

qianwang-prenetics commented 2 years ago

Hi Emira,

I have re-run the basecalling with the latest version of guppy and got a similar amount of fastq data. I also re-ran ONTdeCIPHER on the new fastq data and got similar results. I’m wondering how I should evaluate the results generated by the new guppy, and whether it is necessary to re-run the basecalling, since this step is resource-intensive. Thanks!

Best Regards, Qian

emiracherif / ONTdeCIPHER

Question about configurations for covid-19 ONT data analysis #2