WGLab / NanoCaller

Variant calling tool for long-read sequencing data
MIT License

Applications #45

Open QuentinPerriere opened 2 months ago

QuentinPerriere commented 2 months ago

Hello, I hope you can help me with this:

Can I use NanoCaller on my fastq.gz files generated with ONT MinION technology? Can I use it to detect variants in a fungus?

kaichop commented 2 months ago

Yes, it can work, but you may need to adjust the ploidy setting.

On Thu, May 2, 2024 at 6:23 AM QuentinPerriere wrote:

Hello, I hope you can help me with this:

Can I use NanoCaller on my fastq.gz files generated with ONT MinION technology and an R10.4.1 flow cell? Can I use it to detect variants in a fungus?
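A note on the fastq.gz part of the question: as far as I know, NanoCaller reads an aligned, indexed BAM rather than raw reads, so the FASTQ would first need to be aligned to a reference. The sketch below only assembles the shell commands for that step; it assumes minimap2 and samtools are installed, and the file names (`reads.fastq.gz`, `fungus_ref.fasta`, `aligned.sorted.bam`) are hypothetical placeholders, not anything from this thread.

```python
import shlex


def alignment_commands(fastq, ref, bam="aligned.sorted.bam", threads=4):
    """Return shell command strings that align ONT reads and index the BAM.

    Assumes minimap2 and samtools are on PATH. File names are placeholders.
    """
    align = (
        f"minimap2 -ax map-ont -t {threads} {shlex.quote(ref)} {shlex.quote(fastq)}"
        f" | samtools sort -@ {threads} -o {shlex.quote(bam)} -"
    )
    index = f"samtools index {shlex.quote(bam)}"
    return [align, index]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in alignment_commands("reads.fastq.gz", "fungus_ref.fasta"):
        print(cmd)
```

The resulting sorted and indexed BAM is what would then be passed to NanoCaller.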


QuentinPerriere commented 2 months ago


Thank you. Which parameter exactly do I need to take into account?

umahsn commented 2 months ago

You can use --haploid_genome to run haploid models.

emilydolivo97 commented 2 months ago

You can use --haploid_genome to run haploid models.

Sorry for interfering in this issue: if I have a diploid organism, does this cause a problem? How can I set the ploidy?

QuentinPerriere commented 2 months ago


I think it's haploid by default, so when you use this argument, NanoCaller will no longer treat the genome as haploid. @umahsn, please correct me if I'm wrong.

umahsn commented 2 months ago

By default, NanoCaller assumes a diploid genome for all chromosomes if no ploidy is specified. If you use the --haploid_genome flag, it will use the haploid model and haploid genotype predictions. We suggested the haploid model on the assumption that your fungus sample is in a haploid stage of its life cycle. If not, please ignore the --haploid_genome flag and use the default parameters.


QuentinPerriere commented 2 months ago

@umahsn, the model was trained on the human genome, whereas I'm working on fungi. Will this cause any problems? I ask because I'm not detecting the variants that I expect to detect.

umahsn commented 1 month ago

Hi,

I tested NanoCaller on a fungal dataset and found a problem with calling variants in regions of relatively low depth, where coverage may drop by a factor of 10 or so compared with the neighboring few kbp. Are you having a similar problem? I am working on a fix for this issue and will post an update soon.

emilydolivo97 commented 1 month ago


Yes, exactly. I don't find the expected variants.

umahsn commented 1 month ago

I have added an option to disable coverage normalization, --disable_coverage_normalization, which is recommended for high-coverage samples such as amplicon sequencing or ultra-deep microbial samples if you are using the haploid model. The problem may occur when a candidate site has, let's say, 100X coverage but the average coverage within 1-2 kbp is 1000X: the NanoCaller haploid model then makes a less confident variant prediction because coverage at that site is low relative to the surrounding region. Coverage normalization is usually very helpful in whole-genome sequencing datasets, where low-coverage regions are usually tandem repeats or low-complexity regions in which variant calls may not be reliable, and normalization takes that into account. However, it may not be necessary for ultra-deep-coverage samples.

This update is in the GitHub repo only, so you would need to use git pull to get the latest changes. I will add it to the next release if this fixes the problem for you.
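To make the 100X versus 1000X scenario concrete, here is a toy calculation of how low a site's depth is relative to its neighborhood. This is only an illustration of the idea; it is not NanoCaller's actual normalization code, and the function name `relative_coverage` and the `flank` parameter are hypothetical.

```python
def relative_coverage(depths, site, flank=1000):
    """Ratio of the depth at `site` to the mean depth of its flanking window.

    `depths` is a list of per-base read depths. The window covers up to
    `flank` positions on each side and excludes the site itself.
    """
    start = max(0, site - flank)
    end = min(len(depths), site + flank + 1)
    neighbours = depths[start:site] + depths[site + 1:end]
    if not neighbours:
        raise ValueError("no flanking positions available")
    mean_depth = sum(neighbours) / len(neighbours)
    if mean_depth == 0:
        raise ValueError("flanking region has zero coverage")
    return depths[site] / mean_depth


if __name__ == "__main__":
    # One 100X site surrounded by 1000X coverage, as in the example above.
    depths = [1000] * 2000 + [100] + [1000] * 2000
    print(relative_coverage(depths, site=2000))
```

A ratio far below 1 at a candidate site is the situation described above, in which a caller that normalizes by surrounding coverage may become less confident.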

emilydolivo97 commented 1 month ago


@umahsn, thank you for taking the time to answer my question. Based on your previous response, since I'm working with a fungus (a diploid organism), I don't need to use the --haploid_genome parameter.

Your answer: By default NanoCaller assumes diploid genome for all chromosomes if no ploidy is specified. If you use --haploid_genome flag then it will use haploid model and genotype predictions. We suggested using haploid model assuming your fungus sample is in a haploid life cycle. If not, please ignore the --haploid_genome flag and use default parameters.

What should I do in this case, please?

umahsn commented 1 month ago

If you are processing a diploid organism, do not use the --haploid_genome parameter; keep the default parameters. Use --disable_coverage_normalization if your sample was processed with amplicon or targeted sequencing. If it is whole-genome sequencing, you do not need --disable_coverage_normalization.
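The advice in this thread can be summarized as a small sketch that assembles a NanoCaller command line. The two flags `--haploid_genome` and `--disable_coverage_normalization` come from the comments above; `--bam`, `--ref` and `--cpu` are assumed from the NanoCaller README, and the helper name and file names are hypothetical. Note that `--disable_coverage_normalization` was, at the time of this thread, available only in the GitHub repo version.

```python
import shlex


def nanocaller_command(bam, ref, haploid=False, targeted=False, cpu=4):
    """Build a NanoCaller argument list following the advice in this thread.

    haploid:  True if the sample is haploid; adds --haploid_genome.
              When False, NanoCaller's default diploid model is used.
    targeted: True for amplicon, targeted or ultra-deep sequencing; adds
              --disable_coverage_normalization. Leave False for ordinary
              whole-genome sequencing.
    """
    cmd = ["NanoCaller", "--bam", bam, "--ref", ref, "--cpu", str(cpu)]
    if haploid:
        cmd.append("--haploid_genome")
    if targeted:
        cmd.append("--disable_coverage_normalization")
    return cmd


if __name__ == "__main__":
    # Diploid whole-genome sample: defaults only.
    print(shlex.join(nanocaller_command("aligned.sorted.bam", "fungus_ref.fasta")))
    # Haploid amplicon sample: both flags.
    print(shlex.join(nanocaller_command(
        "aligned.sorted.bam", "fungus_ref.fasta", haploid=True, targeted=True)))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.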