CDCgov / SARS-CoV-2_Sequencing

A collection of sequencing protocols and bioinformatic resources for SARS-CoV-2 sequencing.
Apache License 2.0

Regarding Oxford Nanopore data analysis #9

Closed ps120195 closed 4 years ago

dmaccannell commented 4 years ago

Was there a specific issue, or is this more of a philosophical conjecture?

ps120195 commented 4 years ago

Hi,

Please find the attachments. I ran the SARS-CoV-2 sequencing pipeline on Nanopore data and am getting two different kinds of results. The commands and the samples were the same, but the runs were on different systems.

Can you tell why this is happening?

On Tue, Mar 31, 2020 at 2:05 AM Duncan MacCannell notifications@github.com wrote:

Was there a specific issue, or is this more of a philosophical conjecture?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/CDCgov/SARS-CoV-2_Sequencing/issues/9#issuecomment-606234650, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO76WNPA2W4GCBJ6GVDB4YDRKD7AZANCNFSM4LW44E7Q .

dmaccannell commented 4 years ago

Happy to help. Which pipeline? Attachments were missing.

ps120195 commented 4 years ago

image1 image2

ps120195 commented 4 years ago

I ran it three times and I still don't get the VCF details shown in image 1; instead it reports that the fasta sequence does not match the REF allele ... and so on.

dmaccannell commented 4 years ago

If these are two different systems, you're sure that the perl environment and all dependencies are the same version?

ps120195 commented 4 years ago

Does that make any difference like this?

ps120195 commented 4 years ago

The dependencies and the Perl environment are definitely the same.

ps120195 commented 4 years ago

Also, the dependencies were installed with pip, so their versions are the same on both systems. Please suggest why this is happening. What output should we expect from vcf_mask_lowcoverage.pl in the terminal?

donutbrew commented 4 years ago

I'm not clear on the difference between screen shots 1 and 2.

In screenshot 1, it looks like it finished correctly. Did you get a reasonable consensus in 'consensus.fasta'?

In screenshot 2, something went wrong. Are you using the same reference fasta that was used for read mapping? Bcftools is very picky about the vcf and the reference to which it applies variants. It may be possible that the reference was getting masked incorrectly, but I can't work out why that would be. I wonder if you could check the samtools depth at position 8782 and potentially let me have a look at your vcf? Interestingly, position 8782 is one where we have observed a lot of variation.
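The check that fails here can be approximated in a few lines: for every VCF record, the REF allele has to equal the reference bases starting at POS. The sketch below is not the pipeline's code or bcftools' implementation. `read_fasta`, `ref_mismatches`, and the toy `ref1` data are hypothetical names made up for illustration. It also shows one way a reference can drift out of step with a VCF: if stray characters such as carriage returns are left in the sequence, every later coordinate shifts.

```python
import io


def read_fasta(handle, strip_cr=True):
    """Return {name: sequence} from an iterable of FASTA lines."""
    chunks, name = {}, None
    for line in handle:
        line = line.rstrip("\n")
        if strip_cr:
            line = line.rstrip("\r")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def ref_mismatches(vcf_lines, reference):
    """Return (chrom, pos, vcf_ref, fasta_bases) for records whose REF disagrees."""
    bad = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref = line.rstrip("\r\n").split("\t")[:4]
        start = int(pos) - 1  # VCF positions are 1-based
        found = reference.get(chrom, "")[start:start + len(ref)]
        if found.upper() != ref.upper():
            bad.append((chrom, int(pos), ref, found))
    return bad


# Toy data (hypothetical): an 8-base reference saved with Windows line endings.
FASTA_TEXT = ">ref1 toy sequence\r\nACGT\r\nTTCA\r\n"
VCF_LINES = ["##fileformat=VCFv4.2\n", "ref1\t7\t.\tC\tT\t.\tPASS\t.\n"]

clean = read_fasta(io.StringIO(FASTA_TEXT))                  # CRs removed
dirty = read_fasta(io.StringIO(FASTA_TEXT), strip_cr=False)  # CRs kept
print(ref_mismatches(VCF_LINES, clean))
print(ref_mismatches(VCF_LINES, dirty))
```

With the carriage returns stripped, the record at position 7 matches the reference. With them kept, the carriage return after the first sequence line shifts every later base by one and the same record is reported as a mismatch. When reading a real file this way, open it with `newline=""` so Python does not translate the line endings before the check sees them.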

ps120195 commented 4 years ago

Yes, I am using the same reference that I used for mapping.

ps120195 commented 4 years ago

consensus_and_reference consensus2.fasta is my consensus fasta and MN908947.3.fasta is my reference file, which I also used for mapping. I am also getting the C to T variant at the same location, 8782. vcf_location

ps120195 commented 4 years ago

samtools depth at position 8782 is 1871

ps120195 commented 4 years ago

Here I ran the pipeline from start to finish and the result is still the same. Please see the screenshot. full_pipeline

donutbrew commented 4 years ago

Hmm. I'd like to get to the bottom of this, but I need a little more info. Can you show me the output of the following:

bcftools view VIC07_ONT.vcf |grep -EC3 "\s8282\s"
bcftools view VIC07_ONT.vcf.masked.vcf.gz |grep -EC3 "\s8282\s"

ps120195 commented 4 years ago

bcftools view

ps120195 commented 4 years ago

bcftools_view_EC It was -EC3, sorry.

donutbrew commented 4 years ago

Those look OK to me. The only other thing I can think of is that there is something funky going on with the reference. Can you try running dos2unix MN908947.3.fasta and then running the script again? If that is the issue, I can make a change to fix this (I will add it in any case).
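Before converting anything, it can help to confirm that the file really contains Windows line endings. The following is a minimal sketch, assuming plain uncompressed text files. `count_line_endings`, `to_unix`, and `normalize_file` are hypothetical helper names, and `to_unix` only imitates the CRLF-to-LF conversion that dos2unix performs by default, not the tool itself.

```python
from pathlib import Path


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def to_unix(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(path) -> bool:
    """Rewrite the file in place with Unix line endings; return True if it changed."""
    target = Path(path)
    raw = target.read_bytes()
    fixed = to_unix(raw)
    if fixed != raw:
        target.write_bytes(fixed)
    return fixed != raw


sample = b">ref1\r\nACGT\r\nTTCA\n"  # toy bytes, not real data
print(count_line_endings(sample))
print(to_unix(sample))
```

A non-zero `crlf` count on the reference fasta would point at the line-ending problem discussed here. Reading the file as bytes matters, because opening it in text mode lets Python translate the line endings before they can be counted.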

ps120195 commented 4 years ago

Yes, sure.

ps120195 commented 4 years ago

I tried the dos2unix command and ran the full script again. Still no change in the output. Cap1 Cap2

ps120195 commented 4 years ago

I have now tried using MN908947.fna instead of MN908947.fasta, and it worked. See the output. cap3

donutbrew commented 4 years ago

Ok, so it looks like you converted the line endings for "MN908947.fasta" and it worked. Using "MN908947.fna" (which is identical except that its line endings were not converted to Unix line endings) throws the error. I think these are all consistent, unless I misunderstand you. I will make the change to take into consideration fasta files with Windows line endings.
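Whether two copies of a reference are "identical except line endings" can be checked directly by hashing them twice, once as raw bytes and once after normalizing CRLF to LF. This is a rough sketch under that assumption. `digest` and `same_except_line_endings` are hypothetical names, and the byte strings are toy data rather than the real reference.

```python
import hashlib


def digest(data: bytes, normalize: bool = False) -> str:
    """SHA-256 of the bytes, optionally after converting CRLF to LF."""
    if normalize:
        data = data.replace(b"\r\n", b"\n")
    return hashlib.sha256(data).hexdigest()


def same_except_line_endings(a: bytes, b: bytes) -> bool:
    """True when the raw bytes differ but the normalized bytes are identical."""
    return digest(a) != digest(b) and digest(a, True) == digest(b, True)


unix_copy = b">ref1\nACGT\n"      # toy file with Unix line endings
dos_copy = b">ref1\r\nACGT\r\n"   # same content with Windows line endings
print(same_except_line_endings(unix_copy, dos_copy))
```

Running the same comparison on the input files from both systems is a quick way to rule inputs in or out when identical commands give different results.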

ps120195 commented 4 years ago

Thank you for helping me out. I learnt a lot during this error hunt. Now that the error is resolved, I want to know whether I have to use only file.fna for this pipeline?

donutbrew commented 4 years ago

No worries! Glad you caught this, as it's an easy fix but annoying for users. The filename doesn't matter, as long as the fasta header is the same and (for now) the Windows line endings of your reference file are converted to Unix line endings.
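The "same fasta header" condition can be checked before running the pipeline by comparing the sequence names in the reference with the CHROM column of the VCF. The sketch below assumes the usual convention that the sequence name is the first whitespace-delimited word after `>`. `fasta_names`, `vcf_chroms`, and `unknown_chroms` are hypothetical helpers, and the header description text is illustrative.

```python
def fasta_names(lines):
    """Sequence names: the first word after '>' on each header line."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def vcf_chroms(lines):
    """Distinct CHROM values from the non-header lines of a VCF."""
    chroms = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            chroms.add(line.split("\t", 1)[0])
    return chroms


def unknown_chroms(fasta_lines, vcf_lines):
    """CHROM values in the VCF that have no matching FASTA header."""
    return sorted(vcf_chroms(vcf_lines) - set(fasta_names(fasta_lines)))


fasta = [">MN908947.3 illustrative description\n", "ACGT\n"]
vcf_ok = ["##fileformat=VCFv4.2\n", "MN908947.3\t8782\t.\tC\tT\t.\t.\t.\n"]
vcf_bad = ["##fileformat=VCFv4.2\n", "MN908947\t8782\t.\tC\tT\t.\t.\t.\n"]
print(unknown_chroms(fasta, vcf_ok))
print(unknown_chroms(fasta, vcf_bad))
```

An empty list means every contig named in the VCF exists in the reference. Anything else, such as a name missing its version suffix, is worth fixing first.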

donutbrew commented 4 years ago

@dmaccannell I think this can be closed