MrOlm / inStrain

Bioinformatics program inStrain
MIT License
149 stars 33 forks source link

SNP calling naive question #161

Closed ntromas closed 2 weeks ago

ntromas commented 1 year ago

Hi!

My question might be naive but I wonder if there is a simple way to include the position that would have ref_freq = 1 to separate these to the one that won't have any coverage. I know it will generate huge output but we would need this "full" information if that's doable with inStrain.

Cheers!

Nico

MrOlm commented 1 year ago

Hi Nico,

I'm a little confused about what you're asking for. If you're looking for the full coverage of every position, you can access that in the raw_data files (e.g. https://instrain.readthedocs.io/en/latest/advanced_use.html#accessing-other-data)

If it's something else you're looking for please let me know

Matt

ntromas commented 1 year ago

Hi Matt,

Sorry if I was confused. Coverage of all positions would work too (covT right?). What I meant initially is that inStrain won't report if there is no variation and basically ref_freq=1. And inStrain won't report if position coverage is null. Just trying to find a way to differentiate both. But maybe it does not make sense...

Nico

Le mar. 17 oct. 2023 à 12:30, Matt Olm @.***> a écrit :

Hi Nico,

I'm a little confused about what you're asking for. If you're looking for the full coverage of every position, you can access that in the raw_data files (e.g. https://instrain.readthedocs.io/en/latest/advanced_use.html#accessing-other-data )

If it's something else you're looking for please let me know

Matt

— Reply to this email directly, view it on GitHub https://github.com/MrOlm/inStrain/issues/161#issuecomment-1766769869, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABY5D6HP5DOUHRR3CFEMEEDX72XBPAVCNFSM6AAAAAA6EBH6TOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRWG43DSOBWHE . You are receiving this because you authored the thread.Message ID: @.***>

--


Nicolas Tromas PhD McGill University E-mail: @. @.> Researchgate: NTromasPage https://www.researchgate.net/profile/Nicolas_Tromas Web: http://www.shapirolab.ca/


ntromas commented 1 year ago

Hi Matt,

What would be the best way to extract this information? Should I just use coverT.hdf5 (any advice to play with this kind of file?)

Cheers,

Nico

Le mar. 17 oct. 2023 à 12:57, Nicolas Tromas @.***> a écrit :

Hi Matt,

Sorry if I was confused. Coverage of all positions would work too (covT right?). What I meant initially is that inStrain won't report if there is no variation and basically ref_freq=1. And inStrain won't report if position coverage is null. Just trying to find a way to differentiate both. But maybe it does not make sense...

Nico

Le mar. 17 oct. 2023 à 12:30, Matt Olm @.***> a écrit :

Hi Nico,

I'm a little confused about what you're asking for. If you're looking for the full coverage of every position, you can access that in the raw_data files (e.g. https://instrain.readthedocs.io/en/latest/advanced_use.html#accessing-other-data )

If it's something else you're looking for please let me know

Matt

— Reply to this email directly, view it on GitHub https://github.com/MrOlm/inStrain/issues/161#issuecomment-1766769869, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABY5D6HP5DOUHRR3CFEMEEDX72XBPAVCNFSM6AAAAAA6EBH6TOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRWG43DSOBWHE . You are receiving this because you authored the thread.Message ID: @.***>

--


Nicolas Tromas PhD McGill University E-mail: @. @.> Researchgate: NTromasPage https://www.researchgate.net/profile/Nicolas_Tromas Web: http://www.shapirolab.ca/


--


Nicolas Tromas PhD McGill University E-mail: @. @.> Researchgate: NTromasPage https://www.researchgate.net/profile/Nicolas_Tromas Web: http://www.shapirolab.ca/


MrOlm commented 1 year ago

Hi Nico,

To load the covT file, you should do something like the following in python:

import inStrain
import inStain.SNVprofile
import inStrain.profile.profile_utilities

# Load your inStrain profile
IS = inStain.SNVprofile.SNVprofile(`/home/mattolm/inStrainOutputTest/`)

# Extract raw covT
covT =   IS.get('covT')

# Declare the scaffold you want to analyze and it's length
scaff = 'test_scaffold'
scaff_length = 10000

# Get the coverage information for that scaffold
scaff_cov =  inStrain.profile.profile_utilities.mm_counts_to_counts_shrunk(covT[scaff], fill_zeros=scaff_length)

Once you run something like that, scaff_cov will be a python dictionary where if you input a 0-based position it will give you the coverage (e.g. scaff_cov[10]) will report the coverage at position 10 at that scaffold.

Best, Matt

ntromas commented 1 year ago

Hi Matt,

Great! Thanks for the help!

Cheers,

Nico

Le mer. 1 nov. 2023 12 h 10, Matt Olm @.***> a écrit :

Hi Nico,

To load the covT file, you should do something like the following in python:

import inStrain import inStain.SNVprofile import inStrain.profile.profile_utilities

Load your inStrain profile

IS = inStain.SNVprofile.SNVprofile(/home/mattolm/inStrainOutputTest/)

Extract raw covT

covT = IS.get('covT')

Declare the scaffold you want to analyze and it's length

scaff = 'test_scaffold' scaff_length = 10000

Get the coverage information for that scaffold

scaff_cov = inStrain.profile.profile_utilities.mm_counts_to_counts_shrunk(covT[scaff], fill_zeros=scaff_length)

Once you run something like that, scaff_cov will be a python dictionary where if you input a 0-based position it will give you the coverage (e.g. scaff_cov[10]) will report the coverage at position 10 at that scaffold.

Best, Matt

— Reply to this email directly, view it on GitHub https://github.com/MrOlm/inStrain/issues/161#issuecomment-1789236131, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABY5D6HJRT6MCHVA7Q3CSQDYCJX6ZAVCNFSM6AAAAAA6EBH6TOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBZGIZTMMJTGE . You are receiving this because you authored the thread.Message ID: @.***>