alimanfoo / pysamstats

A fast Python and command-line utility for extracting simple statistics against genome positions based on sequence alignments from a SAM or BAM file.

loading variation: dtype restrictions #72

Closed bramverhelst closed 6 years ago

bramverhelst commented 7 years ago

Hi,

I'm working on a non-standard genome that contains long chromosome IDs. After processing the variation stats from pysamstats, I encountered a bug in my code. I finally traced it back to the pysamstats configuration, which truncates the chromosome ID to its first 12 characters (dtype a12).

-> pysamstats/config.py

dtype_variation = [
    ('chrom', 'a12'),

so my questions:

  1. Is there a specific reason for restricting it to 12?
  2. What would be the best option to circumvent this restriction? Adjusting the config file and recompiling? Or is there a function that lets me adjust the config settings dynamically from within the code?

alimanfoo commented 7 years ago

Sorry, no specific reason to restrict to 12, it's just a heuristic based on my own limited experience of mosquitoes, parasites and people.

The easiest workaround is probably just to overwrite the value of the pysamstats.config variable. There should be no need to recompile; you can do this in your own code, i.e.:

pysamstats.config.dtype_variation = [('chrom', 'a20'), ...]
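
For readers who want to see the effect, here is a minimal sketch of why a fixed-width `chrom` field truncates long IDs and how widening it helps. It uses only NumPy and does not call pysamstats. The two-field dtype list is a shortened stand-in for `pysamstats.config.dtype_variation`: the `pos` field, the example ID and the `widen_field` helper are all assumptions made for illustration. `S12` is used in place of `a12` because recent NumPy releases no longer accept the `a` type code; both denote fixed-width byte strings.

```python
import numpy as np

# Shortened stand-in for pysamstats.config.dtype_variation (assumption:
# the real list has many more fields; only 'chrom' matters here).
dtype_short = [('chrom', 'S12'), ('pos', 'i4')]


def widen_field(dtype_spec, name, new_type):
    """Return a copy of a dtype list with one field's type replaced.

    Hypothetical helper, not part of pysamstats.
    """
    return [(n, new_type if n == name else t) for n, t in dtype_spec]


dtype_long = widen_field(dtype_short, 'chrom', 'S20')

# A 15-character chromosome ID, longer than the 12-byte field.
row = (b'scaffold_000123', 42)

short = np.array([row], dtype=dtype_short)
long_ = np.array([row], dtype=dtype_long)

print(short['chrom'][0])  # silently cut to the first 12 bytes
print(long_['chrom'][0])  # full ID preserved

# Applied to the real config, the suggestion above would look roughly like:
# pysamstats.config.dtype_variation = widen_field(
#     pysamstats.config.dtype_variation, 'chrom', 'S20')
```

Note that NumPy truncates without raising an error, which is why the problem only showed up downstream. Whether overwriting the config variable takes effect may depend on the installed pysamstats version, so it is worth checking the `chrom` column of the loaded array afterwards.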

bramverhelst commented 7 years ago

thanks for the advice!

alimanfoo commented 6 years ago

Resolved in a more general way via #74