cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License

vcf_to_dataframe not able to extract columns that read_vcf method can #298

Open philt31 opened 4 years ago

philt31 commented 4 years ago

Extracting the same column with vcf_to_dataframe gives an error, as shown in the screenshot. Is this the correct behaviour (in which case, how do I extract the WIT column), or is it a bug?

Screenshot 2020-01-08 at 09 59 42

hardingnj commented 4 years ago

Thanks for the report. Generally the Google group might be a better place for a quick response (users tend to respond quicker than devs!): https://groups.google.com/forum/#!forum/scikit-allel

I can't see anything obviously wrong in your code. It may be some confusion around column naming. Can you show the output of witty_df.head()?

philt31 commented 4 years ago

Screenshot 2020-01-09 at 09 50 46

The output of witty_df.head() is shown above.

alimanfoo commented 4 years ago

Hi @philt31, apologies, vcf_to_dataframe does not support extracting calldata fields, only variants fields. Any calldata fields will be ignored. This should be added to the docstring.
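In the meantime, one possible workaround is to load the field with read_vcf and build the columns by hand. The following is a minimal sketch, not part of scikit-allel: it assumes a callset dict shaped like the one read_vcf returns (keys such as `samples`, `variants/POS`, `calldata/WIT`), the helper name `calldata_columns` is hypothetical, and the toy WIT values are invented for illustration.

```python
import numpy as np


def calldata_columns(callset, field):
    """Flatten a 2D calldata array (variants x samples) into one column per sample.

    Hypothetical helper: `callset` is assumed to be a dict like the one
    returned by allel.read_vcf.
    """
    samples = [str(s) for s in callset["samples"]]
    values = np.asarray(callset["calldata/" + field])
    if values.ndim != 2 or values.shape[1] != len(samples):
        raise ValueError("expected an array of shape (n_variants, n_samples)")
    # One column per sample, named <field>_<sample>.
    return {f"{field}_{sample}": values[:, j] for j, sample in enumerate(samples)}


# Toy stand-in for something like (assumed call, path is hypothetical):
#   callset = allel.read_vcf(path, fields=["samples", "variants/POS", "calldata/WIT"])
callset = {
    "samples": np.array(["S1", "S2"]),
    "variants/POS": np.array([101, 202, 303]),
    "calldata/WIT": np.array([["TP", "FN"], ["FP", "TP"], ["TP", "TP"]]),  # invented values
}

columns = {"POS": callset["variants/POS"], **calldata_columns(callset, "WIT")}
```

Passing `columns` to `pandas.DataFrame` should then give one row per variant, with `POS`, `WIT_S1` and `WIT_S2` columns.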

nightscape commented 4 years ago

I just ran into this as well. @alimanfoo, has this just not been implemented yet, or is there a specific reason it cannot be done?

katmaumue commented 4 years ago

There seems to be an open PR for this.

alimanfoo commented 4 years ago

Hi all, just to say that PR #252 seems OK but lacks tests; I think we'd need some before it could be merged. It's also not completely clear that it handles all cases properly. In particular, calldata arrays can be either 2D (e.g., GQ) or 3D (e.g., GT). Apologies, I can't work on this myself right now, but if this functionality is useful to folks here then I'd encourage someone to review PR #252 with the above in mind, and/or take PR #252 and develop it further.
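For anyone picking this up, the 2D versus 3D distinction might be handled along the following lines. This is only a sketch of one approach, not the code in PR #252: the function name `flatten_calldata` and the column naming (`<field>_<sample>` for 2D, with an extra `_1`, `_2` suffix for 3D) are assumptions, and the toy arrays are invented.

```python
import numpy as np


def flatten_calldata(name, values, samples):
    """Flatten a 2D or 3D calldata array into 1D columns keyed by column name.

    2D arrays have shape (n_variants, n_samples), e.g. GQ.
    3D arrays have shape (n_variants, n_samples, n_values), e.g. GT.
    """
    values = np.asarray(values)
    if values.ndim == 2:
        # Treat 2D as 3D with a single value per call.
        values = values[:, :, np.newaxis]
    if values.ndim != 3 or values.shape[1] != len(samples):
        raise ValueError("expected shape (n_variants, n_samples[, n_values])")
    n_values = values.shape[2]
    columns = {}
    for j, sample in enumerate(samples):
        for k in range(n_values):
            # Only add a numeric suffix when there are several values per call.
            suffix = "" if n_values == 1 else f"_{k + 1}"
            columns[f"{name}_{sample}{suffix}"] = values[:, j, k]
    return columns


samples = ["S1", "S2"]
gq = np.array([[30, 40], [50, 60]])                       # 2D: variants x samples
gt = np.array([[[0, 1], [1, 1]], [[0, 0], [0, 1]]])       # 3D: variants x samples x ploidy

gq_cols = flatten_calldata("GQ", gq, samples)
gt_cols = flatten_calldata("GT", gt, samples)
```

A test suite for the PR could start from small hand-built arrays like these, checking both the generated column names and the values for each dimensionality.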