Open oyhel opened 5 years ago
I have the same problem. I set parsed_samples for a subset of samples like you did but reader.calls still return calls for all the samples.
I got around it by finding the index of my samples.
sample_index = vcf_reader.header.samples.name_to_idx[subset_samples]
for row in reader:
row.calls[sample_index]
Hi, I think that you are looking for Record.call_for_sample.
for record in reader:
for sample in sel:
call = reader.call_for_sample[sample]
Description
I need to subset samples from a large VCF with approximately 100.000 samples. As the extraction is the only part of the workflow outside python it would be very nice to be able to use vcfpy for this. I read the docs and also tried to deduce how the package works from the source. However, I am coming up short and would very much appreciate some help understanding how the subsetting of samples works.
What I Did
reader.parsed_samples
returns the samples chosen to be parsed in the input: ['sample123', 'sample124']When iterating over the records in the
reader
object:for record in reader: record
I get:
<vcfpy.record.UnparsedCall object at 0x7f065fd30f98>
So I am assuming that the parser respects my list of samples to be parsed. However I am struggling with getting the calls for only these samples.
My goal is to extract all markers, but only for a few samples.
Any help would be very much appreciated!