FRED-2 / Fred2

Python-based framework for computational immunomics
http://fred-2.github.io/

Potential issue for frameshift tagging in File.IO.FileReader.read_vcf #223

Open MartinPersida opened 5 years ago

MartinPersida commented 5 years ago

While screening through the code of the read_vcf() function of Fred2.IO.FileReader, I noticed the following:

l194: v_list = record.ALT, where record is an item of vl, i.e. a Record object from the pyvcf package. If I am not mistaken, this returns a list of alternate alleles of a pyvcf-specific type (in my case vcf.model._Substitution).

l218: elif record.is_indel:
l219:     if len(v_list)%3 == 0:

In the condition above, which determines whether the variant leads to a frameshift, the modulo is applied to the length of the list and not to the allele itself. So when only one variant is identified at a given locus, len(v_list) is always 1 rather than the actual length of the allele, isn't it?
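To make the point concrete, here is a minimal sketch that uses plain Python strings in place of pyvcf objects. The REF/ALT values and the helper `is_in_frame` are made up for illustration, and comparing against the REF length is my assumption about what a frameshift check needs, not something taken from the Fred2 code.

```python
# Plain-Python stand-ins for record.REF and record.ALT (hypothetical values).
ref = "A"
alts = ["ATTT"]  # a single ALT allele: a 3-base insertion


def is_in_frame(ref_seq, alt_seq):
    """Hypothetical helper: True if the net length change is a multiple of 3."""
    return abs(len(ref_seq) - len(str(alt_seq))) % 3 == 0


# What the quoted condition evaluates: the number of ALT alleles.
list_based = len(alts) % 3 == 0

# What an allele-level check would evaluate: the length change of the allele.
allele_based = is_in_frame(ref, alts[0])

print(list_based, allele_based)
```

With a single ALT allele the list-based test depends only on the list having one element, whatever the allele looks like, while the allele-based test changes with the sequence.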

mwalzer commented 5 years ago

:+1: seems you're right, the tests also do not cover multi-allele records.

b-schubert commented 5 years ago

@mwalzer Mathias, are you actively working on this issue?

mwalzer commented 5 years ago

@b-schubert Haven't had much time dedicated to it. There is a sloppy fix, which is to ignore all but the first ALT:

v_list = next(iter(record.ALT or []), None)

Probably fine for many cases, but if there are multiple ALTs of different lengths, this should produce independent variant objects. Haven't had time to integrate a test. I can add a collaborative branch on https://github.com/FRED-2/Fred2/ directly, so we can work together if you'd like, or you take over.
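A rough sketch of the per-ALT alternative, to show the shape of the idea rather than a patch for read_vcf. The function `classify_alts` and its string labels are hypothetical and do not correspond to Fred2's variant types. It also assumes that calling `str()` on a pyvcf ALT object yields the allele sequence, which should hold for vcf.model._Substitution but is not checked here for other ALT types.

```python
def classify_alts(ref, alts):
    """Return one (alt, kind) pair per ALT allele.

    Hypothetical helper: 'kind' is a plain label, not a Fred2 VariationType.
    """
    results = []
    for alt in alts or []:  # record.ALT may be None or empty
        alt_seq = str(alt)  # assumption: str() gives the allele sequence
        delta = len(alt_seq) - len(ref)
        if delta == 0:
            kind = "substitution"
        else:
            base = "insertion" if delta > 0 else "deletion"
            # A net length change that is not a multiple of 3 shifts the frame.
            kind = base if abs(delta) % 3 == 0 else "frameshift_" + base
        results.append((alt_seq, kind))
    return results


# Two ALT alleles of different length at the same locus.
print(classify_alts("A", ["ATTT", "AT"]))
```

Each ALT gets its own classification here, so a multi-allele record with one in-frame and one frameshifting allele would yield two differently tagged entries instead of one.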