andersgs / beast2_constsites

Update your BEAST2 XML to account for constant sites
GNU General Public License v3.0
6 stars 1 forks source link

Error: Something is not right with your vcf file #4

Open ksw9 opened 4 years ago

ksw9 commented 4 years ago

Hi, Thanks for a great program! I'm getting an error 'Something is not right with your vcf file' and I'm wondering if there are VCF file specifications that need to be met to use this tool? I'm not sure if, for instance, this requires haploid genotype calls? [Uploading cluster10.vcf.gz…]()

./run_b2cs -x ${xml} beast2 $ref  test2.vcf
2020-04-22 15:57 B2CONSTSITES INFO  Imported sequence NC_000962.3.
2020-04-22 15:57 B2CONSTSITES INFO  Sequence has length 4411532

2020-04-22 15:57 B2CONSTSITES CRITICAL Something is not right with your vcf file
Traceback (most recent call last):
  File "/home/kwalter/.local/bin/run_b2cs", line 8, in <module>
    sys.exit(run_b2constsites())
  File "/home/kwalter/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/kwalter/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/kwalter/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kwalter/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/kwalter/.local/lib/python3.8/site-packages/b2constsites/run_b2constsites.py", line 53, in run_b2constsites
    cs.load_vcf()
  File "/home/kwalter/.local/lib/python3.8/site-packages/b2constsites/b2constsites.py", line 49, in load_vcf
    self.var_pos.update(self._parse_pos(rec))
  File "/home/kwalter/.local/lib/python3.8/site-packages/b2constsites/b2constsites.py", line 34, in _parse_pos
    if rec.INFO['TYPE'][0] in ACCEPTABLE_VAR_TYPES:
KeyError: 'TYPE'

Following a previously published pipeline, I used VarScan to call variants and have a VCF with diploid calls. I'm attaching the VCF below.

Any suggestions would be great! Thank you and best,

andersgs commented 4 years ago

Hi @ksw9. I am busy with COVID19 response at the moment. I’ll have a look over the coming days. Sorry for the delay.

ksw9 commented 3 years ago

Hi, I hope you are doing well. I was wondering if you could help me troubleshoot the issue we encountered and if your script requires a specifically formatted VCF?

Thank you in advance! All the best,

GaryNapier commented 3 years ago

Hi, I just had the same error and thought I'd share a solution.

b2constsites.py has this function, which is checking for nucleotide types (I think on the home page it says it can only deal with SNPs and MNP):

    def _parse_pos(self, rec):
        '''
        Find which positions to mask
        '''
        ACCEPTABLE_VAR_TYPES = {'snp', 'mnp'}
        try:
            if rec.INFO['TYPE'][0] in ACCEPTABLE_VAR_TYPES:
                start_pos = rec.POS
                if rec.INFO['TYPE'][0] == 'mnp':
                    end_pos = start_pos + len(rec.ALT[0])
                else:
                    end_pos = start_pos + 1
        except KeyError:
            logging.critical("Something is not right with your vcf file")
            raise
        return list(range(start_pos, end_pos))

This function is looking for the 'TYPE' field in the INFO column, which does not exist in my VCF (I'm using VCF v 4.2, if that helps), so that's why I got the same KeyError: 'TYPE'

Since I know my VCF is only SNPs because of the filtering I've done, I don't need it to check the type, so I just commented most of the function and left the start and end positions:

def _parse_pos(self, rec):
        '''
        Find which positions to mask
        '''
        ACCEPTABLE_VAR_TYPES = {'snp', 'mnp'}
        # try:
            # if rec.INFO['TYPE'][0] in ACCEPTABLE_VAR_TYPES:
                # start_pos = rec.POS
                # if rec.INFO['TYPE'][0] == 'mnp':
                    # end_pos = start_pos + len(rec.ALT[0])
                # else:
                    # end_pos = start_pos + 1
        # except KeyError:
        #     logging.critical("Something is not right with your vcf file")
        #     raise
        start_pos = rec.POS
        end_pos = start_pos + 1
        return list(range(start_pos, end_pos))

I got no errors and the XML opens fine in Beauti.

Gary