churchill-lab / g2gtools

Personal diploid genome creation and coordinate conversion
http://churchill-lab.github.io/g2gtools

Type Error on vcf2chain.py #5

Closed: everestial closed this issue 6 years ago

everestial commented 7 years ago

I am getting a TypeError when running vcf2chain with the --diploid option.

Error Type: <type 'exceptions.TypeError'>
Error Value: list indices must be integers, not str
      /home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools/vcf2chain.py:460
Traceback (most recent call last):
  File "/home/everestial007/anaconda3/envs/g2gtools/bin/g2gtools", line 4, in <module>
    __import__('pkg_resources').run_script('g2gtools==0.1.31', 'g2gtools')
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 744, in run_script
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1499, in run_script
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.1.31-py2.7.egg-info/scripts/g2gtools", line 117, in <module>
    G2GToolsApp()
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.1.31-py2.7.egg-info/scripts/g2gtools", line 75, in __init__
    getattr(self, args.command)()
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.1.31-py2.7.egg-info/scripts/g2gtools", line 90, in vcf2chain
    g2gtools.g2g_commands.command_vcf2chain(sys.argv[2:], self.script_name + ' vcf2chain')
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools/g2g_commands.py", line 375, in command_vcf2chain
    vcf2chain(args.input, args.fasta, args.strain, args.output, args.keep, args.passed, args.quality, args.diploid)
  File "/home/everestial007/anaconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools/vcf2chain.py", line 497, in vcf2chain
    raise G2GError("Execution halted")
g2gtools.exceptions.G2GError: Execution halted

The error is coming from vcf2chain.py:

write(CHAIN_STRING.format(CHAIN_STRING,
                            from_chr=int(c['chrom']), from_length=lr.chromosome_length, ... # this line

I tried to see the problem by adding a line:

print('value of c : ', c, type(c)) # which gave me output for one line as
('value of c : ', {'chrom': 'scaffold_13', 'stats': OrderedDict([('ACCEPTED', 0)]), 'chain_info': {'right': <g2gtools.vcf2chain.VCFtoChainInfo object at 0x7fa04d253b50>, 'left': <g2gtools.vcf2chain.VCFtoChainInfo object at 0x7fa0477e5dd0>}}, <type 'dict'>)

Second issue: When I run it without --diploid, I do get a chain file, but it contains chains for more than one chromosome. I supply a VCF with data only for chr2, yet chains are created for several other chromosomes that are not in the VCF.

kbchoi-jax commented 7 years ago

Please post a few entries from 'scaffold_13' in your vcf file.

everestial commented 7 years ago

Hi @kbchoi-jax This issue is fixed:

So, in the below code:

    # loop through the results and dump to file
    for c in results:
        for ci, lr in c['chain_info'].iteritems():
            outfile = open(lr.output_file, 'a')
            write = outfile.write

            if lr.number_vcf_lines > 0:
                write(CHAIN_STRING.format(CHAIN_STRING,
                            from_chr=c['chrom'], from_length=lr.chromosome_length,
                            from_start=0, from_end=lr.chromosome_length,
                            to_chr=c['chrom'], to_length=lr.end_length,
                            to_start=0, to_end=lr.sums[0] + lr.last_fragment_size + lr.sums[2], id=c['chrom']))
                write("\n")

                for c in lr.chain_entries:
                    write("\t".join(map(str, c)))
                    write("\n")
                write(str(lr.last_fragment_size))
                write("\n\n")
                outfile.close()

The line for c in lr.chain_entries: rebinds the variable c to a list. So, after the first pass, the next header line couldn't be written, because write(... from_chr=c['chrom'] ...) no longer found c as a dictionary, hence the TypeError. I basically changed the loop to for k in lr.chain_entries: and it worked, but I'm not sure if that's a good fix. It took me a whole day to find this out. This problem only showed up when running with the --diploid option.
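A minimal sketch of the shadowing problem described above, written for Python 3. The dictionaries are hypothetical stand-ins for the real g2gtools objects, and the function names dump_buggy and dump_fixed are made up for illustration. With two entries in chain_info (left and right), the inner loop leaves c bound to a list before the second header is written, which would explain why the error appears only with --diploid.

```python
# Hypothetical stand-ins for the structures in vcf2chain.py; these are
# plain dicts and lists, not the real g2gtools objects.

def dump_buggy(results):
    """Reuse `c` for the inner loop, as in the code quoted above."""
    lines = []
    for c in results:
        for side, info in c['chain_info'].items():
            # On the second side, c is a list and this lookup fails.
            lines.append("chain " + c['chrom'] + " " + side)
            for c in info['chain_entries']:  # rebinds c to a list
                lines.append("\t".join(map(str, c)))
    return lines


def dump_fixed(results):
    """Same loop, with a separate name for the inner loop variable."""
    lines = []
    for c in results:
        for side, info in c['chain_info'].items():
            lines.append("chain " + c['chrom'] + " " + side)
            for entry in info['chain_entries']:
                lines.append("\t".join(map(str, entry)))
    return lines


# Made-up data shaped like one diploid result (left and right chains).
results = [{
    'chrom': 'scaffold_13',
    'chain_info': {
        'left': {'chain_entries': [[10, 0, 1]]},
        'right': {'chain_entries': [[20, 1, 0]]},
    },
}]
```

Calling dump_buggy(results) raises a TypeError on the second side, while dump_fixed(results) writes both headers. Renaming the inner variable is enough here because nothing else in the loop depends on the old name.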

I have another issue: when running vcf2chain with --diploid, chains are created for unwanted chromosomes. My VCF has data only for chr2, but chains are created for other chromosomes as well. Any tips on this?
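One way to check which chromosomes a chain file covers is to collect the names from its header lines. The sketch below assumes the headers follow the UCSC chain layout (chain, score, source name, ...), which this thread does not confirm for g2gtools output; the function name chain_chromosomes and the example lines are made up for illustration.

```python
def chain_chromosomes(lines):
    """Return the distinct source chromosome names found in chain
    header lines, in the order they first appear."""
    seen = []
    for line in lines:
        fields = line.split()
        # Assumed header layout: chain score tName tSize tStrand ...
        if len(fields) >= 3 and fields[0] == 'chain':
            if fields[2] not in seen:
                seen.append(fields[2])
    return seen


# Made-up chain file content, not taken from the attached files.
example = [
    "chain 1000 chr2 100 + 0 100 chr2 101 + 0 101 chr2",
    "50\t0\t1",
    "50",
    "",
    "chain 1000 scaffold_13 40 + 0 40 scaffold_13 40 + 0 40 scaffold_13",
    "40",
]
```

For a real file, passing an open file handle instead of the example list should work the same way, since the function only iterates over lines.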

kbchoi-jax commented 7 years ago

What do the entries look like for the other chromosomes in your chain file?

everestial commented 7 years ago

What exactly do you mean by entries? Can you refer me to an example/problem, just to be clear on the issue?

kbchoi-jax commented 7 years ago

I just want to see the chain file you have generated.

everestial commented 7 years ago

I used the following command to create the chain file: g2gtools vcf2chain -f lyrata_genome.fa -i F1_2ms04h.imputed.InDel.haplotype.vcf.gz -s 2ms04h -o REF_to_2ms04h.chain --diploid

Important note: This problem only shows up when I have lots of chromosomes (scaffolds).

Attached are the VCF and chain files (from the non-diploid run, as well as the diploid run, left and right):

REF_to_2ms04h.chain.txt REF_to_2ms04h.left.chain.txt REF_to_2ms04h.right.chain.txt F1_2ms04h.imputed.InDel.haplotype.vcf.gz

kbchoi-jax commented 7 years ago

Thanks! Can you get me the lyrata_genome.fa file too?

everestial commented 7 years ago

Here is the shared lyrata_genome in gzipped format. https://www.dropbox.com/sh/9n3niwubgkrtlm4/AABNuVzdhZjQr3nnIqLx4lzja?dl=0

Thanks,

everestial commented 7 years ago

@kbchoi-jax: any updates on this issue?

kbchoi-jax commented 7 years ago

I was able to replicate the problem. We will work on it soon.

everestial commented 7 years ago

That's good to know. If you make any changes to g2gtools, let me know about the patch, so I don't have to reinstall the whole thing.

everestial commented 7 years ago

Hi @kbchoi-jax, any fix on this issue?

Thanks,

everestial commented 7 years ago

Any fix on the issue yet, @kbchoi-jax, @kbchoi?

everestial commented 7 years ago

@kbchoi-jax: I haven't received any update on this error fix.

inti commented 6 years ago

I am seeing a similar error. Any updates? Thanks in advance.

inti commented 6 years ago

I was using the wrong version of the tool :), sorted now.