churchill-lab / g2gtools

Personal diploid genome creation and coordinate conversion
http://churchill-lab.github.io/g2gtools

vcf2chain output error #18

Closed najoshi closed 5 years ago

najoshi commented 5 years ago

So I am trying to use g2gtools to incorporate indels into a genome to make a new genome. I ran vcf2chain to produce the chain file and then ran transform. I got this error in the transform step:

ValueError: invalid coordinates: start (195319303) > stop (160039680)

So then I took a look at the chain file produced by vcf2chain and here are the "chain" lines:

chain 1000 1 160039680 + 0 160039680 1 159984814 + 0 159984814 1
chain 1000 2 160039680 + 0 160039680 2 159984814 + 0 159984814 2
chain 1000 3 160039680 + 0 160039680 3 159984814 + 0 159984814 3
chain 1000 4 149736546 + 0 149736546 4 149684096 + 0 149684096 4
chain 1000 5 149736546 + 0 149736546 5 149684096 + 0 149684096 5
chain 1000 6 149736546 + 0 149736546 6 149684096 + 0 149684096 6
chain 1000 7 124595110 + 0 124595110 7 124548805 + 0 124548805 7
chain 1000 8 124595110 + 0 124595110 8 124548805 + 0 124548805 8
chain 1000 9 124595110 + 0 124595110 9 124548805 + 0 124548805 9
chain 1000 10 120129022 + 0 120129022 10 120109906 + 0 120109906 10
chain 1000 11 120129022 + 0 120129022 11 120109906 + 0 120109906 11
chain 1000 12 120129022 + 0 120129022 12 120109906 + 0 120109906 12
chain 1000 13 104043685 + 0 104043685 13 104012273 + 0 104012273 13
chain 1000 14 104043685 + 0 104043685 14 104012273 + 0 104012273 14
chain 1000 15 104043685 + 0 104043685 15 104012273 + 0 104012273 15
chain 1000 16 90702639 + 0 90702639 16 90680147 + 0 90680147 16
chain 1000 17 90702639 + 0 90702639 17 90680147 + 0 90680147 17
chain 1000 18 90702639 + 0 90702639 18 90680147 + 0 90680147 18
chain 1000 19 91744698 + 0 91744698 19 91720405 + 0 91720405 19
chain 1000 X 91744698 + 0 91744698 X 91720405 + 0 91720405 X
chain 1000 Y 91744698 + 0 91744698 Y 91720405 + 0 91720405 Y

Notice that the chromosome lengths are being duplicated. For example, in the first three lines the length is 160039680, which is the length of chromosome 3, not 1 or 2. The length of chromosome 1 is 195471971 and the length of chromosome 2 is 182113224. The duplication then continues in groups of three for some reason. I'm guessing this is why I get an error when I try to do the transform. So am I doing something wrong, or is this a bug?
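For anyone wanting to check their own chain file for the same symptom, here is a minimal sketch in Python. It is not part of g2gtools; the function names (`parse_chain_header`, `shared_ref_sizes`, `size_mismatches`) are hypothetical, and it assumes the 13-field chain header layout seen in the lines above, with the first name/size pair describing the reference genome. Identical lengths across chromosomes are only a heuristic warning sign, so the second check compares against lengths you supply yourself (here, the three values quoted in this issue).

```python
from collections import defaultdict


def parse_chain_header(line):
    """Split one 'chain ...' header line into the fields used below."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    return {
        "ref_chrom": fields[2],      # assumed: first pair is the reference
        "ref_size": int(fields[3]),
        "new_chrom": fields[7],
        "new_size": int(fields[8]),
    }


def shared_ref_sizes(header_lines):
    """Return {size: [chroms]} for every size reported by more than one chromosome."""
    by_size = defaultdict(list)
    for line in header_lines:
        header = parse_chain_header(line)
        by_size[header["ref_size"]].append(header["ref_chrom"])
    return {size: chroms for size, chroms in by_size.items() if len(chroms) > 1}


def size_mismatches(header_lines, expected_sizes):
    """Return (chrom, size_in_chain, expected_size) for each disagreement."""
    mismatches = []
    for line in header_lines:
        header = parse_chain_header(line)
        chrom = header["ref_chrom"]
        expected = expected_sizes.get(chrom)
        if expected is not None and expected != header["ref_size"]:
            mismatches.append((chrom, header["ref_size"], expected))
    return mismatches


# First three header lines from the chain file quoted in this issue.
HEADERS = [
    "chain 1000 1 160039680 + 0 160039680 1 159984814 + 0 159984814 1",
    "chain 1000 2 160039680 + 0 160039680 2 159984814 + 0 159984814 2",
    "chain 1000 3 160039680 + 0 160039680 3 159984814 + 0 159984814 3",
]
# Chromosome lengths as stated in this issue.
EXPECTED = {"1": 195471971, "2": 182113224, "3": 160039680}

if __name__ == "__main__":
    print(shared_ref_sizes(HEADERS))
    print(size_mismatches(HEADERS, EXPECTED))
```

With the three lines above, both checks flag chromosomes 1 and 2, while chromosome 3 passes the length comparison. In practice the expected lengths would come from the reference FASTA index rather than being typed in by hand.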

najoshi commented 5 years ago

So it looks like I was using the g2gtools from anaconda2, but I realized that there was also one in anaconda3. I tried that one, which is the newest version, and it does work. This still seems to be a bug in the anaconda2 version. I would suggest making it clear on the installation page that one should use anaconda3 when installing.