Atkinson-Lab / Tractor

Scripts for implementing the Tractor pipeline
MIT License
44 stars 5 forks source link

IndexError: list index out of range #20

Closed ChrisK1988 closed 1 year ago

ChrisK1988 commented 1 year ago

Hello,

I am having an issue with the ExtractTracts.py portion of Tractor. I have a fairly large primate dataset of 887 individuals, 9 of which are reference animals (equally split between two closely related species). I have used Tractor on this same dataset about 4 or 5 times now without issue. I have been trying different combinations of reference panels following the protocol of RFMix and then Tractor and it has worked up until now. Using samples extracted from the same master VCF as before, run through RFMix using the same code, I am now getting this error using ExtractTracts.py and am not sure what to do.

The following is the code I used to generate the error:

module load python

for i in {1..20}; do
python ExtractTracts.py \
--msp $SCRATCH/rfmix/Apr2_2023_UnrelatedandFounders.QueryPanel.Chr${i} \
--vcf-prefix $SCRATCH/Beagle_Software/Beagle5.4/Apr2_2023_UnrelatedandFounders.QueryPanel.Chr${i} \
--zipped \
--output-path Apr2_2023_UnrelatedandFounders.QueryPanel.Chr${i} \
--num-ancs 2; done 2>Tractor_error5.log

The error being thrown is below, and happens as well if I try and run a chromosome independently as well:

INFO (main 42): Creating output files for 2 ancestries INFO (main 48): Opening input and output files for reading and writing Traceback (most recent call last): File "ExtractTracts.py", line 184, in extract_tracts(**vars(args)) File "ExtractTracts.py", line 126, in extract_tracts geno_b = str(geno[1]) IndexError: list index out of range

Thank you for your help!

mike-w-wilson commented 1 year ago

Hi @ChrisK1988 ,

It looks like this step is failing when trying to split the genotypes. Can you confirm this VCF is phased and the genotype separator is a "|"?

ChrisK1988 commented 1 year ago

Hi @mike-w-wilson

Thank you for the quick reply! The VCF is indeed phased. Here is a quick output of some samples from the first position:

1 48 1:48[Panubis1.0]C,G C G . PASS DR2=1;AF=0.2073;AC=359;AN=1738 GT:DS 1|0:. 0|0:. 0|1:. 0|0:. 0|1:. 0|1:. 0|0:. 0|0:. 0|0:. 0|0:. 0|0:. 0|0:. 0|1:. 1|0:. 0|1:. 1|0:. 0|0:. 0|0:. 1|0:. 0|0:. 0|0:. 0|1:. 1|0:. 0|0:. 0|1:. 0|0:. 0|1:. 0|0:. 0|0:0 0|1:1 0|0:0 0|0:0 0|0:0 0|0:0 0|0:0 0|0:0 0|1:1 1|0:1 0|0:0 0|0:0 0|0:0 0|1:1 0|0:0 0|0:0 0|0:0 0|0:0

Thank you.

mike-w-wilson commented 1 year ago

Thanks @ChrisK1988 -- hm, could you also confirm your msp file from RFMix covers the same intervals/contigs as your VCF? We've also seen this error message when the VCF contains a locus outside of the msp file intervals, e.g. VCF was the whole genome while the msp was a single contig.

ChrisK1988 commented 1 year ago

Hi @mike-w-wilson --- this is definitely a strange one. Like I said in my initial comment, using the same master VCF and the same samples (just the ref and query panel would shift a little bit depending on what I wanted to call a reference animal or not) have been used for the past month with no issues. I might try merging a VCF that I know worked and then pulling the animals from that one again and seeing if that fixed it. Maybe there's something buried in the VCF that is not immediately noticeable on the surface, like BCFTools glitched out on something. At any rate, here is what you asked for, and you can see they do indeed match.

I am not sure where else to look.

Thanks for your time, if you have any ideas I would be happy to try them.

awk '{ print $1 }' Apr2_2023_UnrelatedandFounders.QueryPanel.Chr1.msp.tsv > test.txt

uniq test.txt

#Subpopulation
#chm
1

grep -v "^#" Chr1Test.vcf | awk '{printf "%s\t%d\t%d\t%s|%s|%s|%s|%s|%s|%s|%s\n", $1,$2-1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' > Chr1Test.bed

awk '{ print $1 }' Chr1Test.bed > test.txt

uniq test.txt

1
ChrisK1988 commented 1 year ago

Just a follow-up to anyone who may read this in future, the input VCF file was the issue. I am not sure exactly what was wrong with it, as RFMix and BCFTools did not throw any errors during the creation of the necessary files. However, I re-created the VCF using a file I knew worked last week and used the new reference panel, and Tractor now works normally.

Thank you for all the help @mike-w-wilson

mike-w-wilson commented 1 year ago

Glad you got it working! We can look into doing some validation on the inputs so the errors arent so cryptic moving forward.