Closed by silviaadiz 3 months ago
Hi @silviaadiz,
There must be more to that error. Can you share the complete error you get when running the script? Also, would it be possible to share a snapshot of the VCF file?
Hi! Thanks for the quick reply.
This is the full error:
INFO (main 42): Creating output files for 3 ancestries
INFO (main 48): Opening input and output files for reading and writing
INFO (main 117): VCF position, 13014 is not in an msp window, skipping site
INFO (main 117): VCF position, 13104 is not in an msp window, skipping site
INFO (main 117): VCF position, 13105 is not in an msp window, skipping site
INFO (main 117): VCF position, 13119 is not in an msp window, skipping site
INFO (main 117): VCF position, 13150 is not in an msp window, skipping site
INFO (main 117): VCF position, 13167 is not in an msp window, skipping site
INFO (main 117): VCF position, 13192 is not in an msp window, skipping site
INFO (main 117): VCF position, 13222 is not in an msp window, skipping site
INFO (main 117): VCF position, 13293 is not in an msp window, skipping site
INFO (main 117): VCF position, 13301 is not in an msp window, skipping site
INFO (main 117): VCF position, 13311 is not in an msp window, skipping site
Traceback (most recent call last):
File "/mnt/lustre/scratch/nlsas/home/usc/gb/sdd/lat23/TRACTOR/Tractor/scripts/ExtractTracts.py", line 184, in
FYI, so far I haven't seen the "skipping site" message when using the full VCF. This is what the filtered VCF looks like. If this screenshot is not enough, I can share more with you by email:
Hi @silviaadiz,
I was unable to replicate the error. However, we have recently updated the scripts. Can you test again with the updated scripts? If the error persists, please email me a small snippet of your VCF file at nirav.shah@bcm.edu so that I can replicate the error.
Hi! Sorry for the late reply, I haven't been able to work on this until recently. Thank you for your help. I have run the new scripts but I still got the error, so I'm going to prepare a chunk of my VCF and send it to you. It might be related to how PLINK does the conversion to VCF, so I will also filter them with bcftools and check how that goes.
Thank you again, Silvia
Any updates @silviaadiz?
The issue was resolved via email. The error was not caused by the imputed data, but rather by occasional unphased genotypes present in a file that appeared to contain phased genotypes.
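For anyone hitting the same problem, here is a minimal sketch of how such stray unphased genotypes could be located before running the script. It is not part of Tractor; the function names `find_unphased` and `open_vcf` and the demo records are made up for illustration. It assumes a tab-delimited VCF in which GT is the first FORMAT subfield, and it flags every genotype that lacks the "|" separator (which includes "/"-separated calls).

```python
import gzip
from typing import Iterable, Iterator, Tuple


def find_unphased(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (chrom, pos, sample_index, genotype) for each genotype without '|'."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], fields[1]
        # Columns 0-8 are the fixed VCF fields; samples start at column 9.
        for i, sample in enumerate(fields[9:]):
            gt = sample.split(":")[0]  # assumes GT is the first FORMAT subfield
            if "|" not in gt:
                yield chrom, pos, i, gt


def open_vcf(path: str):
    """Open a plain-text or gzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Made-up records for illustration only (not taken from the real file).
DEMO_LINES = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr22\t100\t.\tA\tG\t.\t.\t.\tGT\t0|1\t1|1",
    "chr22\t200\t.\tC\tT\t.\t.\t.\tGT\t0/1\t0|0",
]

for hit in find_unphased(DEMO_LINES):
    print(hit)
```

On a real file, something like `with open_vcf("filtered.vcf.gz") as fh: hits = list(find_unphased(fh))` would collect the offending sites (the file name here is a placeholder). An empty result would suggest the file is consistently phased.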
Hi! I am encountering an error for which a few issues have already been raised, but I have been trying to troubleshoot it and still haven't worked it out. I am using imputed files (from TOPMed) that have been filtered (by MAF and INFO) using PLINK. RFMix handled these VCFs without problems, but when running ExtractTracts.py, I get this message:
File "/mnt/lustre/scratch/nlsas/home/usc/gb/sdd/lat23/TRACTOR/Tractor/scripts/ExtractTracts.py", line 126, in extract_tracts
geno_b = str(geno[1])
This is the VCF header:
fileformat=VCFv4.3
fileDate=20231123
source=PLINKv2.00
filedate=2023.3.13
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
pipeline=michigan-imputationserver-1.7.1
imputation=minimac4-1.0.2
phasing=eagle-2.4
panel=apps@topmed-r2@1.0.0
r2Filter=0.3
contig=
FORMAT=
Sample genotypes are split into columns by "\t", and the alleles in each genotype call are separated by "|". It works fine when using the raw files from imputation instead (without filtering), but it takes a long time just to run chr22 (and the output files are also very large). I tried modifying line 87 of the script in case the problem was the "\t" separator between samples, but it still throws the error. I would very much appreciate your help here!
Thank you! :)
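A small sketch of why an unphased genotype could break a line like `geno_b = str(geno[1])`. This assumes the genotype string is split on "|" before being indexed. That is a guess based on the traceback line, since the surrounding script code is not shown in this thread, and `split_alleles` is a hypothetical helper, not a Tractor function.

```python
def split_alleles(genotype: str, sep: str = "|"):
    """Return both alleles of a diploid genotype, assuming `sep` separates them."""
    parts = genotype.split(sep)
    # If `sep` is absent, `parts` has a single element and parts[1] does not exist.
    return str(parts[0]), str(parts[1])


print(split_alleles("0|1"))  # phased genotype: splits into two alleles

try:
    split_alleles("0/1")  # unphased genotype: no "|" to split on
except IndexError as err:
    print("IndexError:", err)
```

Under that assumption, a single "/"-separated call anywhere in the file would be enough to stop the run, even if every other genotype is phased.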