Closed biona001 closed 11 months ago
In extract_tracts
function I tried adding the line
output_files = {}
but it led to another error
NameError: name 'geno_parts' is not defined
Then I modified the variable geno_parts
to just geno
, and the script seemed to work.
I got the exact same error, and after doing those modifications I'm getting the next error:
$ python3 Tractor/scripts/extract_tracts.py --vcf subset1/query_file_phased.vcf --msp subset1/query_results.msp --output-dir output/ --num-ancs 8
INFO (__main__ 91): # VCF File : subset1/query_file_phased.vcf
INFO (__main__ 92): # Prefix of output file names : query_file_phased
INFO (__main__ 93): # VCF File is compressed? : False
INFO (__main__ 94): # Number of Ancestries in VCF : 8
INFO (__main__ 95): # Output Directory : output/
INFO (__main__ 101): Creating output files for 8 ancestries
INFO (__main__ 116): Iterating through VCF file
Traceback (most recent call last):
File "/path/Tractor/scripts/extract_tracts.py", line 240, in <module>
extract_tracts(**vars(args))
File "/patj/Tractor/scripts/extract_tracts.py", line 170, in extract_tracts
window = (ancs_entry[0], int(ancs_entry[1]), int(ancs_entry[2]))
ValueError: invalid literal for int() with base 10: '0.0'
Any idea?
I was getting the same errors so I switched to using the older version of tractor scripts.i.e. V0.0.1 from GitHub which came with the following files
-rwxrwxrwx 1 2.3K UnkinkMSPfile.py -rwxrwxrwx 1 2.5K UnkinkGenofile.py -rwxrwxrwx 1 19K TractorIcon.png -rwxrwxrwx 1 5.2K Tractor-Example-GWAS.py -rwxrwxrwx 1 29K Tractor-Example-GWAS-FromHailformat.ipynb -rwxrwxrwx 1 8.1K RunTractor.py -rwxrwxrwx 1 1.6K README.md -rwxrwxrwx 1 884 hg19_telomeres.bed -rwxrwxrwx 1 8.7K ExtractTracts.py -rwxrwxrwx 1 5.2K ExtractTracts-legacy-2way.py drwxrwxrwx 8 4.0K .git drwxrwxrwx 2 4.0K Example
In
extract_tracts
function I tried adding the lineoutput_files = {}
but it led to another error
NameError: name 'geno_parts' is not defined
Then I modified the variable
geno_parts
to justgeno
, and the script seemed to work.
I tried your modifications to the script in the latest version (extract_tracts.py) and as you mentioned the older errors vanished. However, I now get another error, at the same geno[:2] statement. I'm not familiar with python, so don't know how to debug the code.
python Tractor/scripts/extract_tracts.py --vcf mergedCNICs.chr22.vcf.gz --msp mergedCNICs.chr22.deconvoluted.msp.tsv --num-ancs 3 --output-dir pwd
/ --output-vcf --compress-output
INFO (main 90): # VCF File : mergedCNICs.chr22.vcf.gz
INFO (main 91): # Prefix of output file names : mergedCNICs.chr22
INFO (main 92): # VCF File is compressed? : True
INFO (main 93): # Number of Ancestries in VCF : 3
INFO (main 94): # Output Directory : /hpc/users/zafers02/HIV_SamreenPaola/analysis/tractor/localancestry_vcf/
INFO (main 100): Creating output files for 3 ancestries
INFO (main 116): Iterating through VCF file
INFO (main 175): VCF position, 16055070 is not in an msp window, skipping site
INFO (main 175): VCF position, 16055122 is not in an msp window, skipping site
INFO (main 175): VCF position, 16055942 is not in an msp window, skipping site
INFO (main 175): VCF position, 16056936 is not in an msp window, skipping site
INFO (main 175): VCF position, 16057417 is not in an msp window, skipping site
INFO (main 175): VCF position, 16231396 is not in an msp window, skipping site
INFO (main 175): VCF position, 16285185 is not in an msp window, skipping site
INFO (main 175): VCF position, 16287784 is not in an msp window, skipping site
INFO (main 175): VCF position, 16341421 is not in an msp window, skipping site
INFO (main 175): VCF position, 16347845 is not in an msp window, skipping site
INFO (main 175): VCF position, 16364176 is not in an msp window, skipping site
INFO (main 175): VCF position, 16364981 is not in an msp window, skipping site
INFO (main 175): VCF position, 16393312 is not in an msp window, skipping site
INFO (main 175): VCF position, 16419883 is not in an msp window, skipping site
INFO (main 175): VCF position, 16428267 is not in an msp window, skipping site
INFO (main 175): VCF position, 16433369 is not in an msp window, skipping site
INFO (main 175): VCF position, 16494800 is not in an msp window, skipping site
INFO (main 175): VCF position, 16495458 is not in an msp window, skipping site
INFO (main 175): VCF position, 16497038 is not in an msp window, skipping site
INFO (main 175): VCF position, 16498839 is not in an msp window, skipping site
INFO (main 175): VCF position, 16504399 is not in an msp window, skipping site
INFO (main 175): VCF position, 16517801 is not in an msp window, skipping site
INFO (main 175): VCF position, 16548272 is not in an msp window, skipping site
INFO (main 175): VCF position, 16568293 is not in an msp window, skipping site
INFO (main 175): VCF position, 16572329 is not in an msp window, skipping site
INFO (main 175): VCF position, 16610224 is not in an msp window, skipping site
INFO (main 175): VCF position, 16638894 is not in an msp window, skipping site
INFO (main 175): VCF position, 16649679 is not in an msp window, skipping site
INFO (main 175): VCF position, 16650301 is not in an msp window, skipping site
INFO (main 175): VCF position, 16850308 is not in an msp window, skipping site
INFO (main 175): VCF position, 16855160 is not in an msp window, skipping site
Traceback (most recent call last):
File "Tractor/scripts/extract_tracts.py", line 240, in
I figured this error out.
File "Tractor/scripts/extract_tracts.py", line 181, in extract_tracts geno_a,geno_b = map(str, geno[:2]) ValueError: not enough values to unpack (expected 2, got 1)
There were variants in the phased which had no-calls, and this was causing the trouble. I originally had 3 cohorts, on which I did phasing separately, and then merged (bcftools merge) the phased vcf files to proceed with LAI and Tractor. While merging variants some samples which did not have variants in other 2 cohorts were given ./. genotypes. Extract_tractor could not handle such genotypes.
I then did a bcftools +missing2ref to the merged_phased vcf files, and now the subsequent tractor steps work properly. bcftools merge --threads $threads -o $output.vcf.gz -O z cohort1.chr$i.phase.vcf.gz cohort2.chr$i.phase.vcf.gz cohort2.chr$i.phase.vcf.gz | bcftools +missing2ref -O z -o $output.missing2ref.vcf.gz -p -m
Hi @biona001, Thank you for the bug report and the recommended fix. We have implemented the necessary changes in the code, and should work once you pull the code to the most recent version.
@EfraMP A quick search tells me this might have to do with data types. We are happy to look into it, but please open another issue for this. If you can details or provide an example of files you are inputting so we can replicate on our end, that would be helpful.
@samreenzafer You are correct, extract_tracts.py does expect that there are no missing values in the VCF files.
Hi,
I was trying to run the
extract_tracks.py
script, but it threw an errorAny tips/suggestions would be highly appreciated.