Open George3d6 opened 4 years ago
Note: I tried the same thing with a .vcf file and I get the exact same issue, with the same amount of logs.
Trying this command instead:
./akt kin --force -M 1 input.bcf > kinship.txt
I now get this error message:
No frequency VCF provided (-F). Allele frequencies will be estimated from the data.
Problem opening input.bcf
Input file not found.
Which is even weirder, since the input.bcf file is most certainly present. Using absolute paths doesn't seem to help.
Apologies for the confusing error message. AKT requires indexed files, so if you run bcftools index input.bcf these problems should go away.
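A minimal sketch of a pre-flight check for the missing-index case described above. It assumes the index sits next to the data file under the usual names (.csi, which bcftools index writes by default, or .tbi); the function name has_index is hypothetical and not part of AKT or bcftools.

```shell
# has_index FILE
# Succeeds if a .csi or .tbi index file exists next to FILE.
# This only checks that the index file is present, not that it is valid.
has_index() {
    [ -f "$1.csi" ] || [ -f "$1.tbi" ]
}

# Example use (commented out because it needs bcftools installed):
# has_index input.bcf || bcftools index input.bcf
```

Running a check like this before akt would at least distinguish "file not indexed" from "file not found".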
best,
Jared
Hmm,
It might be that I converted to bcf poorly, since I got a different error after doing that.
However, I tried converting my original file (56001801065146A.snp.vcf) into an appropriate format via:
bgzip 56001801065146A.snp.vcf
bcftools index 56001801065146A.snp.vcf.gz
However upon running: ./akt pca -W data/wgs.grch37.vcf.gz 56001801065146A.snp.vcf.gz
I now got the error:
Input: 56001801065146A.snp.vcf.gz
Using file data/wgs.grch37.vcf.gz for PCA weights
1 samples
Using 20 PCs from input file.
0/17491 of sites were in 56001801065146A.snp.vcf.gz
ERROR: less that 90% of sites in data/wgs.grch37.vcf.gz were NOT in data/wgs.grch37.vcf.gz
(Same issue if I use --assume-homref.)
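To see independently how many of the weight file's sites appear in a sample VCF, something like the following sketch could help. It assumes both inputs are gzip/bgzip-compressed text VCFs (not BCF) and matches on CHROM and POS only, ignoring alleles, so its count may differ from what AKT reports. The function name count_shared_sites is hypothetical.

```shell
# count_shared_sites SITES.vcf.gz SAMPLE.vcf.gz
# Prints "N/M": N of the M records in SITES whose CHROM:POS also occurs in SAMPLE.
count_shared_sites() {
    sample_keys=$(mktemp)
    # Collect CHROM:POS keys from the sample, skipping header lines.
    gzip -dc "$2" | awk '!/^#/ { print $1 ":" $2 }' > "$sample_keys"
    # Load the keys, then count how many site records hit one of them.
    gzip -dc "$1" | awk -v keys="$sample_keys" '
        BEGIN { while ((getline k < keys) > 0) seen[k] = 1 }
        !/^#/ { total++; if (($1 ":" $2) in seen) shared++ }
        END   { printf "%d/%d\n", shared, total }'
    rm -f "$sample_keys"
}

# Example use:
# count_shared_sites data/wgs.grch37.vcf.gz 56001801065146A.snp.vcf.gz
```

A result of 0 shared sites usually points at a naming or coordinate mismatch (chr prefix, reference build) rather than at the data itself.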
Is this to be expected if my vcf file only contains full genome sequence data and not mitochondrial DNA data?
It does contain 150 or so SNPs that are Y-chromosome haplogroup related, so I assumed this would be correct.
Or might there be something wrong with the way I did my indexing?
wgs.grch37.vcf.gz contains 17,491 common autosomal variants that should be detected in any high coverage whole genome sequenced human (excluding homozygous reference). It won't matter if MT/X/Y variants are in your VCF; they will just be ignored.
What reference genome are you using? It needs to be consistent with the version of the VCF passed to -W; there are loading VCFs included for hg19 and hg38 (both with and without the chr prefix).
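A quick way to tell which naming convention a VCF uses is to look at the CHROM column of its first record. The sketch below assumes a gzip/bgzip-compressed text VCF (a BCF would first need converting, for example with bcftools view); the function name contig_style is hypothetical.

```shell
# contig_style FILE.vcf.gz
# Prints "chr" if the first data record's CHROM starts with "chr",
# "plain" if it does not, and "empty" if the file has no data records.
contig_style() {
    gzip -dc "$1" | awk '
        /^#/ { next }                       # skip header lines
        { found = 1
          if ($1 ~ /^chr/) print "chr"; else print "plain"
          exit }                            # only the first record matters
        END { if (!found) print "empty" }'
}

# Example use:
# contig_style 56001801065146A.snp.vcf.gz
# contig_style data/wgs.grch37.vcf.gz
```

If the two files print different styles, no sites can match by name regardless of the reference build.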
I am using a VCF I got from Dante Labs ~10 months ago. Is there a standard way to check the "versioning" on those? I'm not too familiar with the file format to be honest; every time I think I understand how it works, something pops up and I realize I don't.
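One common heuristic for checking the build, sketched here under assumptions: many VCF headers carry ##contig lines with chromosome lengths, and chromosome 1 is 249,250,621 bp in GRCh37/hg19 versus 248,956,422 bp in GRCh38/hg38. The function name guess_build is hypothetical, the input is assumed to be a gzip/bgzip-compressed text VCF, and the result is "unknown" when the header has no such line (in that case a ##reference header line, if present, may also name the build).

```shell
# guess_build FILE.vcf.gz
# Prints "GRCh37/hg19", "GRCh38/hg38", or "unknown" based on the length
# recorded for chromosome 1 in the ##contig header lines.
guess_build() {
    gzip -dc "$1" | awk '
        !/^#/ { exit }                      # header is over, stop reading
        /^##contig=<.*ID=(chr)?1[,>]/ {
            if ($0 ~ /length=249250621[,>]/) build = "GRCh37/hg19"
            else if ($0 ~ /length=248956422[,>]/) build = "GRCh38/hg38"
        }
        END { print (build == "" ? "unknown" : build) }'
}

# Example use:
# guess_build 56001801065146A.snp.vcf.gz
```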
I seem to have gotten matches on some sites (130) with data/wgs.hg38.vcf.gz and on 9960/17491 with data/wgs.hg19.vcf.gz.
Do you have any further documentation that explains the difference between the files and why matches might be found only on some of those ?
Anyway, thanks for all the help, hopefully I can handle the rest from here :)
I converted a vcf to bcf and tried running your tool with the following command:
./akt pca -W data/wgs.grch37.vcf.gz input.bcf
The only logs I get are:
I see no debug option to make this message more verbose and figure out what the issue is. Does a flag for verbose output exist?
I don't believe the problem is permission related, here's the stats output for input.bcf: