Closed sabrinacamp2 closed 4 years ago
Hi Sabrina,
It seems like there is an issue with your ANGSD command. Could you post it here?
The only filters in the ANGSD command I give for the quick start are these: -sites $SITES -minMapQ 30 -minQ 20
So perhaps check how the -sites file relates to your data, are they both aligned to the same reference genome? (1000 genomes is hg19).
And what is WES?
Best, Emil
Hi Emil,
I used this ANGSD command
angsd -i sample.bam -GL 2 \ -sites sites -doGlf 2 \ -doMajorMinor 3 -minMapQ 30 -minQ 20 \ -doDepth 1 -doCounts 1 \ -out ${sample_id}
My BAM is also aligned to hg19. By WES, I mean whole-exome sequencing. I was curious if the tool only worked/ provided accurate results for a whole-genome sequenced bam or if whole-exome worked as well.
Hi Emil,
I have one other question. Unlike NGSadmix, fastNGSadmix does not take a K value as an input. Does the K value equal the number of populations you define in the -whichPops
argument?
Hi,
Ok, I still think there is some mismatch between the BAM file and the .sites file.
Can you try and run this command in ANGSD: angsd -i sample.bam -GL 2 -doGlf 2 -doMajorMinor 1 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out out -doMaf 3 -rf chr1:1-1000000 (-rf only analyses variants in this window, so feel free to change it to whenever your first variants are)
And give me the head of the .mafs.gz file?
And also give me the head of the .sites file?
I think it might be that in one file the chrs are coded "chr1" and in the other just "1".
fastNGSadmix should also work for exome data, especially whole-exome data!
################
And fastNGSadmix does not take a K parameter since it is more supervised than that, we tell it to look for certain populations. But yeah I guess you can see the number of populations included, as something similar to K.
Best, Emil
Thank you so much for your help so far. I tried running the command you suggest and I got this error:
[getFilenames] -> Problems opening file
I checked the BAM header and my contigs are 1..22,X. I also checked the sites file and it is the same formatting. I am a little confused on which file would be .mafs.gz. The ending of my sequencing file is .bam and usually .maf is a variant call format.
Thanks for the insight on the K parameter!
Hi,
It seems like you have specified the wrong filename (I just guessed of the name of the .bam file, so please use the name of your .bam file). Or that there is something wrong with your .bam file.
The mafs.gz file will be generated by ANGSD when the aforementioned command runs successfully.
Hi Sabrina, did you resolve your problems? Or is this still an active issue?
Hi Emil,
I decided to just bypass the ANGSD step by getting genotype likelihoods via bcftools -> using plink to turn the vcf into the binary plink files -> using these outputs directly in fastNGSadmix.
I will close the issue since I am no longer trying to troubleshoot the issue.
Thank you for all of your help! Best, Sabrina
Hi!
I was trying to follow the
Quick start - 1000 genomes reference panel
tutorial using a BAM derived from whole exome sequencing. The output of the ANGSD step to get the beagle formatted file gives me 0 variants. This is what it says in the log:Does fastNGSadmix not work with WES? If so, do you happen to recommend any other tool to derive admixture proportions using a WES BAM file?
All the best, Sabrina