Closed MichelMoser closed 7 years ago
Dear Michel
Your command "./angsd -yBin STYLEpheno.ybin -doAsso 1 -out asso -doMajorMinor 1 -doMaf 1 -SNP_pval 1e-6 -beagle gwas_min20_b.beagle.gz -P 8 -fai /data/users/mmoser/Peaxi_genome_v1.6.2.scaffolds.fasta.fai -nInd 22"
should give the error " -> Potential problem: Cannot estimate the major and minor based on posterior probabilities"
Are you sure this is the command you used?
ANGSD assumes that the -beagle input is posterior probabilities (such as the probability you get from imputation) and not genotype likelihoods.
-Anders
Hello Anders,
I currently can't reproduce the commands and output, but you are right about the warning, given the command I posted.
Could you tell me how to run the association test without the Beagle imputation? Newer versions of Beagle only take VCF as input format (to my knowledge). I could write a conversion script, but it would save time to do without it.
As -beagle did not work, I ran angsd with the BAM files as input:
command:
/home/mmoser/angsd/angsd -yBin STYLEpheno.ybin -GL 1 -doAsso 1 -out asso -doMajorMinor 1 -doMaf 1 -bam bam.filelist -P 8 -doPost 1 -fai /data/users/mmoser/Peaxi_genome_v1.6.2.scaffolds.fasta.fai -nInd 22 -SNP_pval 1e-2
output of asso.lrt0.gz
Chromosome Position Major Minor Frequency LRT
Peaxi162Scf00000 5300 T A 0.302125 -0.000000
Peaxi162Scf00000 5365 C T 0.258312 -0.000000
Peaxi162Scf00000 5366 C T 0.063891 -0.000000
Peaxi162Scf00000 5367 T G 0.237802 -0.000000
Peaxi162Scf00000 5990 C T 0.075368 -0.000000
Peaxi162Scf00000 6014 A G 0.232140 -0.000000
Peaxi162Scf00000 6022 C G 0.067783 -0.000000
Peaxi162Scf00000 6036 G T 0.113906 -0.000000
Peaxi162Scf00000 6066 C T 0.286320 -0.000000
All the LRT values are -0.000000. Does that mean none of the sites got under the p-value threshold (-SNP_pval 1e-2, i.e. 0.01)? Or is there something wrong with the command?
Thank you, Michel
phenotype_file:
1
2
2
1
2
2
1
1
2
1
1
2
2
1
1
1
2
2
2
1
1
2
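The phenotype file above codes the two groups as 1 and 2. The ANGSD association documentation describes -yBin as expecting 0 for controls, 1 for cases and -999 for missing values, so it may be worth checking whether a 1/2 coding leaves one group empty from ANGSD's point of view. A minimal sketch of a recode, assuming 1 means control and 2 means case (the thread does not say which is which); the function name recode_ybin is hypothetical:

```python
def recode_ybin(lines, control="1", case="2"):
    """Map a 1/2-coded phenotype column to 0 (control) / 1 (case).

    Assumption: which label is the control group is a guess; swap the
    `control` and `case` arguments if it is the other way round.
    """
    mapping = {control: "0", case: "1"}
    recoded = []
    for number, line in enumerate(lines, start=1):
        label = line.strip()
        if label not in mapping:
            # Fail loudly instead of silently shifting individuals.
            raise ValueError(f"unexpected phenotype {label!r} on line {number}")
        recoded.append(mapping[label])
    return recoded


# First four entries of the phenotype file shown above.
print(recode_ybin(["1", "2", "2", "1"]))
```

The output keeps one entry per line in the original order, so it still lines up with the order of the BAM file list.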
beagle.gz:
marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1 Ind2 Ind2 Ind2 Ind3 Ind3 Ind3 Ind4 Ind4 Ind4 Ind5 Ind5 Ind5 Ind6 Ind6 Ind6 Ind7 Ind7 Ind7 Ind8 Ind8 Ind8 Ind9 Ind9 Ind9 Ind10 Ind10 Ind10 Ind11 Ind11 Ind11 Ind12 Ind12 Ind12 Ind13 Ind13 Ind13 Ind14 Ind14 Ind14 Ind15 Ind15 Ind15 Ind16 Ind16 Ind16 Ind17 Ind17 Ind17 Ind18 Ind18 Ind18 Ind19 Ind19 Ind19 Ind20 Ind20 Ind20 Ind21 Ind21 Ind21
Peaxi162Scf00000_5300 3 0 0.391864 0.357039 0.251098 0.017502 0.459925 0.522574 0.988413 0.007722 0.003866 0.267768 0.287485 0.444747 0.038356 0.337209 0.624435 0.000000 0.111109 0.888891 0.735745 0.183933 0.080322 0.143434 0.390432 0.466134 0.571042 0.285519 0.143439 0.359192 0.281617 0.359192 0.735745 0.183933 0.080322 0.041756 0.525561 0.432683 0.333333 0.333333 0.333333 0.709983 0.177493 0.112525 0.571042 0.285519 0.143439 0.115131 0.471033 0.413836 0.980943 0.015326 0.003731 0.007155 0.425449 0.567395 0.951439 0.029731 0.018830 0.000102 0.644380 0.355518 0.768288 0.192069 0.039643 0.861854 0.107729 0.030418
Peaxi162Scf00000_5365 1 3 0.748007 0.186999 0.064994 0.054067 0.585038 0.360895 0.960903 0.030027 0.009070 0.034268 0.680486 0.285247 0.379640 0.336747 0.283612 0.000000 0.267206 0.732794 0.926158 0.057883 0.015959 0.980473 0.007659 0.011868 0.689947 0.172484 0.137569 0.002100 0.531195 0.466705 0.901829 0.056362 0.041809 0.040363 0.510833 0.448803 0.655688 0.327841 0.016470 0.604219 0.305482 0.090299 0.276655 0.385051 0.338294 0.934467 0.058402 0.007131 0.474689 0.365449 0.159862 0.006588 0.390028 0.603384 0.720803 0.180198 0.098999 0.008419 0.396898 0.594683 0.333333 0.333333 0.333333 0.526842 0.263419 0.209740
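The rows above hold a marker name, two allele codes, and then three values per individual. The allele codes appear to follow 0=A, 1=C, 2=G, 3=T, which agrees with the association output earlier in the thread (3 0 at position 5300 matches Major T, Minor A). A minimal sketch, under that assumption, that splits a row into per-individual triplets and flags flat triplets (0.333333 each), which carry no information for that individual. The function names are hypothetical and the example row is shortened to two individuals:

```python
BASES = "ACGT"  # assumed allele coding: 0=A, 1=C, 2=G, 3=T


def parse_beagle_row(line):
    """Split one beagle-format row into marker, alleles and value triplets."""
    fields = line.split()
    marker = fields[0]
    allele1 = BASES[int(fields[1])]
    allele2 = BASES[int(fields[2])]
    values = [float(x) for x in fields[3:]]
    if len(values) % 3 != 0:
        raise ValueError("expected three values per individual")
    triplets = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return marker, allele1, allele2, triplets


def is_uninformative(triplet, tol=1e-3):
    """True when all three values are (almost) equal to 1/3."""
    return all(abs(p - 1.0 / 3.0) < tol for p in triplet)


# Ind0 and Ind12 from the first row above (other individuals left out).
row = ("Peaxi162Scf00000_5300 3 0 "
       "0.391864 0.357039 0.251098 0.333333 0.333333 0.333333")
marker, allele1, allele2, triplets = parse_beagle_row(row)
print(marker, allele1, allele2, [is_uninformative(t) for t in triplets])
```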
Hi Michel
If you want to perform association without imputation, you can either use the test based on allele frequency differences between cases and controls (-doAsso 1) or use the logistic regression model (-doAsso 2).
Examples of how to run them are given on the wiki: http://www.popgen.dk/angsd/index.php/Association
Since you only have 22 individuals, it will be very hard to find anything; a GWAS usually has thousands of individuals. An LRT of 0 (as in your output) means a p-value of one. Most likely the site is not polymorphic (you chose a very lenient threshold), or there is no information in the cases or in the controls. You can read about the output here: http://www.popgen.dk/angsd/index.php/Association#Output
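The link between an LRT of 0 and a p-value of one can be checked directly. This is a minimal sketch that assumes the LRT is chi-square distributed with one degree of freedom (worth confirming against the output documentation linked above); the function name lrt_to_pvalue is hypothetical. For one degree of freedom the upper tail probability equals erfc(sqrt(LRT / 2)), so only the standard library is needed:

```python
import math


def lrt_to_pvalue(lrt):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if math.isnan(lrt):
        return float("nan")
    # Treat -0.000000 and tiny negative rounding artefacts as zero.
    lrt = max(lrt, 0.0)
    return math.erfc(math.sqrt(lrt / 2.0))


for lrt in (-0.0, 3.841, 10.83):
    print(lrt, lrt_to_pvalue(lrt))
```

With this conversion every row of the output shown earlier in the thread maps to a p-value of 1.0.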
If you want to do imputation, you can use Beagle 3: https://faculty.washington.edu/browning/beagle/beagle.jar (manual: https://faculty.washington.edu/browning/beagle/beagle_3.3.2_31Oct11.pdf)
-Anders
Hello,
I am trying to detect associations using already computed GL files such as beagle.gz or glf.gz. As I have some quite heavy BAM files, I thought I could skip processing them (it takes about 2 days on 8 cores) and feed the GL files directly to angsd for the association calculation.
Unfortunately, I must be doing something really wrong, but I don't know exactly what.
I would expect angsd to compute the LRT for all the sites in the GL files.
Instead, I get output for almost every base in the genome, with LRT values of either NAN or -0.00000 (which I would expect for non-informative sites).
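To put numbers on how many sites end up as NAN, zero or informative, the LRT column can be tallied. A minimal sketch, assuming a whitespace-separated table with a header line containing an LRT column, as in the asso.lrt0.gz excerpt earlier in the thread; the function name summarize_lrt is hypothetical, and the third data row in the example is made up purely to show the NaN case:

```python
import math
from collections import Counter


def summarize_lrt(lines):
    """Count NaN, zero, negative and positive values in the LRT column."""
    rows = iter(lines)
    header = next(rows).split()
    lrt_col = header.index("LRT")
    counts = Counter()
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        value = float(fields[lrt_col])
        if math.isnan(value):
            counts["nan"] += 1
        elif value == 0.0:  # also true for -0.000000
            counts["zero"] += 1
        elif value < 0.0:
            counts["negative"] += 1
        else:
            counts["positive"] += 1
    return counts


example = [
    "Chromosome Position Major Minor Frequency LRT",
    "Peaxi162Scf00000 5300 T A 0.302125 -0.000000",
    "Peaxi162Scf00000 5365 C T 0.258312 -0.000000",
    "Peaxi162Scf00000 5366 C T 0.063891 nan",  # made-up row for illustration
]
print(dict(summarize_lrt(example)))

# For a real file, pass an open handle instead of a list:
#   import gzip
#   with gzip.open("asso.lrt0.gz", "rt") as handle:
#       print(dict(summarize_lrt(handle)))
```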
command used:
stderr gets huge within minutes, printing:
the output looks like:
Small follow-up question:
If I try to use Beagle 4.1 for imputation, the like= option is no longer recognized. Do you know whether I have to convert the angsd Beagle file to VCF, or is there an alternative command to use?
Sorry for such basic questions about how to use the tool, but I could not find this information anywhere.
Thank you, Michel