Closed NicMAlexandre closed 9 months ago
Here is a typical line:
Scaffold1.1 34783 . A G . PASS NS=386;AF=0.372944 GT:GL:PL:GP 0/1:-31.6,0,-90.2:316,0,902:1.68258e-32,1,0
The easiest way is to also output the beagle file on angsd
.
As for filtering, low freq SNPs are not very informative, so you can remove them from the LD analysis (both 1%
and 5%
are ok); but keep in mind that those SNPs will not be included in the output file! Meaning that, if you are planning to do LD pruning afterwards, those sites won't show up in the list of unlinked sites.
If you are interested in those sites, you can either filter them on the pruning step (but ngsLD
will be slower) or use the list of excluded/linked sites and remove them from your dataset.
I see, I can exclude variants less than 5% MAF and then include them in my list of unlinked sites afterwards. That is good to know about the output!
Can you expand on this I'm having trouble understanding "If you are interested in those sites, you can either filter them on the pruning step (but ngsLD will be slower) or use the list of excluded/linked sites and remove them from your dataset." Are you saying either I can 1) include the lower MAFs and run ngsLD, which will be slower from the larger number of sites, or 2) I can just exclude those sites and just add them to my list of unlinked sites after getting a list of pruned variants.
As for the format, the problem is that I don't have access to bams because they take up too much space on our server, and ANGSD doesn't convert VCF files to GLF format, so I have to work directly with VCFs. I can convert the VCF to beagle format, although I'm not sure if ANGSD beagle output is slightly different from other beagle formats (I will confirm).
prune_graph -i example.ld --header --weight-field "r2" --weight-filter "dist < 150000 && r2 > 0.2 && maf1 > 0.05 && maf2 > 0.05" --out out.keep
), use list of unlinked sites.angsd
to read the GL from a VCF and create a beagle file (http://popgen.dk/angsd/index.php/Input#BCF.2FVCF_files), but I have not tested it. And remember to check if your VCF uses the GP
, PL
or the GL
tag (and use the corresponding option as in https://github.com/ANGSD/angsd/blob/8a58e179d23a3230adf2fa4115e16ecf057a9695/multiReader.cpp#L62-L64).Thank you! 1 and 2 are all sorted out!
I am still trying to work on getting the correct Geno file, but ANGSD ran successfully. Here is an example beagle file for two individuals and two positions:
marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1 CM060204.1_15474 0 1 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 CM060204.1_16136 0 2 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000
If Ind0
has tow heterozygotes and Ind1
has two homozygotes major, then it looks fine. 😄
Yes they do, excellent! Since we are limiting comparisons to about 50Kb or 100Kb, I am going to split each job by scaffold which should speed up the process.
It is working thanks a bunch! I will close this!!
Hello,
I am using the vcf output from ANGSD which contains the PL and PP fields.
Do you have any tips for formatting this file as input for ngsLD?
I have 11 million SNPS, 7 Million above 1% MAF, and 6 million > 5% MAF. What is your recommendation for filtering by allele frequency to reduce the run time and memory.