RobertsLab / resources

https://robertslab.github.io/resources/

Obtaining a Genetic Relatedness Matrix from a merged VCF #1681

Closed: sr320 closed this 11 months ago

sr320 commented 1 year ago

I am trying to get a genetic relatedness matrix for 26 individuals. I am fairly confident in the merged, filtered VCF file I created, but it is not clear to me how to get from that file to a genetic relatedness matrix.

Here is some of my effort in this area: https://d.pr/WyABeO (specifically section 2.1.1).

Any suggestions, advice, or pipelines to follow would be appreciated.

sr320 commented 1 year ago

see also https://rpubs.com/sr320/1063532

ksil91 commented 1 year ago

I used ngsRelate in the epi-gen oyster paper, with genotype likelihoods from ANGSD. As of version 2 it can also take a gzipped VCF file. Based on your VCF file header, it looks like you don't have a PL (genotype likelihood) field, so you will want to use the called genotypes (GT) instead:

./ngsrelate  -h my.VCF.gz -T GT -c 1 -O vcf.relatedness

Relatedness estimates depend on the overall allele frequency of each SNP. ngsRelate will estimate these frequencies directly from the VCF, but if you have allele frequencies for the SNPs from a larger set of samples, you can provide them with the -f parameter.
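To make the allele-frequency point concrete, here is a minimal R sketch of a simple count-based ALT allele frequency per SNP from called genotypes. It assumes diploid calls already coded as ALT allele counts (0, 1, 2) with NA for missing calls; the GT-string parsing is not shown, and `alt_allele_freq` and the genotype values are hypothetical. ngsRelate's own internal estimate may be computed differently, and if you write your own frequencies for -f, check the NgsRelate README for the expected file layout and which allele the frequency refers to.

```r
# Sketch: per-SNP ALT allele frequency from called genotypes.
# Assumption: diploid genotypes coded as ALT allele counts
# (0 = 0/0, 1 = 0/1, 2 = 1/1), NA for missing calls.
alt_allele_freq <- function(geno) {
  alt_count <- rowSums(geno, na.rm = TRUE)  # ALT alleles observed per SNP
  n_called  <- rowSums(!is.na(geno))        # non-missing genotype calls per SNP
  alt_count / (2 * n_called)                # two alleles per diploid call
}

# Hypothetical calls: 3 SNPs (rows) x 4 individuals (columns)
geno <- matrix(c(0, 1, 2, 1,
                 0, 0, 1, NA,
                 2, 2, 1, 2),
               nrow = 3, byrow = TRUE)

freqs <- alt_allele_freq(geno)
print(freqs)  # 4/8, 1/6 and 7/8
```

Missing calls are excluded from the denominator rather than counted as reference alleles, which is why the second SNP uses 6 alleles instead of 8.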

ksil91 commented 1 year ago

And here is R code to turn the output into a matrix. You might need to set the colnames and rownames of distrab before saving:

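library(spaa)  # provides list2dist()

# read the ngsRelate output table (tab-delimited, with a header row)
df <- read.table("vcf.relatedness", header = TRUE)
# keep the two individual identifiers and the pairwise relatedness (rab)
dfrab <- df[, c("ida", "idb", "rab")]
# convert the pairwise list into a full symmetric matrix
distrab <- as.matrix(list2dist(dfrab))

write.table(distrab, file = "MATRIX_mbd_rab.txt", col.names = FALSE, row.names = FALSE, sep = "\t")
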
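If spaa is not available, or you want the sample names kept on the matrix, a base-R sketch like the one below fills a symmetric matrix from a pairwise table (one row per pair). `pairs_to_matrix` and the sample names are hypothetical. The diagonal is set to 1 by default as an assumption (self-relatedness), whereas `as.matrix()` on a `dist` object, as in the snippet above, leaves 0 on the diagonal; use whatever convention your downstream tool expects.

```r
# Sketch: build a named, symmetric relatedness matrix from pairwise values.
# ids defaults to first-appearance order; pass your own order (e.g. VCF
# sample order) to control the row/column layout.
pairs_to_matrix <- function(id1, id2, value,
                            ids = unique(c(as.character(id1), as.character(id2))),
                            diag_value = 1) {
  m <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))
  diag(m) <- diag_value                 # assumed self-relatedness
  i <- match(as.character(id1), ids)    # row positions of the first sample
  j <- match(as.character(id2), ids)    # column positions of the second sample
  m[cbind(i, j)] <- value               # fill the upper/lower triangle...
  m[cbind(j, i)] <- value               # ...and mirror it for symmetry
  m
}

# Hypothetical pairwise values for three samples
pairs <- data.frame(ida = c("s1", "s1", "s2"),
                    idb = c("s2", "s3", "s3"),
                    rab = c(0.25, 0.05, 0.5))
rel <- pairs_to_matrix(pairs$ida, pairs$idb, pairs$rab)
print(rel)
```

Any pair missing from the table stays NA rather than silently becoming 0. When saving, `write.table(rel, ..., sep = "\t", quote = FALSE, col.names = NA)` keeps the column names aligned with the row names.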
sr320 commented 1 year ago

ngsRelate produced a table with the following column headers:

a b nSites J9 J8 J7 J6 J5 J4 J3 J2 J1 rab Fa Fb theta inbred_relatedness_1_2 inbred_relatedness_2_1 fraternity identity zygosity 2of3_IDB FDiff loglh nIter bestoptimll coverage 2dsfs R0 R1 KING 2dsfs_loglike 2dsfsf_niter

I see rab, but "ida" and "idb" are not present (and I'm not sure what any of these columns are 😄).

ksil91 commented 1 year ago

I think you want a and b; those are the indices of the individuals in the analysis, in the same order as the samples in the VCF. rab is the relatedness metric. See https://github.com/ANGSD/NgsRelate#output-format
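To get named columns back, the a and b indices can be mapped onto the VCF sample names. This is a sketch under assumptions you should check: that a and b are 0-based positions in the VCF sample order (with 26 individuals the largest index should then be 25), and that the sample names are listed in header order, for example via `bcftools query -l my.VCF.gz > samples.txt` and `readLines("samples.txt")`. `add_sample_ids` and the names below are hypothetical.

```r
# Sketch: add ida/idb sample-name columns from ngsRelate's a/b indices.
# Assumption: a and b are 0-based positions in sample_names (VCF header order).
add_sample_ids <- function(rel, sample_names) {
  if (max(c(rel$a, rel$b)) >= length(sample_names)) {
    stop("index out of range: are a/b really 0-based positions in sample_names?")
  }
  rel$ida <- sample_names[rel$a + 1]  # +1 because R vectors are 1-based
  rel$idb <- sample_names[rel$b + 1]
  rel
}

# Hypothetical rows standing in for read.table("vcf.relatedness", header = TRUE)
rel <- data.frame(a = c(0, 0, 1), b = c(1, 2, 2), rab = c(0.25, 0.05, 0.5))
sample_names <- c("OYS_01", "OYS_02", "OYS_03")  # hypothetical sample names

rel <- add_sample_ids(rel, sample_names)
print(rel[, c("ida", "idb", "rab")])
```

With these two columns added, the `df[, c("ida", "idb", "rab")]` subsetting in the earlier R snippet works as written. The range check is there because a 1-based numbering would push the largest index past the end of the name list.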