EricArcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.
25 stars 11 forks source link

Pairwise Statistic Inconsistency: new Gtypes object vs. genind2gtypes #19

Closed cr1570 closed 4 years ago

cr1570 commented 6 years ago

Hello,

I am using STRATAG to perform pairwise analyses on populations of a globally distributed shark species. I found a discrepancy in my results depending on how I created the gtypes object. My first method created the gtypes object new (see code below).

I changed my methodology to use the genind2gtypes function later in my analysis to try to streamline my code (see code below).

When I use the second method (genind2gtypes) on the same VCF data with the same population designations, I find no significant differentiation in two comparisons that are significant when I use the first method (gtypes new). I noticed that my first method retains the original genotypes (AGCT) and the second method function converts SNP data to 0s and 1s. I am wondering if perhaps I am losing data in the file conversions?

Can you please offer any insight into this issue?

Thank you, Cassandra


library("strataG") library("pegas") library("vcfR")

Method 1: Create new Gtypes object

1: import data from vcf (using pegas), split alleles

Data1 <- read.vcf("data.vcf") Data1M <- as.matrix(Data1) Data1M.split <- alleleSplit(Data1M, sep = '/')

2: import strata for schemes, make sure row names are consistent in both data frames

Strata <- read_csv("Strata.csv", col_names = TRUE) Data.schemes <- as.data.frame(Strata[, c("Region", "Location")]) rownames(Data.schemes) <- Strata$ID rownames(Data1M.split) <- Strata$ID

3: create gtypes objects with schemes attached

Data1.gtype <- new("gtypes", gen.data = Data1M.split, ploidy = 2, schemes = Data.schemes)

4: stratify gtypes objects

Data1.gtype.loc <- stratify(Data.gtype, "Location")

Method 2: Convert genind object to gtypes

1: import data from vcf (using vcfR) and convert to genind

Data2.vcfr <- read.vcfR("data.vcf", verbose = TRUE) Data2.genind <- vcfR2genind(Data2.vcfr, sep="[|/]")

2: add populations to genind object

popID <- read.table("popID.txt", header=FALSE) pop(Data2.genind) <- as.vector(as.matrix(popID))

3: convert genind to gtypes object

Data2.gtype.loc <- genind2gtypes(Data2.genind)

run pairwise statistics

Fstats1.loc <- popStructTest(Data1.gtype.loc, nrep = 10000, stats = c("fst", "gst.dbl.prime", "d"), quietly = FALSE, max.cores = NULL) Fstats2.loc <- popStructTest(Data2.gtype.loc, nrep = 10000, stats = c("fst", "gst.dbl.prime", "d"), quietly = FALSE, max.cores = NULL)

EricArcher commented 6 years ago

Would you check the result of genind2df(Data2.genind, usepop = TRUE, oneColPerAll = TRUE) to see if that produces a data.frame with 0s and 1s? This is the function that genind2gtypes uses so the conversion may be happening there.

EricArcher commented 6 years ago

As an aside, it might be easier to use df2gtypes rather than new to create your gtypes object. The output shouldn't be different, but it is a bit more convenient function.