Biometris / statgenGWAS

See https://biometris.github.io/statgenGWAS for a full description

codeMarkers drops pheno data #14

Closed by wrengs 3 months ago

wrengs commented 3 months ago

Dear Bart-Jan and Willem,

Recently, I installed statgenGWAS with install.packages("statgenGWAS"), after which I tested the installation by following your example at https://biometris.github.io/statgenGWAS/

Removing duplicate SNPs from gDataDrops with codeMarkers also seems to remove the phenotypic data: each of the 10 data frames in the pheno list goes from 246 observations to 0 observations.

Before removal of duplicate SNPs: str(gDataDrops)

List of 5
 $ map :'data.frame': 41722 obs. of 2 variables:
  ..$ chr: int [1:41722] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ pos: int [1:41722] 3498 157104 238347 239225 255850 263938 325012 379844 395380 485953 ...
 $ markers: int [1:246, 1:41722] 0 0 2 2 2 2 0 1 2 2 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:246] "8" "12" "22" "35" ...
  .. ..$ : chr [1:41722] "SYN83" "PZE-101000060" "PZE-101000088" "PZE-101000083" ...
 $ pheno :List of 10
  ..$ Cam12R:'data.frame': 246 obs. of 9 variables:
  .. ..$ genotype : chr [1:246] "11430" "A3" "A310" "A347" ...
  .. ..$ grain.yield : num [1:246] 1.31 1.5 2.76 1.81 2.86 ...
  .. ..$ grain.number : num [1:246] 439 498 958 688 1124 ...
  .. ..$ seed.size : num [1:246] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ anthesis : num [1:246] 70.7 70.2 70.8 71.2 72.8 ...
  .. ..$ silking : num [1:246] 80 83.5 82.5 80.5 82.4 ...
  .. ..$ plant.height : num [1:246] 119 140 136 126 134 ...
  .. ..$ tassel.height: num [1:246] 153 166 168 158 173 ...
  .. ..$ ear.height : num [1:246] 65.3 66.5 74.3 69.3 68.2 ...
  ..$ Cra12R:'data.frame': 246 obs. of 9 variables:
  .. ..$ genotype : chr [1:246] "11430" "A3" "A310" "A347" ...
  .. ..$ grain.yield : num [1:246] 0.638 0.821 2.824 1.899 2.878 ...
  .. ..$ grain.number : num [1:246] 198 357 1088 773 1268 ...
  .. ..$ seed.size : num [1:246] 263 243 280 263 246 ...
  .. ..$ anthesis : num [1:246] 70.1 71.2 69.5 73.3 71.9 ...
  .. ..$ silking : num [1:246] 72 80.4 73.7 74.4 77.6 ...
  .. ..$ plant.height : num [1:246] 113 109 115 112 119 ...
  .. ..$ tassel.height: num [1:246] 153 150 149 153 163 ...
  .. ..$ ear.height : num [1:246] 50 52.4 53.1 61.4 54.4 ...

Run removal of duplicates:

gDataDropsDedup <- codeMarkers(gDataDrops, impute = FALSE, verbose = TRUE)

Input contains 41722 SNPs for 246 genotypes.
0 genotypes removed because proportion of missing values larger than or equal to 1.
0 SNPs removed because proportion of missing values larger than or equal to 1.
5098 duplicate SNPs removed.
Output contains 36624 SNPs for 246 genotypes.

After removal of duplicate SNPs: str(gDataDropsDedup)

List of 5
 $ map :'data.frame': 36624 obs. of 2 variables:
  ..$ chr: int [1:36624] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ pos: int [1:36624] 3498 157104 238347 239225 255850 263938 325012 379844 395380 485953 ...
 $ markers: int [1:246, 1:36624] 0 0 2 2 2 2 0 1 2 2 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:246] "8" "12" "22" "35" ...
  .. ..$ : chr [1:36624] "SYN83" "PZE-101000060" "PZE-101000088" "PZE-101000083" ...
 $ pheno :List of 10
  ..$ Cam12R:'data.frame': 0 obs. of 9 variables:
  .. ..$ genotype : chr(0)
  .. ..$ grain.yield : num(0)
  .. ..$ grain.number : num(0)
  .. ..$ seed.size : num(0)
  .. ..$ anthesis : num(0)
  .. ..$ silking : num(0)
  .. ..$ plant.height : num(0)
  .. ..$ tassel.height: num(0)
  .. ..$ ear.height : num(0)
  ..$ Cra12R:'data.frame': 0 obs. of 9 variables:
  .. ..$ genotype : chr(0)
  .. ..$ grain.yield : num(0)
  .. ..$ grain.number : num(0)
  .. ..$ seed.size : num(0)
  .. ..$ anthesis : num(0)
  .. ..$ silking : num(0)
  .. ..$ plant.height : num(0)
  .. ..$ tassel.height: num(0)
  .. ..$ ear.height : num(0)

I am running:

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
statgenGWAS_1.0.9

I hope you can help determine where the issue might be.

Kind regards, Willem

BartJanvanRossum commented 3 months ago

Dear Willem,

I tried to replicate your issue, but when I run the code from the example/vignette everything works fine. I tested with both R 4.3.1 and R 4.4.1 and the results look the same. I have to admit that I ran everything on a Windows PC, but for now I don't see how that should make a difference.

I put the relevant part of the code below. Can you confirm that you still get the problem you describe if you run this from a clean R environment, e.g. after a restart of R? If so, I will try to find a way to test this on a Linux machine myself, but that might take a bit longer.

Best regards, Bart-Jan

library(statgenGWAS)
## Read data.
data("dropsMarkers")
data("dropsMap")
data("dropsPheno")

## Add genotypes as row names of dropsMarkers and drop Ind column.
rownames(dropsMarkers) <- dropsMarkers[["Ind"]]
dropsMarkers <- dropsMarkers[colnames(dropsMarkers) != "Ind"]

## Add SNP names as row names of dropsMap.
rownames(dropsMap) <- dropsMap[["SNP.names"]]
## Rename Chromosome and Position columns.
colnames(dropsMap)[match(c("Chromosome", "Position"), colnames(dropsMap))] <- c("chr", "pos")

## Rename Variety_ID in phenotypic data to genotype.
colnames(dropsPheno)[colnames(dropsPheno) == "Variety_ID"] <- "genotype"
## Select relevant columns and convert data to a list.
dropsPhenoList <- split(x = dropsPheno[c("genotype", "grain.yield",
                                         "grain.number", "seed.size",
                                         "anthesis", "silking", "plant.height",
                                         "tassel.height", "ear.height")], 
                        f = dropsPheno[["Experiment"]])

## Create a gData object containing all data.
gDataDrops <- createGData(geno = dropsMarkers, map = dropsMap, pheno = dropsPhenoList)

## Remove duplicate SNPs from gDataDrops.
gDataDropsDedup <- codeMarkers(gDataDrops, impute = FALSE, verbose = TRUE)

wrengs commented 3 months ago

Dear Bart-Jan,

Thank you for the swift reply. Running the above code in a fresh environment did the trick. I was under the impression that I had opened a fresh environment, but apparently that was not the case when the error appeared.

In a clean environment, str(gDataDropsDedup) now reports:

List of 5
 $ map :'data.frame': 36624 obs. of 2 variables:
  ..$ chr: int [1:36624] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ pos: int [1:36624] 3498 157104 238347 239225 255850 263938 325012 379844 395380 485953 ...
 $ markers: int [1:246, 1:36624] 0 0 2 2 2 2 0 1 2 2 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:246] "11430" "A3" "A310" "A347" ...
  .. ..$ : chr [1:36624] "SYN83" "PZE-101000060" "PZE-101000088" "PZE-101000083" ...
 $ pheno :List of 10
  ..$ Cam12R:'data.frame': 246 obs. of 9 variables:
  .. ..$ genotype : chr [1:246] "11430" "A3" "A310" "A347" ...
  .. ..$ grain.yield : num [1:246] 1.31 1.5 2.76 1.81 2.86 ...
  .. ..$ grain.number : num [1:246] 439 498 958 688 1124 ...
  .. ..$ seed.size : num [1:246] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ anthesis : num [1:246] 70.7 70.2 70.8 71.2 72.8 ...
  .. ..$ silking : num [1:246] 80 83.5 82.5 80.5 82.4 ...
  .. ..$ plant.height : num [1:246] 119 140 136 126 134 ...
  .. ..$ tassel.height: num [1:246] 153 166 168 158 173 ...
  .. ..$ ear.height : num [1:246] 65.3 66.5 74.3 69.3 68.2 ...
  ..$ Cra12R:'data.frame': 246 obs. of 9 variables:
  .. ..$ genotype : chr [1:246] "11430" "A3" "A310" "A347" ...
  .. ..$ grain.yield : num [1:246] 0.638 0.821 2.824 1.899 2.878 ...
  .. ..$ grain.number : num [1:246] 198 357 1088 773 1268 ...
  .. ..$ seed.size : num [1:246] 263 243 280 263 246 ...
  .. ..$ anthesis : num [1:246] 70.1 71.2 69.5 73.3 71.9 ...
  .. ..$ silking : num [1:246] 72 80.4 73.7 74.4 77.6 ...
  .. ..$ plant.height : num [1:246] 113 109 115 112 119 ...
  .. ..$ tassel.height: num [1:246] 153 150 149 153 163 ...
  .. ..$ ear.height : num [1:246] 50 52.4 53.1 61.4 54.4 ...
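
A detail visible in the str() outputs in this thread: in the failing run the row names of markers are "8", "12", "22", "35", ..., whereas in the clean run they are "11430", "A3", "A310", "A347", ..., matching the genotype column of the pheno data frames. That is consistent with the phenotypic rows disappearing because no genotype names overlapped, although the thread does not confirm this as the cause. A minimal sketch of a sanity check in base R is given here; the toy objects and the helper countShared are hypothetical and not part of statgenGWAS.

```r
## Toy stand-ins for the markers matrix and pheno list (hypothetical values).
markersBad <- matrix(0L, nrow = 3, ncol = 2,
                     dimnames = list(c("8", "12", "22"), c("SNP1", "SNP2")))
markersGood <- markersBad
rownames(markersGood) <- c("11430", "A3", "A310")
phenoList <- list(Cam12R = data.frame(genotype = c("11430", "A3", "A310"),
                                      grain.yield = c(1.31, 1.5, 2.76)))

## Hypothetical helper: per trial, count the phenotyped genotypes that
## also occur as row names of the marker matrix.
countShared <- function(markers, pheno) {
  vapply(pheno, function(trial) {
    sum(trial[["genotype"]] %in% rownames(markers))
  }, FUN.VALUE = integer(1))
}

countShared(markersBad, phenoList)   # no overlap: row names are not genotypes
countShared(markersGood, phenoList)  # full overlap
```

On a real gData object the same call would presumably be countShared(gDataDrops$markers, gDataDrops$pheno), assuming the markers and pheno components have the structure shown by str() in this thread. A count of 0 for every trial would point to stale or mislabelled marker row names.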

Sorry for the oversight on my end and thanks again for the help! I will close the issue with this comment.

All the best, Willem