Ivis4ml / fssemR

An optimizer of Fused-Sparse Structural Equation Models, which is the state-of-the-art (sota) jointly fused sparse maximum likelihood function for structural equation models proposed by Xin Zhou and Xiaodong Cai

05_DataprocLungCancer.R fail to reproduce SNPs imputation with synbreed #2

Closed FogatoHub closed 3 years ago

FogatoHub commented 3 years ago

Hi, I was trying to reproduce the data from the paper, but I got stuck at the imputation step (the third step of the file "05_DataprocLungCancer.R"). I would appreciate it if you could help me fix the issue I found, so that I can test the method and later use it with my data.

The specific part of the code is the following:

    # remove unchanged SNP and all Missing NA
    # impute missing NA in SNP matrix
    SNPvarmat = t(SNPvarmat)
    SNPmap = SNPmap[colnames(SNPvarmat), c(2,3)]
    colnames(SNPmap) = c("chr", "pos")
    SNPmap[,2] = as.numeric(SNPmap[,2])

    dim(SNPvarmat)
    ## [1] 122 930002

    PData2 = phenoData(gse2$eset)
    # SNP
    SNPPheno = PData2@data[rownames(SNPvarmat), c(10, 11)]
    SNPPheno[,1] = as.numeric(SNPPheno[,1])
    SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])
    colnames(SNPPheno) = c("Gender", "Status")
    SNPData = create.gpData(pheno = SNPPheno, geno = SNPvarmat, map = SNPmap, map.unit = "bp")  # <-- PROBLEM HERE
    SNPImputed = codeGeno(SNPData, impute=TRUE, impute.type="beagle", cores = 4)  # <-- CRASHES HERE
    SNPvarmat = t(SNPImputed$geno)

I think the issue is that the lines

    SNPPheno[,1] = as.numeric(SNPPheno[,1])
    SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])

are incorrect. The two columns of SNPPheno, gender (which is only female) and status (normal/tumor), are strings, so the as.numeric conversion just fills both columns with NAs. I think synbreed wants the actual phenotype in the pheno = SNPPheno argument of create.gpData, not a data.frame of all NAs.

The data used to create SNPPheno are taken from the GEO database file "GSE33356-GPL6801_series_matrix.txt.gz", just as already written in the code. These are the values inside SNPPheno after SNPPheno = PData2@data[rownames(SNPvarmat), c(10, 11)]:

    > head(SNPPheno)
              characteristics_ch1      characteristics_ch1.1
    GSM824988      gender: female tissue: normal lung tissue
    GSM824989      gender: female tissue: cancer lung tissue
    GSM824990      gender: female tissue: normal lung tissue
    GSM824991      gender: female tissue: cancer lung tissue
    GSM824992      gender: female tissue: normal lung tissue
    GSM824993      gender: female tissue: cancer lung tissue

So when I do

    SNPPheno[,1] = as.numeric(SNPPheno[,1])
    SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])

I get a warning message for each line, and both columns end up as NA:

    > SNPPheno[,1] = as.numeric(SNPPheno[,1])
    Warning message:
    NAs introduced by coercion
    > SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])
    Warning message:
    NAs introduced by coercion
    > head(SNPPheno)
              characteristics_ch1 characteristics_ch1.1
    GSM824988                  NA                    NA
    GSM824989                  NA                    NA
    GSM824990                  NA                    NA
    GSM824991                  NA                    NA
    GSM824992                  NA                    NA
    GSM824993                  NA                    NA

I hope for a reply, thank you.

Ivis4ml commented 3 years ago

Hi, yes, and thanks for reporting that. I think this is caused by my unsafe conversion, so please try converting these columns to factors with as.factor first, and then converting them with as.numeric.
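For illustration, the factor-based conversion could look like the sketch below. It uses a small toy data.frame that mimics the two `characteristics_ch1` columns shown earlier in the thread instead of the real GEO data, and it assumes the default alphabetical ordering of factor levels; the resulting 0/1 coding is a consequence of that ordering, not something confirmed by the original script.

```r
# Toy stand-in for the two GEO characteristics columns shown in the issue
# (assumption: the real columns hold the same kind of strings).
SNPPheno <- data.frame(
  characteristics_ch1   = rep("gender: female", 4),
  characteristics_ch1.1 = c("tissue: normal lung tissue",
                            "tissue: cancer lung tissue",
                            "tissue: normal lung tissue",
                            "tissue: cancer lung tissue"),
  row.names = c("GSM824988", "GSM824989", "GSM824990", "GSM824991"),
  stringsAsFactors = FALSE
)

# Direct coercion of these strings gives NA (R also emits a warning).
direct <- suppressWarnings(as.numeric(SNPPheno[, 1]))

# Going through a factor gives the integer level codes instead.
# Levels are sorted alphabetically, so "cancer" gets code 1 and "normal" code 2.
SNPPheno[, 1] <- as.numeric(as.factor(SNPPheno[, 1]))
SNPPheno[, 2] <- 2 - as.numeric(as.factor(SNPPheno[, 2]))
colnames(SNPPheno) <- c("Gender", "Status")

print(SNPPheno)
```

With this coding, Gender is 1 for every sample (there is only one level), and Status is 1 for cancer tissue and 0 for normal tissue. If a different coding is needed, the levels can be set explicitly with `factor(x, levels = ...)`.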

Hope that helps.

Thanks

FogatoHub commented 3 years ago

Thank you, but it still doesn't work: it crashes during the imputation step

    SNPImputed = codeGeno(SNPData, impute=TRUE, impute.type="beagle", cores = 4)

I get this error:

    SNPvarmat = t(SNPImputed$geno)
    Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
      ignoring SIGPIPE signal
    Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
      ignoring SIGPIPE signal
    Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
      ignoring SIGPIPE signal

I have an i7 with 16 GB of RAM; I don't know if it's a RAM problem.
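As a rough check on the RAM question, the size of the genotype matrix can be estimated from the dimensions reported earlier in the thread (122 x 930002). This is only a back-of-the-envelope sketch: it assumes the matrix is stored as plain doubles or integers, and it says nothing about how much extra memory codeGeno or Beagle need on top of that. The variable names are illustrative.

```r
# Dimensions of SNPvarmat as reported by dim(SNPvarmat) in the issue
n_samples <- 122
n_snps    <- 930002

# Assumption: 8 bytes per cell if stored as double, 4 bytes if stored as integer
bytes_double  <- n_samples * n_snps * 8
bytes_integer <- n_samples * n_snps * 4

# Convert bytes to GiB for readability
to_gib <- function(bytes) bytes / 2^30

cat(sprintf("double matrix:  %.2f GiB\n", to_gib(bytes_double)))
cat(sprintf("integer matrix: %.2f GiB\n", to_gib(bytes_integer)))
```

A single copy is therefore under 1 GiB, but if each of the 4 workers ends up holding its own copy, plus intermediate objects, the total can grow to several times that figure. Retrying with a smaller `cores` value may be worth a try, though whether that resolves the SIGPIPE error is not confirmed here.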

If it's not too much trouble, can I contact you at xxz220@miami.edu with a few questions about this program? I'm not posting them here because they are not real issues, more requests for clarification. Thank you.

Ivis4ml commented 3 years ago

Yes, please feel free to contact me via this email.

Thanks
