beibeiJ / SimPhe

Bug report git repository for SimPhe

Generating genotype data #1

Open: lsmainzer opened this issue 6 years ago

lsmainzer commented 6 years ago

Hi! Nifty package, very useful!

I notice the genotype information is supplied in a very particular format.

Is there a way to autogenerate genotype files (like your simupars.txt) for an arbitrary number of SNPs and individuals?

Thanks, Liudmila Mainzer

lsmainzer commented 6 years ago

Also, I am getting this error right out of the box while following the vignette instructions:

phe <- sim.phe(sim.pars = fpar.path, fgeno = fgeno.path, ftype = "snp.head", seed = 123, fwrite = FALSE)
Error in matrix(unlist(sapply(regmatches(x, gregexpr(pattern, x)), function(e) regmatches(e, :
'data' must be of a vector type, was 'NULL'

beibeiJ commented 6 years ago


Hi lsmainzer, do you mean the simulation parameter file? simupars.txt contains the simulation parameters, while 10SNP.txt contains the genotype data.

As for the error, I didn't get it when I tested. If I can reproduce it on another machine, I will let you know.

lsmainzer commented 6 years ago

Yes indeed, my colleagues are able to run your package on macOS without any problems. I have a fresh install of R and RStudio on Windows, and I am getting this error.

Here is exactly what I ran:

library(SimPhe)
fpar.path <- system.file("extdata", "simupars.txt", package = "SimPhe")
read.simu.pars(fpar.path)
Error in matrix(unlist(sapply(regmatches(x, gregexpr(pattern, x)), function(e) regmatches(e, :
'data' must be of a vector type, was 'NULL'

The problem is with simupars.txt, since I can load the genotype file fine. The simupars.txt I am using is the one you provide, and it does not contain ^M characters. However, it still fails even after I convert the file to Windows line endings by inserting ^M characters.

Do you think I am lacking some required package?

Thanks for any help.

beibeiJ commented 6 years ago


In general, dependencies should not be the problem, because SimPhe does not depend on any other package. What does ^M mean? Did you install the package from CRAN or from the source code on GitHub?

At the moment I can't see the problem. I will try to get access to a Windows machine this week and hope to find the cause.

lsmainzer commented 6 years ago

Hello! I installed it from within RStudio by running install.packages("SimPhe").

When I got the error, I suspected an input-file incompatibility between *NIX and Windows caused by end-of-line characters: ^M is how the carriage return in Windows (CRLF) line endings is displayed.
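The line-ending hypothesis can be checked directly. The sketch below is a hypothetical pair of helpers (has_cr and strip_cr are not part of SimPhe) that read a file as raw bytes, report whether it contains carriage returns (byte 13, displayed as ^M), and write an LF-only copy. It only tests the CRLF hypothesis; nothing in this thread confirms that line endings are the actual cause.

```r
# Hypothetical helpers (not part of SimPhe) for checking the CRLF hypothesis.

# TRUE if the file contains at least one carriage return (^M, byte 13)
has_cr <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  any(bytes == as.raw(13))
}

# Write a copy of the file with every carriage return removed (LF-only endings)
# and return the path of the copy
strip_cr <- function(path, out = tempfile(fileext = ".txt")) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  writeBin(bytes[bytes != as.raw(13)], out)
  out
}
```

For example, has_cr(system.file("extdata", "simupars.txt", package = "SimPhe")) reports whether the bundled parameter file contains ^M; if it does, read.simu.pars(strip_cr(fpar.path)) tries the cleaned copy instead.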

I could try installing on a Linux system. Meanwhile, if you could test on Windows, that would be appreciated.

Thanks!

lsmainzer commented 6 years ago

Just checked: on Linux this works smoothly,

which supports the hypothesis that the problem lies in input-file formatting on Windows.

Unfortunately, many biologists use Windows with RStudio, so we would really appreciate a fix :-)

lsmainzer commented 6 years ago

Okay, so now that I can run the package, I'd like to get back to my original question.

Your example file simupars.txt contains the parameter information for epistasis, and the format is very easy to follow.

What we need is the ability to generate datasets of arbitrary size (number of individuals, number of SNPs) with an arbitrary number of epistatic pairs, to enable scalability testing of GWAS software. Say I want a dataset with 10,000 SNPs and 2,000 individuals, containing 15 epistatic SNP pairs. It would be preferable not to enter each pair into simupars.txt by hand.
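For the genotype side of such a dataset, a minimal base-R sketch (sim_genotypes and write_genotypes are hypothetical helpers, not SimPhe functions) draws each SNP's genotypes from a Binomial(2, MAF) distribution, which corresponds to Hardy-Weinberg equilibrium for unrelated individuals and independent SNPs. The output layout is an assumption: 0/1/2 minor-allele counts, tab-delimited, one individual per row, with SNP IDs in the header row (one reading of ftype = "snp.head"). Compare it with the bundled 10SNP.txt before relying on it.

```r
# Hypothetical helpers (not part of SimPhe): simulate genotypes for unrelated
# individuals and independent SNPs under Hardy-Weinberg equilibrium.
sim_genotypes <- function(n.ind, n.snp, maf.range = c(0.05, 0.5)) {
  # One minor-allele frequency per SNP, drawn uniformly from maf.range
  maf <- runif(n.snp, min = maf.range[1], max = maf.range[2])
  # Genotype = number of minor alleles ~ Binomial(2, maf); rep(..., each = n.ind)
  # lines each SNP's MAF up with its column in matrix()'s column-major fill
  geno <- matrix(rbinom(n.ind * n.snp, size = 2, prob = rep(maf, each = n.ind)),
                 nrow = n.ind, ncol = n.snp)
  colnames(geno) <- paste0("SNP", seq_len(n.snp))
  geno
}

# Write SNP IDs as the header row and one individual per row (assumed layout)
write_genotypes <- function(geno, path) {
  write.table(geno, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = TRUE)
  invisible(path)
}
```

For the dataset described above: set.seed(123); write_genotypes(sim_genotypes(2000, 10000), "geno_10k.txt"). Because the SNPs are simulated independently there is no linkage disequilibrium, which may matter when benchmarking GWAS software.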

Do you by any chance have a utility that can generate simupars.txt for any number of epistatic SNP pairs? That was my original question.

Sorry this wasn't clear right away; I have a cold, and my brain is a little gelatinous at the moment.

beibeiJ commented 6 years ago

It is great to hear that you finally got the package working. What was the solution to the Windows issue you mentioned?

There is one way to do it within R. Instead of putting the simulation parameters in a text file, you can pass them as a list: take a look at the structure of genepars (just a list). Generate the simulation information for your 15 epistatic SNP pairs with the same structure as genepars, then run sim.phe as follows:
phe <- sim.phe(sim.pars = genepars, fgeno = fgeno.path, ftype = "snp.head", fwrite = FALSE)
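Part of this suggestion can be scripted. The sketch below (pick_epistatic_pairs is a hypothetical helper, not a SimPhe function) randomly chooses disjoint SNP pairs from a vector of SNP IDs, such as the column names of a genotype file. Turning each pair into an entry of genepars still requires the list structure, which this thread does not spell out; inspecting the parsed example, for instance with str(read.simu.pars(fpar.path)), presumably shows which fields each pair needs.

```r
# Hypothetical helper (not part of SimPhe): choose n.pair disjoint SNP pairs
# uniformly at random, so that no SNP appears in more than one pair.
pick_epistatic_pairs <- function(snp.names, n.pair) {
  if (2 * n.pair > length(snp.names)) {
    stop("need at least 2 * n.pair distinct SNPs")
  }
  # Sample 2 * n.pair distinct SNP IDs, then pair them off consecutively
  chosen <- sample(snp.names, size = 2 * n.pair)
  data.frame(
    snp1 = chosen[seq(1, by = 2, length.out = n.pair)],
    snp2 = chosen[seq(2, by = 2, length.out = n.pair)],
    stringsAsFactors = FALSE
  )
}
```

For 15 pairs among 10,000 SNPs: pairs <- pick_epistatic_pairs(paste0("SNP", 1:10000), 15). Keeping the pairs disjoint is a design choice; drop that constraint if overlapping pairs are wanted.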

Hope I have answered your question. If not, please let me know.

(Reply sent by email in response to lsmainzer's comment of Tue, Oct 16, 2018 at 9:26 PM.)