alyssafrazee / polyester

Bioconductor package "polyester", devel version. RNA-seq read simulator.
http://biorxiv.org/content/early/2014/12/12/006015
90 stars 51 forks source link

Trouble changing size parameter #37

Open rramaker opened 8 years ago

rramaker commented 8 years ago

I'm attempting to run the "simulate_experiment" command function by following the associated help file except changing the size parameter as follows:

fastapath = system.file("extdata", "chr22.fa", package="polyester") numtx = count_transcripts(fastapath) set.seed(4) fold_changes = sample(c(0.5, 1, 2), size=numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE) simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=10)

This results in the following error:

Error in matrix(size, nrow = nrow(basemeans), ncol = ncol(basemeans)) : non-numeric matrix extent

This error happens when providing a vector of sizes for each transcript like so:

simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=(10*fold_changes/3))

Am I formatting the size parameter correctly?

JMF47 commented 8 years ago

Hey there, the documentation for that function describes the following for size

the negative binomial size parameter (see NegBinomial) for the number of reads drawn per transcript. If left blank, defaults to reads_per_transcript / 3. Negative binomial variance is mean + mean^2 / size. Can either be left at default, a vector of the same length as number of transcripts in fasta, if the two groups should have the same size parameters, or a list with 2 elements, each of which is a vector with length equal to the number of transcripts in fasta, which represent the size parameters for each transcript in groups 1 and 2, respectively.

Does that clear up what you might need?

rramaker commented 8 years ago

Thanks for your quick reply. I really appreciate your help.

It's possible I'm mis-reading something in the documentation, but my interpretation is you can provide one of two inputs for the size argument:

1) A numeric vector of size values with the same length as the number of transcripts in the fasta file. I've tried this by modifying the example code provided in the documentation as follows:

fastapath = system.file("extdata", "chr22.fa", package="polyester") numtx = count_transcripts(fastapath) fold_changes = sample(c(0.5, 1, 2), size=numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE) simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=rep(3,numtx))

2) A list containing 2 numeric vectors of size values, each with the same length as the number of transcripts in the fasta file. I've also tried this by modifying the example code provided in the documentation as follows:

fastapath = system.file("extdata", "chr22.fa", package="polyester") numtx = count_transcripts(fastapath) fold_changes = sample(c(0.5, 1, 2), size=numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE) simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=list(rep(3,numtx),rep(6,numtx)))

Both routes result in the same error:

Error in simulate_experiment(fastapath, reads_per_transcript = 10, fold_changes = fold_changes, : size must be a number, numeric vector, or matrix.

It seems providing a numeric vector with length equal to the number of transcripts in the fasta is not working. Please correct me if I'm incorrectly interpreting the documentation. It may be helpful if you could provide some example code where you've modified the size parameter.

Thanks again!

alyssafrazee commented 8 years ago

It does look like you're adjusting the parameter correctly according to the documentation. I'll take a look at this when I get a chance (likely sometime in the next week, unless @JMF47 can get to it sooner).

alyssafrazee commented 8 years ago

Working on fixing this now! One update I have is that, per the documentation, fold_changes needs to be a matrix specifying fold changes between groups. It needs one column per group. The default is two groups (and the num_reps arg is left at the default in your example, so you have 2 groups), but you've provided a fold_changes argument that's a vector, i.e., it maybe has one column but R doesn't think it has any columns at all, so that's a dimension mismatch. The following code does work:

fastapath = system.file("extdata", "chr22.fa", package="polyester")
numtx = count_transcripts(fastapath)
set.seed(4)
fold_change_values = sample(c(0.5, 1, 2), size=2*numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE)
fold_changes = matrix(fold_change_values, nrow=numtx)

simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=10)

(compare to your code in the original comment; note the class of fold_changes). I took that code right from the example page of simulate_experiment.

That said, there should be better error-handling there (the message should have told you that your fold change argument's dimensions didn't match your group dimensions). Also, you should be able to have just one group and change the size parameter. I'm adding those two features now (the latter is a bug).

edit: I realize you are actually allowed to provide a vector of fold changes with two groups -- I forgot I hadn't removed that functionality! Looking into it.

alyssafrazee commented 8 years ago

OK, I've fixed this (rather uglily) in https://github.com/alyssafrazee/polyester/pull/39. Will test more thoroughly and deploy to BioC shortly. Sorry for the confusion!