Open rramaker opened 8 years ago
Hey there, the documentation for that function describes the following for size
the negative binomial size parameter (see NegBinomial) for the number of reads drawn per transcript. If left blank, defaults to reads_per_transcript / 3. Negative binomial variance is mean + mean^2 / size. Can either be left at default, a vector of the same length as number of transcripts in fasta, if the two groups should have the same size parameters, or a list with 2 elements, each of which is a vector with length equal to the number of transcripts in fasta, which represent the size parameters for each transcript in groups 1 and 2, respectively.
Does that clear up what you might need?
Thanks for your quick reply. I really appreciate your help.
It's possible I'm mis-reading something in the documentation, but my interpretation is you can provide one of two inputs for the size argument:
1) A numeric vector of size values with the same length as the number of transcripts in the fasta file. I've tried this by modifying the example code provided in the documentation as follows:
fastapath = system.file("extdata", "chr22.fa", package="polyester") numtx = count_transcripts(fastapath) fold_changes = sample(c(0.5, 1, 2), size=numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE) simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=rep(3,numtx))
2) A list containing 2 numeric vectors of size values, each with the same length as the number of transcripts in the fasta file. I've also tried this by modifying the example code provided in the documentation as follows:
fastapath = system.file("extdata", "chr22.fa", package="polyester") numtx = count_transcripts(fastapath) fold_changes = sample(c(0.5, 1, 2), size=numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE) simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=list(rep(3,numtx),rep(6,numtx)))
Both routes result in the same error:
Error in simulate_experiment(fastapath, reads_per_transcript = 10, fold_changes = fold_changes, : size must be a number, numeric vector, or matrix.
It seems providing a numeric vector with length equal to the number of transcripts in the fasta is not working. Please correct me if I'm incorrectly interpreting the documentation. It may be helpful if you could provide some example code where you've modified the size parameter.
Thanks again!
It does look like you're adjusting the parameter correctly according to the documentation. I'll take a look at this when I get a chance (likely sometime in the next week, unless @JMF47 can get to it sooner).
Working on fixing this now! One update I have is that, per the documentation, fold_changes
needs to be a matrix specifying fold changes between groups. It needs one column per group. The default is two groups (and the num_reps
arg is left at the default in your example, so you have 2 groups), but you've provided a fold_changes
argument that's a vector, i.e., it maybe has one column but R doesn't think it has any columns at all, so that's a dimension mismatch. The following code does work:
fastapath = system.file("extdata", "chr22.fa", package="polyester")
numtx = count_transcripts(fastapath)
set.seed(4)
fold_change_values = sample(c(0.5, 1, 2), size=2*numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE)
fold_changes = matrix(fold_change_values, nrow=numtx)
simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=10)
(compare to your code in the original comment; note the class of fold_changes
). I took that code right from the example page of simulate_experiment
.
That said, there should be better error-handling there (the message should have told you that your fold change argument's dimensions didn't match your group dimensions). Also, you should be able to have just one group and change the size parameter. I'm adding those two features now (the latter is a bug).
edit: I realize you are actually allowed to provide a vector of fold changes with two groups -- I forgot I hadn't removed that functionality! Looking into it.
OK, I've fixed this (rather uglily) in https://github.com/alyssafrazee/polyester/pull/39. Will test more thoroughly and deploy to BioC shortly. Sorry for the confusion!
I'm attempting to run the "simulate_experiment" command function by following the associated help file except changing the size parameter as follows:
fastapath = system.file("extdata", "chr22.fa", package="polyester") numtx = count_transcripts(fastapath) set.seed(4) fold_changes = sample(c(0.5, 1, 2), size=numtx, prob=c(0.05, 0.9, 0.05), replace=TRUE) simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=10)
This results in the following error:
Error in matrix(size, nrow = nrow(basemeans), ncol = ncol(basemeans)) : non-numeric matrix extent
This error happens when providing a vector of sizes for each transcript like so:
simulate_experiment(fastapath, reads_per_transcript=10,fold_changes=fold_changes, outdir='simulated_reads', seed=12, size=(10*fold_changes/3))
Am I formatting the size parameter correctly?