ataudt / chromstaR

Combinatorial and differential ChIP-seq analysis in R

Error in combining multivariate HMMS (Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [8] is duplicated) #13

Closed tomszar closed 5 years ago

tomszar commented 5 years ago

Hi,

I'm running Chromstar in differential mode, but I got stuck with an error during the Combining multivariate HMMs step. Below is the error:

===========================
Combining multivariate HMMs
===========================
Processing HMM 1 ... 0.78s
Processing HMM 2 ... 0.77s
Processing HMM 3 ... 1.03s
Processing HMM 4 ... 0.56s
Processing HMM 5 ... 0.66s
Processing HMM 6 ... 0.54s
Concatenating HMMs ... 1.07s
Making combinations ...Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
  factor level [8] is duplicated
Calls: Chromstar ... stateBrewer -> state.brewer -> data.frame -> factor
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

I'm not sure what the problem is, but it seems to happen when calling the combineMultivariates() function. Within that function I ran the section after "Making combinations ...", and the binary.matrix argument of stateBrewer() seems to be responsible for the error (the function ran fine without it).

I leave a Dropbox link to reproduce the error

Thanks

ataudt commented 5 years ago

Hi, thanks for the reproducible example. The issue here is the large number of replicates for each mark. chromstaR uses a decimal representation of the combinatorial states, and in this case that decimal representation is larger than the maximum number that R can handle. The problem is a %% (modulo) operation in converting the decimal number back to its binary representation. I couldn't find a quick fix for this. I suggest the following:

1) Merge (some or all) replicates to reduce the number of replicates that chromstaR has to handle internally.
2) Leave out low-quality replicates (if any). This has the advantage that the results will also be clearer.
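To illustrate the kind of failure described above, here is a minimal sketch in base R. It is not chromstaR's code: `dec2bin` is a hypothetical helper, and the sketch assumes the decimal states are stored as ordinary R doubles, which cannot distinguish neighbouring integers above 2^53. Two states that should differ then decode to the same binary label, and building a factor from those labels fails with a "duplicated" level error.

```r
# Hypothetical helper (not part of chromstaR): decode a decimal state into
# a string of 0/1 digits by repeated %% 2 and %/% 2.
dec2bin <- function(x, n.bits) {
  bits <- numeric(n.bits)
  for (i in n.bits:1) {
    bits[i] <- x %% 2
    x <- x %/% 2
  }
  paste(bits, collapse = "")
}

# With few tracks, distinct decimal states give distinct bit patterns.
small.labels <- c(dec2bin(5, 3), dec2bin(6, 3))

# Above 2^53, a double can no longer represent every integer, so two states
# that should be different are stored as the same number ...
state.a <- 2^53
state.b <- 2^53 + 1
same.number <- state.a == state.b

# ... and therefore decode to the same label.
labels <- c(dec2bin(state.a, 54), dec2bin(state.b, 54))

# Using the labels as factor levels then fails because a level is repeated.
err <- tryCatch(
  factor(labels, levels = labels),
  error = function(e) conditionMessage(e)
)
print(err)
```

Fewer tracks (merged or dropped replicates) keep the decimal states small enough that this collision cannot occur.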

ataudt commented 5 years ago

For convenience, you can merge replicates by using a "|" in the experiment table. Example:

file	mark	condition	replicate	pairedEndReads	controlFiles
lv-H3K27me3-SHR-male-bio2-tech1.bam|lv-H3K27me3-SHR-male-bio2-tech2.bam	H3K27me3	SHR	1	FALSE	lv-input-SHR-male-bio1-tech1.bam
lv-H3K4me3-SHR-male-bio2-tech1.bam	H3K4me3	SHR	1	FALSE	lv-input-SHR-male-bio1-tech1.bam
lv-H3K4me3-SHR-male-bio3-tech1.bam	H3K4me3	SHR	2	FALSE	lv-input-SHR-male-bio1-tech1.bam
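If the experiment table is built in R rather than by hand, the merged entry can be assembled with `paste(..., collapse = "|")`. The following is only a sketch using base R: the column names and file names are taken from the example above, while the temporary output path and the choice of a tab-separated file are assumptions.

```r
# Technical replicates to be merged into a single entry.
tech.reps <- c("lv-H3K27me3-SHR-male-bio2-tech1.bam",
               "lv-H3K27me3-SHR-male-bio2-tech2.bam")
merged <- paste(tech.reps, collapse = "|")

# Experiment table with the columns from the example above.
experiment.table <- data.frame(
  file = c(merged,
           "lv-H3K4me3-SHR-male-bio2-tech1.bam",
           "lv-H3K4me3-SHR-male-bio3-tech1.bam"),
  mark = c("H3K27me3", "H3K4me3", "H3K4me3"),
  condition = "SHR",
  replicate = c(1, 1, 2),
  pairedEndReads = FALSE,
  controlFiles = "lv-input-SHR-male-bio1-tech1.bam",
  stringsAsFactors = FALSE
)

# Write it out as a tab-separated file (assumed format; the path is a
# placeholder created with tempfile()).
out.file <- tempfile(fileext = ".tsv")
write.table(experiment.table, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The merged row counts as one replicate, so the number of tracks that has to be handled internally drops by one for each file folded in this way.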

tomszar commented 5 years ago

Thanks! That works perfectly