YosefLab / ImpulseDE2

37 stars 10 forks source link

Error fitting Impulse model #29

Open mallorymaynes opened 2 years ago

mallorymaynes commented 2 years ago

Hi David,

I was able to incorporate vecConfounders as you suggested in #28, but have come across a couple of errors along the way. The first error that was generated:

Selected 35938 genes/regions for analysis.
# Run DESeq2: Using dispersion factorscomputed by DESeq2.
<simpleError in DESeqDataSet(se, design = design, ignoreRank): all variables in design formula must be columns in colData>
[1] "WARNING: DESeq2 failed on full model - dispersions may be inaccurate.Estimating dispersions on reduced model. Supply externally generated dispersion parameters via vecDispersionsExternal if there is a more accurate model for your data set."
Error in DESeqDataSet(se, design = design, ignoreRank) : 
  all variables in design formula must be columns in colData

I am not sure why, but if change nothing about the matrices (matCountData and dfAnnotation) but make vecConfounders = NULL, the model will run. DESeq2 will also run using the same matrices/data so I'm not sure where the issue is. Regardless, I generated size and dispersion factors via DESeq2:

dds <- DESeqDataSetFromMatrix(countData = impulse.counts,
                              colData = tcAnnotation,
                              design = ~ W_1 + Time + Condition + Time:Condition ) 
dds$Condition <- relevel(dds$Condition, "control" )
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
disp <- dispersions(dds)
sf <- sizeFactors(dds)

I plugged those into the model, but am now I'm running into more errors:

Selected 35938 genes/regions for analysis.
# Using externally supplied dispersion factors.
# Compute size factors
# Fitting null and alternative model to the genes
[1] "ERROR: Fitting impulse model: fitImpulseModel(). Wrote report into ImpulseDE2_lsErrorCausingGene.RData"
[1] "vecParamGuess 1 0 0 0 1.5 2.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
[1] "vecCounts 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
[1] "scaDisp 0.959083640203582"
[1] "vecSizeFactors 0.265232750945446 0.937887077848201 0.471787142367316 2.3692324827861 1.07691748609144 2.74251352052381 1.47491742559609 1.19935708295077 0.720955973179691 0.638193516833878 1.73127550065188 0.479369408828627 1.09405624404785 1.4177398000797 2.03800272071811 0.464261892768889 0.48452108381903 0.613679441482979 0.93433976057791 0.921275647443607 1.05195674328528 1.02700558427171 0.876663334043048 2.23277452188713 0.914062398537946 0.648278784317811 1.09647787543551 0.385497828741209 0.889377980399356 0.737615623520998 0.380483473325757 2.12459737405986 1.59375813119804 1.41691518278861 0.579076510331146 0.300109669721402 0.988642077502141 3.06504185140705 3.81510443349392 1.28710715245912 1.11072408817913 0.350919490185542 0.590119090696843 0.454315665540979 1.37436617302019 1.71112100918971 1.18855466933541 1.40283597689504 0.698644771455615 1.22331919468558 1.36595000388005 0.837384696048172 1.10827177721278 0.524166174729103 0.495594163657917 3.96153574307441 0.567104925258094 0.645254478497964 1.15807892439661"
[1] "vecTimepointsUnique 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 21"
[1] "vecidxTimepoint 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16"
[1] "lsvecidxBatch 1:59"
[1] "MAXIT 1000"
<simpleError in optim(par = vecParamGuess, fn = evalLogLikImpulse_comp, vecCounts = vecCounts,     scaDisp = scaDisp, vecSizeFactors = vecSizeFactors, vecTimepointsUnique = vecTimepointsUnique,     vecidxTimepoint = vecidxTimepoint, lsvecidxBatch = lsvecidxBatch,     vecboolObserved = !is.na(vecCounts), method = "BFGS", control = list(maxit = MAXIT,         reltol = RELTOL, fnscale = -1)): non-finite finite-difference value [3]>
Error: BiocParallel errors
  element index: 303, 304, 305, 306, 307, 308, ...
  first error: non-finite finite-difference value [3]
In addition: Warning messages:
1: In if (class(matCountData) == "SummarizedExperiment") { :
  the condition has length > 1 and only the first element will be used
2: stop worker failed:
  attempt to select less than one element in OneIndex

I will also note that "ImpulseDE2_lsErrorCausingGene.RData" does not exist anywhere in my remote directory (running the model on our institution's supercomputer). I am happy to supply any additional information/data to help troubleshoot. I really think ImpulseDE2 should fit our experiment/model the best but I am not sure where to begin diagnostics on these errors.

davidsebfischer commented 2 years ago

The latter one looks like a numerical error during fitting, let's check if you have all-zero genes in your data set that could cause this? On first sight, looking at the error report, that is the most suspicious field with respect to numeric stability.

mallorymaynes commented 2 years ago

I ran this filtering step on my counts table before running RUVseq (to generate factors of unwanted variation), which only keeps genes that have at least 10 counts in at least 2 samples, so I don't think that should be the issue.

filter <- apply(forward, 1, function(x) length(x[x>10])>=2) #filter: greater than 10 counts in at least 2 samples
mallorymaynes commented 2 years ago

Hey David, any other thoughts? Is there anything I can give you to help diagnose?

ctlongg commented 6 months ago

@mallorymaynes Thanks for bringing this up and did you managed to find a solution. Unfortunately, I ran into the same problem when trying to incorporate RUVg covariates into vecConfounders. Here is my code,

ImpulseDE2_pathways <- runImpulseDE2( matCountData = raw_counts, dfAnnotation = model_meta, boolCaseCtrl = FALSE, scaNProc = 60, vecConfounders = "Batch", scaQThres = 0.01, boolIdentifyTransients = TRUE)

Got the error:

<simpleError in optim(par = vecParamGuess, fn = evalLogLikImpulse_comp, vecCounts = vecCounts, scaDisp = scaDisp, vecSizeFactors = vecSizeFactors, vecTimepointsUnique = vecTimepointsUnique, vecidxTimepoint = vecidxTimepoint, lsvecidxBatch = lsvecidxBatch, vecboolObserved = !is.na(vecCounts), method = "BFGS", control = list(maxit = MAXIT, reltol = RELTOL, fnscale = -1)): non-finite finite-difference value [3]>.

Running without the batch works fine. But this means the batch may impede with the analysis.