IMB-Computational-Genomics-Lab / ascend

R package - Analysis of Single Cell Expression, Normalisation and Differential expression (ascend)

Problem with "newEMSet" #19

Open cmt6 opened 6 years ago

cmt6 commented 6 years ago

After the most recent update to the package, I'm able to create an EMSet, but it no longer calculates the parameters that it calculated before the update.

For example, I create the EMSet using the code below:

EMSet <- newEMSet(assays = list(counts = counts), 
                   colInfo = data.frame(cellInfo), 
                   rowInfo = data.frame(genes), 
                   controls = controls
                   )

I then try to plot general QC metrics using plotGeneralQC as follows:

raw.qc.plots <- plotGeneralQC(EMSet)

It fails with the following error:

[1] "Plotting library size plots..."
[1] "Plotting average count plots..."
[1] "Plotting top gene expression..."
[1] "Controls detected. Plotting control-specific plots..."
Error in `[.data.frame`(metrics_df, , c("cell_barcode", "batch", "qc_nfeaturecounts")) : 
  undefined columns selected

I'm using OSX High Sierra (10.13.6) and R version 3.5.1.

My code worked with the previous version of ascend, so the update appears to have introduced some breaking changes.

Any help would be appreciated. Thank you!

asenabouth commented 6 years ago

Hello,

Thank you for raising this issue.

Could you please show me what your "controls" variable looks like? Just print it in the console so I can see how it is formatted.

cmt6 commented 6 years ago

Thank you for the fast reply,

This is my control variable:

$Mt
integer(0)

$Rb
  [1]   130   346   416   479   773   799  1134  1642  1864  2159
 [11]  2324  2531  3154  3477  3706  3916  3989  4389  4496  4630
 [21]  4761  5084  5085  5353  5490  5897  5900  6010  6336  6396
 [31]  6639  6640  7406  7410  7412  7475  7648  7731  8490  8543
 [41]  8576  8748  9018  9020  9262  9967 10212 10245 10454 10629
 [51] 10640 10809 11089 11232 11471 11716 11718 11920 12051 12137
 [61] 12438 12548 12767 12844 13086 13143 13144 13149 13394 13455
 [71] 13562 13606 13957 13961 14314 14431 14992 15921 16739 16751
 [81] 16928 17023 17091 17177 17186 17415 17637 17864 18130 18258
 [91] 18368 18624 18699 19010 19011 19183 19554 19805 20287 20429
[101] 21034 21067 21068 21082 21182 21301 21655 21754 21862 21863
[111] 21865 22059 22080 22475 22714 23153 23546 23643 23775 24085
[121] 24176 24461 24843 24865 24993 25290 25651 25656 25721 25762
[131] 25763 25910 26207 26260 26327

asenabouth commented 6 years ago

Sorry for the delay, @cmt6. Please check that the values you are using for your controls match the values you are using to label the rows of the matrix. If you are using numbers, the rownames of your expression matrix should also be numbers. I suspect the grep call is returning indices rather than gene names.

To remedy this, you can convert the indices to gene identifiers with something like:

new_controls <- lapply(control_variable, function(x) rownames(object)[x])
names(new_controls) <- names(control_variable)
object <- addControlInfo(object, controls = new_controls)
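
For reference, here is a minimal base R sketch of the difference I suspect is at play: by default `grep()` returns integer positions, while `value = TRUE` returns the matching names themselves. The toy matrix, gene names, and patterns below are assumptions for illustration only, not taken from your data.

```r
# Toy count matrix; gene names and patterns are illustrative assumptions.
counts <- matrix(0L, nrow = 4, ncol = 2,
                 dimnames = list(c("MT-CO1", "RPL3", "ACTB", "RPS6"),
                                 c("cell1", "cell2")))

# Default behaviour: grep() returns the positions of the matching row names.
rb_idx <- grep("^RP[LS]", rownames(counts))

# With value = TRUE, grep() returns the matching row names instead.
rb_names <- grep("^RP[LS]", rownames(counts), value = TRUE)

# Build the control list from names so it matches the matrix row labels.
controls <- list(
  Mt = grep("^MT-", rownames(counts), value = TRUE),
  Rb = rb_names
)

print(rb_idx)
print(controls)
```

If one of the groups comes back empty, as `$Mt` did above, it may be worth checking whether the pattern matches the naming convention used in your row names.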

cmt6 commented 6 years ago

Thank you for the reply @asenabouth. I converted the indices to gene identifiers and the plotGeneralQC function worked. After this, I used the function normaliseBatches(EMSet) to normalize the library sizes between batches:

norm.set = normaliseBatches(EMSet)

And it fails with the following error:

[1] "Calculating size factors..."
  |=========================================================================================================| 100%

[1] "Scaling counts..."
  |=========================================================================================================| 100%

Error in scaled_counts[rownames(expression_matrix), colnames(expression_matrix)] : 
  subscript out of bounds

Thank you again for your help!

asenabouth commented 6 years ago

Hi @cmt6 ,

Could you please show me how your cell information is set up? (colInfo). You can just copy and paste the first few rows.

Thanks, Anne

cmt6 commented 6 years ago

Hi @asenabouth,

This is what my cell information looks like:

        cell_barcode batch
1 tggcagttcttgggccat    S1
2 ggattgaacagctgagac    S1
3 atccggggtgcttgtgta    S1

Thank you!

asenabouth commented 5 years ago

Hi @cmt6 ,

Thank you for the information. I will do some testing and troubleshooting to see if it's an issue with string identifiers as batch labels. I will keep you posted if it is indeed an issue.
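
In the meantime, a quick sanity check on your side may help narrow things down. The sketch below uses only base R and toy data shaped like the rows you posted. The object names `counts` and `cell_info` are assumptions, and it does not reflect how ascend works internally. It checks that every cell barcode in the cell information matches a column name of the count matrix, and shows that indexing a matrix by a name it does not contain produces the same "subscript out of bounds" message.

```r
# Toy inputs mirroring the structure shown in this thread; values are illustrative.
barcodes <- c("tggcagttcttgggccat", "ggattgaacagctgagac", "atccggggtgcttgtgta")
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("geneA", "geneB"), barcodes))
cell_info <- data.frame(cell_barcode = barcodes,
                        batch = c("S1", "S1", "S1"),
                        stringsAsFactors = FALSE)

# Barcodes present in one object but missing from the other (both should be empty).
missing_in_counts <- setdiff(cell_info$cell_barcode, colnames(counts))
missing_in_info <- setdiff(colnames(counts), cell_info$cell_barcode)

# Number of cells per batch label.
batch_sizes <- table(cell_info$batch)

# Indexing by a column name that is absent from the matrix raises the same error message.
err <- tryCatch(counts["geneA", "not_a_barcode"],
                error = function(e) conditionMessage(e))

print(missing_in_counts)
print(missing_in_info)
print(batch_sizes)
print(err)
```

If either `setdiff()` result is non-empty on your real data, that mismatch would be a likely candidate for the error.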

cmt6 commented 5 years ago

Thank you @asenabouth,

Let me know if you need more information about my EMSet.

asenabouth commented 5 years ago

Hi @cmt6 ,

Thank you for your patience. I've introduced a fix that should solve your issue - please update to the latest version of ascend and try again. Let me know how it goes.

Regards, Anne