EDePasquale / DoubletDecon

A tool for removing doublets from single-cell RNA-seq data

Any updates to support Seurat 4? #41

Closed Dazcam closed 3 years ago

Dazcam commented 3 years ago

Hi there,

I'm looking to run DoubletDecon using Seurat 4, but after following the instructions on your wiki I get some errors.

The first couple I could 'fix', as they appeared to relate to the changes the developers of Seurat 4 made to the fold change parameters of the FindMarkers() function:

We have restructured the code of the FindMarkers() function to be easier to understand, interpret, and debug. The results of differential expression are unchanged. However, by default we now prefilter genes and report fold change using base 2, as is commonly done in other differential expression packages, instead of natural log. If the default option is set, the output of FindMarkers() will include the column avg_log2FC, instead of avg_logFC. Users can restore the previous behavior (natural log), by specifying base = exp(1).

These updates affected two lines (lines 20-21) of the code in the Improved_Seurat_Pre_Process() function.
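A minimal sketch of one way to make such lines tolerate both column names. The helper name `fold_change_column` and the toy data frames are hypothetical and are not part of DoubletDecon or Seurat; only the column names `avg_logFC` and `avg_log2FC` come from the release note quoted above.

```r
# Hypothetical helper: return the name of the fold-change column present in a
# FindMarkers()/FindAllMarkers()-style result table.
fold_change_column <- function(markers) {
  # Seurat 4 default name first, then the Seurat 3 name.
  candidates <- c("avg_log2FC", "avg_logFC")
  found <- intersect(candidates, colnames(markers))
  if (length(found) == 0) {
    stop("No fold-change column found; expected one of: ",
         paste(candidates, collapse = ", "))
  }
  found[1]
}

# Toy tables standing in for marker results (values are made up).
markers_v3 <- data.frame(p_val = c(0.01, 0.02), avg_logFC = c(1.2, 0.8))
markers_v4 <- data.frame(p_val = c(0.01, 0.02), avg_log2FC = c(1.7, 1.1))

fc_v3 <- fold_change_column(markers_v3)
fc_v4 <- fold_change_column(markers_v4)
```

Code that then indexes the table with `markers[[fc_v3]]` rather than a hard-coded `markers$avg_logFC` would, under these assumptions, run against either version's output.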

However, after running Improved_Seurat_Pre_Process() and Main_Doublet_Decon(), I get the following error:

Loading packages...
Reading data...
WARNING: if using ICGS2 file input, please import 'rawDataFile' and 'groupsFile' as path/location instead of an R object.
Processing raw data...
Creating original data heatmap...
Error in plot.new() : figure margins too large
In addition: Warning message:
In if (class(groupsFile) == "character") { :
  the condition has length > 1 and only the first element will be used
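The warning at the end of this log is what R emits when `class(x) == "character"` is used inside `if()` and `class(x)` has more than one element, which is the case for a matrix in R 4.0 or later (`c("matrix", "array")`). From R 4.2.0 the same condition is an error rather than a warning. A minimal sketch of a check that avoids this follows; the helper name `is_file_path` and the example objects are hypothetical and this is not DoubletDecon's actual code.

```r
# Hypothetical helper: TRUE only for a single character string (a path),
# FALSE for matrices, data frames and other R objects.
is_file_path <- function(x) {
  is.character(x) && length(x) == 1L && is.null(dim(x))
}

groups_as_path <- "path/to/groups.txt"  # placeholder path, never opened
groups_as_matrix <- matrix(c("cell1", "cell2", "1", "2"), nrow = 2)

path_check <- is_file_path(groups_as_path)      # a single string
matrix_check <- is_file_path(groups_as_matrix)  # an in-memory object

if (is_file_path(groups_as_matrix)) {
  message("Treating groupsFile as a path")
} else {
  message("Treating groupsFile as an R object")
}
```

Because the helper always returns a single logical value, the `if()` condition has length one whatever the class vector of the input looks like.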

I noticed that in the Improved_Seurat_Pre_Process() code, the line expression=as.data.frame(seuratObject@assays[["RNA"]]@counts) points to the raw counts in my data, whereas the normalised counts are actually stored in seuratObject@assays[["RNA"]]@data. As the example file in your wiki shows normalised counts, I tried both @counts and @data but got the same error.

Are you considering updating DoubletDecon to incorporate data generated by Seurat 4? If so, I can send more details regarding these issues.
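To make the raw versus normalised distinction concrete: with Seurat's default LogNormalize method, each count is divided by the cell's total count, multiplied by a scale factor (10000 by default) and passed through log1p. The sketch below reproduces that arithmetic in base R on a made-up matrix; it does not call Seurat, the gene and cell names are invented, and it assumes the default method and scale factor were used.

```r
# Toy raw count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(0, 3, 7,
    1, 0, 9),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

scale_factor <- 10000

# Divide every column (cell) by its total count, scale, then take log(1 + x).
per_cell_fraction <- sweep(counts, 2, colSums(counts), "/")
normalised <- log1p(per_cell_fraction * scale_factor)
```

Zero counts stay at zero after this transform, while non-zero entries become non-integer values, which is one quick way to tell which of the two matrices a file contains.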

Best Wishes,

Darren

EDePasquale commented 3 years ago

Thank you, Darren. This is actually my task this week (well timed question!) so I would expect a fix quite soon.

Best, Erica

Dazcam commented 3 years ago

Hi Erica,

Thank you for the quick response. Excited to hear you will be incorporating Seurat 4 updates soon. If you can, please drop me a line if/when you manage to post the updated code.

Best,

Darren

EDePasquale commented 3 years ago

Hi Darren,

I have now updated DoubletDecon and it should work with Seurat 4. The changes I made:

As for the error "Error in plot.new() : figure margins too large", I have seen this most frequently with users of RStudio. The plot that R is trying to generate is too big for the plot pane you have allotted in the RStudio UI. To correct this, drag the plot pane a little wider and taller and see if that fixes it. Source: https://stackoverflow.com/questions/12766166/error-in-plot-new-figure-margins-too-large-in-r. If you are not using RStudio, or if this doesn't solve the problem, please reach back out in a new issue and I will try to help fix it.
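If resizing the pane is not practical, a general R workaround is to draw to a file device with an explicit size and smaller margins. The sketch below shows this with base graphics on random data; the file name, dimensions and margin values are arbitrary assumptions, and whether DoubletDecon's own plotting calls draw to an already-open device in the same way has not been verified here.

```r
set.seed(1)
toy_matrix <- matrix(rnorm(400), nrow = 20)  # stand-in for expression data

plot_file <- tempfile(fileext = ".pdf")

# Open a PDF device with an explicit size (in inches) instead of relying on
# the size of the RStudio plot pane.
pdf(plot_file, width = 12, height = 10)

# Shrink the margins (bottom, left, top, right, in lines of text) and keep
# the previous settings so they can be restored.
old_par <- par(mar = c(2, 2, 2, 2))
image(toy_matrix)
par(old_par)

# Close the device so the file is written to disk.
dev.off()
```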

I tested both Seurat 3.2.2 and Seurat 3.9.9 (v4) using 2 different objects on 2 machines, but please let me know if you catch something else while using DoubletDecon. Bugs appear sometimes despite careful testing! Thanks for bringing this up, hope you enjoy!

Best, Erica

Confirmation:

packageVersion("Seurat")
[1] ‘3.2.2’

extract marker genes

seuratObject.markers=FindAllMarkers(object = seuratObject, only.pos = TRUE, min.pct=0.25)
Calculating cluster 0 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s
Calculating cluster 1 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s
Calculating cluster 2 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04s
Calculating cluster 3 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s
head(seuratObject.markers)
                 p_val avg_logFC pct.1 pct.2    p_val_adj cluster      gene
UG0898H09 2.193383e-17 186.18414 0.934 0.642 5.195247e-13       0 UG0898H09
NPFFR2    5.595162e-16 165.79417 0.934 0.563 1.325270e-11       0    NPFFR2
ZFAT      8.933254e-15  35.57417 0.669 0.333 2.115931e-10       0      ZFAT
KIF18B    1.604839e-14       Inf 1.000 0.777 3.801221e-10       0    KIF18B
PDE4C     2.821303e-14  72.87417 1.000 0.807 6.682538e-10       0     PDE4C
ST8SIA1   3.512976e-14  96.62416 0.967 0.682 8.320835e-10       0   ST8SIA1

packageVersion("Seurat")
[1] ‘3.9.9.9010’

extract marker genes

seuratObject.markers=FindAllMarkers(object = seuratObject, only.pos = TRUE, min.pct=0.25)
Calculating cluster 0 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s
Calculating cluster 1 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s
Calculating cluster 2 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04s
Calculating cluster 3 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s
head(seuratObject.markers)
                 p_val avg_log2FC pct.1 pct.2    p_val_adj cluster      gene
UG0898H09 2.193383e-17  268.60693 0.934 0.642 5.195247e-13       0 UG0898H09
NPFFR2    5.595162e-16  239.19043 0.934 0.563 1.325270e-11       0    NPFFR2
ZFAT      8.933254e-15   51.32268 0.669 0.333 2.115931e-10       0      ZFAT
KIF18B    1.604839e-14        Inf 1.000 0.777 3.801221e-10       0    KIF18B
PDE4C     2.821303e-14  105.13520 1.000 0.807 6.682538e-10       0     PDE4C
ST8SIA1   3.512976e-14  139.39920 0.967 0.682 8.320835e-10       0   ST8SIA1
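The two outputs above differ only in the base of the logarithm, so the finite avg_log2FC values should equal the avg_logFC values divided by log(2). The short check below uses the numbers printed in the two tables; the variable names are illustrative, and the KIF18B row is left out because its value is Inf in both.

```r
# Fold changes copied from the two head() outputs (KIF18B omitted: Inf).
genes <- c("UG0898H09", "NPFFR2", "ZFAT", "PDE4C", "ST8SIA1")
ln_fc <- c(186.18414, 165.79417, 35.57417, 72.87417, 96.62416)     # avg_logFC
log2_fc <- c(268.60693, 239.19043, 51.32268, 105.13520, 139.39920) # avg_log2FC

# Change of base: log2(x) = ln(x) / ln(2).
converted <- ln_fc / log(2)

# Largest absolute difference between converted and reported values.
max_difference <- max(abs(converted - log2_fc))
names(converted) <- genes
```

The remaining difference is at the level of the rounding in the printed tables, which is consistent with the other columns (p_val, pct.1, pct.2, p_val_adj) being identical between the two versions.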

Dazcam commented 3 years ago

Many thanks for the update, Erica.

I will have a go at running this today, and thanks for the link regarding the plotting issue. I am using RStudio, so this is almost certainly the cause.

Best,

Darren