FloWuenne / scFunctions

Functions for single cell data analysis
20 stars 9 forks source link

error in calculate_rrs: Progress: 21% Error in jensen_shannon(x[, 1], x[, 2], test.na, unit) #2

Closed Sophia409 closed 4 years ago

Sophia409 commented 4 years ago

Hello, Florian

I can use kmeans_thresholds to finsh the pipeline you provided, but I'm not sure if it is appropriate for my data. You mentioned that auc_thresh_kmeans function might not perform very well for setting thresholds for non-bimodal distribution, which are quite often observed for regulons. So I want to use AUCell_exploreThresholds to determine thresholds for binarization.

But it gave this error when I choose the Thresholds that SCENIC provided.

Progress: 21% Error in jensen_shannon(x[, 1], x[, 2], test.na, unit) : Your input vector stores NA values...

I find the reason is because of the production of NaN in regulon_norm. But why does it produce NA in some regulons with the Thresholds that SCENIC provided ? How can I solve it?

Hope for your reply

cells_AUCellThresholds <- AUCell_exploreThresholds(regulonAUC,
                                                     plotHist=FALSE, assignCells=TRUE)
  trhAssignment <- getThresholdSelected(cells_AUCellThresholds)
  trhAssignment <- signif(trhAssignment, 3) # TODO why is it sometimes a list? https://github.com/aertslab/AUCell/issues/3
 > trhAssignment <- as.list(trhAssignment)
> head(trhAssignment)
$`AU041133 (7g)`
[1] 0.0132

$`Ahr (16g)`
[1] 0.00542

$`Ar (137g)`
[1] 0.0168

$`Arid3a (31g)`
[1] 0.00834

$`Arnt (6g)`
[1] 0.0229

$`Arx (168g)`
[1] 0.0846
>binary_regulons <- binarize_regulons(regulonAUC,trhAssignment)
> joined_bin_reg <- binary_regulons %>%
+   reduce(left_join,by="cells")
> rownames(joined_bin_reg) <- joined_bin_reg$cells
> joined_bin_reg <- joined_bin_reg[2:ncol(joined_bin_reg)]
> binary_regulons_trans <- as.matrix(t(joined_bin_reg))
> binary_regulons_trans[1:10,1:3]
              E11-1_CTTAGGATCGACAGCC E11-1_TGACTAGCATAAAGGT E11-1_TTCGGTCAGATGAGAG
AU041133 (7g)                      0                      0                      0
Ahr (16g)                          0                      0                      0
Ar (137g)                          0                      0                      0
Arid3a (31g)                       1                      1                      1
Arnt (6g)                          0                      0                      0
Arx (168g)                         0                      0                      0
Ascl1 (9g)                         0                      0                      0
Atf3 (282g)                        0                      0                      0
Atf4 (294g)                        0                      0                      0
Atf5 (130g)                        1                      1                      1
> ## RRS
> rrs_df <- calculate_rrs(metadata_sub,
+                         binary_regulons = binary_regulons_trans,cell_type_column ="cell_type")
[1] "Processing cell type: 1 8"
Progress:  21%  Error in jensen_shannon(x[, 1], x[, 2], test.na, unit) : 
  Your input vector stores NA values...

> traceback()
7: stop(list(message = "Your input vector stores NA values...", 
       call = jensen_shannon(x[, 1], x[, 2], test.na, unit), cppstack = list(
           file = "", line = -1L, stack = "C++ stack not available on this system")))
6: jensen_shannon(x[, 1], x[, 2], test.na, unit)
5: distance(x = x, method = "jensen-shannon", test.na = test.na, 
       unit = unit, est.prob = est.prob)
4: philentropy::JSD(dist_df)
3: withCallingHandlers(expr, message = function(c) invokeRestart("muffleMessage"))
2: suppressMessages(philentropy::JSD(dist_df))
1: calculate_rrs(metadata_sub, binary_regulons = binary_regulons_trans, 
       cell_type_column = "cell_type")

image

FloWuenne commented 4 years ago

Hi @Sophia409 ,

thanks for using my functions and sorry I couldn't reply fast enough for the other issue you posted! I saw that you closed the issue (#1 ) so I am guessing you solved that problem?!

Regarding this issue, thanks for providing your code and explaining your problem. It's difficult for me to know exactly why it is producing NaN in the calculate_rrs() function without seeing the data. My guess would be, that for a specific regulon all cells have a 0 for binarized regulon activity (therefore the threshold might have been set too high by SCENIC) and that calculate_rrs() is trying to calculate 0/0 which would yield a NaN. To test this, I would do:

rownames(binary_regulons_trans)[rowSums(binary_regulons_trans) == 0]

If this does return a regulon that only has 0 binarized regulon activity with the threshold provided, you can either go back and change the threshold for that regulon, or you can filter it out of your binary regulon matrix!

Please let me know whether this fixes your issue and sorry for not including a fail switch for this. I just pushed an update (d4edb2a11ab9f4a52e0067894cbf8255cc1b85a1) that will actually check whether the sum() of all binarized activities is > 0 and will print an error if it is not!

Sophia409 commented 4 years ago

@FloWuenne Hello, Florian

I am very glad to receive your reply! Thank you for your help.

I solved issue1 finally by detach other packages which have some same functions with scFunctions that cause conflicts, such as dplyr.

And just as you say, I have five regulons that only has 0 binarized regulon activity with the AUCell_exploreThresholds provided. And their thresholds value are high. But i'm unfamiliar with this SCENIC function and don't know how to set them to appropriate values. So I delete them from my binary regulon matrix and finish the remaining pipeline.

rownames(binary_regulons_trans)[rowSums(binary_regulons_trans) == 0] [1] "Hmgb3 (53g)" "Klf6 (218g)" "Klf7 (306g)" "Nfyb (139g)" "Smad1 (120g)" [6] "Sox12 (528g)"

SCENIC tutorial mentions that AUCell automatically calculates possible thresholds for the binarization, but they are often too conservative. Wondering why AUCell_exploreThresholds set too high a value for just these five regulons?

Second, what's the difference between runSCENIC_4_aucell_binarize function with your binarize_regulons function. The codes are not in agreement.

Last, why are my regulon modules not obvious in csi_modules plot? But in the publication by Suo et al. (2018) in Cell Reports, regulon modules are so perfect. How can I make a plot like this? I don't think my result should be true. In your function, you calculate the CSI scores for all regulon pairs based on the AUCs matrix for all regulons. Can it be better to calculate it based on Binarized regulon activity matrix? 1 Suo's data image image 2 my data image image

Hope for your help! Best

FloWuenne commented 4 years ago

Sorry again for the late reply, I did not check up on Github over the Christmas holidays and New Years! Happy 2020 @Sophia409 !

About the difference between runSCENIC_4_aucell_binarize and binarize_regulons, it has been quite some time that I looked in depth at how exactly the SCENIC code itself runs. But my function should do the same job, that is give each cell a 0 or 1 for each regulon based on the thresholds provided. I think I used different data structures as they did in the original function to make it easier to work with it for myself. (I like tidy data).

Regarding the CSI modules, are you plotting the same type of data or are you just comparing Suo's plot with the plot for your specific data? If you are just trying to reproduce the plot from Suo with the same data, I agree that they look very different, but if this is your data, it is totally possible, that your modules are less distinct. This will also be heavily influenced by the thresholds you set for your regulons! Let me know whether this is your data or the Suo data trying to reproduce the plot please!

Let me know if you have any other questions please!