NathanSkene / EWCE

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.
https://nathanskene.github.io/EWCE/index.html

invalid number of intervals error when running generate.celltype.data() #4

Closed alexandruioanvoda closed 5 years ago

alexandruioanvoda commented 5 years ago
> celltype_rda_file = generate.celltype.data(exp=exp, annotLevels = annotLevels, "Project name")
Error in cut.default(matrixIn[matrixIn > 0], breaks = unique(quantile(matrixIn[matrixIn >  : 
  invalid number of intervals

two compressed RDS files for exp and annotLevels: RDS_files_for_exp_and_annotLevels.zip

alexandruioanvoda commented 5 years ago

Just realized that it might be due to a few NA values in the expression matrix. Will a future version of EWCE tolerate NA values, at least up to a certain threshold?

NathanSkene commented 5 years ago

Would make sense to add an error check for it, but I think it's best to let users choose how to get rid of NA's, no? Would depend on why the NAs were appearing in the dataset.

alexandruioanvoda commented 5 years ago

Apparently it's because the dataset is merged from two (Blueprint project & ENCODE project). The expression matrix comes from here: https://github.com/dviraran/SingleR/blob/master/data/blueprint_encode.rda

The arrays used in one of the projects have a few more probes than the other, and vice versa.

But both contain similar sample types. It would be great if EWCE would:

  1. warn the user, and
  2. ask whether the user is okay with specificities for that gene being set to NA for all celltypes, or
    1. whether to compute specificities for that gene only against the celltypes that don't have NAs for the same gene.

NathanSkene commented 5 years ago

I've added an error check to generate.celltype.data to detect NAs. Thanks for mentioning this!
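
For anyone who wants to run a similar check before calling generate.celltype.data, here is a minimal base-R sketch. It is not the check that was added to EWCE; the toy matrix and the variable names `has_na`, `genes_with_na` and `exp_clean` are made up for illustration, and dropping the affected genes is only one of the possible policies discussed above.

```r
# Hypothetical toy expression matrix: genes in rows, cells/samples in columns.
exp <- matrix(c(1,  2, 3,
                4, NA, 6,
                7,  8, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GeneA", "GeneB", "GeneC"),
                              c("cell1", "cell2", "cell3")))

# TRUE if any value in the matrix is missing.
has_na <- anyNA(exp)

# Names of the genes (rows) that contain at least one NA.
genes_with_na <- rownames(exp)[rowSums(is.na(exp)) > 0]

# One possible policy (an assumption, not EWCE's behaviour):
# drop every gene that has a missing value in any cell.
exp_clean <- exp[rowSums(is.na(exp)) == 0, , drop = FALSE]
```

Whether dropping, imputing, or restricting to the shared probes is appropriate depends on why the NAs are there, as noted earlier in the thread.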

IMJoeyZhu commented 4 years ago

I get the same error when I run this code: generate.celltype.data(exp=exp, annotLevels = annotLevels, "Project name"). However, I found no NAs in my dataset. How can I fix it?

NathanSkene commented 4 years ago

Could you try reducing the dataset to a minimally reproducible example which still shows the error, and then upload it?

Thanks

IMJoeyZhu commented 4 years ago

I tried reducing my dataset from an 8k-cell matrix to a 10-cell matrix, but the error still showed up no matter how many cells I used.

NathanSkene commented 4 years ago

Did you try reducing the number of genes to find a subset which causes the error? You don't have any rows which are all zeros, or which contain some other weird values?
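
A rough base-R sketch of the kind of check suggested here, assuming genes are in rows. The toy matrix and the names `all_zero_genes`, `non_finite_genes` and `negative_genes` are hypothetical, not part of EWCE.

```r
# Hypothetical toy expression matrix: genes in rows, cells in columns.
exp <- matrix(c(0,   0, 0,
                5,   0, 2,
                1, Inf, 3),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GeneA", "GeneB", "GeneC"),
                              c("cell1", "cell2", "cell3")))

# Genes whose values are zero in every cell.
all_zero_genes <- rownames(exp)[rowSums(exp != 0, na.rm = TRUE) == 0]

# Genes with any NA, NaN, Inf or -Inf value.
non_finite_genes <- rownames(exp)[rowSums(!is.finite(exp)) > 0]

# Genes with any negative value (unexpected for count-like data).
negative_genes <- rownames(exp)[rowSums(exp < 0, na.rm = TRUE) > 0]
```

If all three vectors come back empty, the problem probably lies elsewhere, for example in how the cells are grouped.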

IMJoeyZhu commented 4 years ago

Thanks for your reply. I checked my dataset and removed the rows whose sum was lower than zero, but got the same error again.

bschilder commented 3 years ago

So this can happen when you have a column in your specificity matrix with only 1 non-zero value. cut() doesn't know how to handle these situations because it's ambiguous whether the non-zero value should be the lowest quantile, top quantile, middle quantile?

I've modified the underlying function to accommodate these situations such that it will use the middle quantile by default. Let me know if you prefer a different behaviour @NathanSkene.

This is currently only implemented in my bschilder/EWCE@DelayedArray branch, but I will be merging it into the main EWCE soon.

bin.columns.into.quantiles <- function(matrixIn,
                                       numberOfBins=40,
                                       defaultBin=as.integer(numberOfBins/2)){
    quantileValues = rep(0, length(matrixIn))
    breaks <- unique(quantile(matrixIn[matrixIn > 0],
                              probs = seq(0, 1, by = 1/numberOfBins),
                              na.rm = TRUE))
    if(length(breaks)>1){
        quantileValues[matrixIn > 0] = as.numeric(cut(matrixIn[matrixIn > 0],
                                                      breaks = breaks,
                                                      include.lowest = TRUE))
    }else {
        ## In situations where there's only one non-zero quantile, cut() throws an error.
        ## Avoid these situations by using a default quantile.
        message("+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile ",
                "(",defaultBin,")")
        quantileValues[matrixIn > 0] <- defaultBin
    }
    return(quantileValues)
}
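
The failure mode described in this comment can be reproduced in isolation with base R. This is a sketch with a made-up column `x`, not EWCE code. Per the documentation of cut(), a `breaks` argument of length one is read as a number of intervals and must be at least 2, so a single leftover break such as 0.3 is rejected.

```r
# Hypothetical specificity column with a single non-zero value.
x <- c(0, 0, 0.3, 0)
nonzero <- x[x > 0]

# Every quantile of a single value is that same value,
# so unique() leaves only one break.
breaks <- unique(quantile(nonzero,
                          probs = seq(0, 1, by = 1/40),
                          na.rm = TRUE))

# With a length-one `breaks` below 2, cut() stops with an error.
# tryCatch() captures the error message instead of halting the script.
result <- tryCatch(
    cut(nonzero, breaks = breaks, include.lowest = TRUE),
    error = function(e) conditionMessage(e)
)
```

Here `result` holds the captured error message rather than a factor. A column with no non-zero values at all should hit the same branch, because quantile() then returns only NA and the single remaining break is NA.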

roxyisat-rex commented 3 years ago

I tried to reduce my dataset from an 8k cells matrix to a 10 cells matrix, the error still showed up no matter how many cells I used.

Hi, did you solve this in the end? I'm quite interested to know because I am running into the same error, and I tried the above suggestions but it is still not working. It would be great to hear how you solved it. Thanks!

NathanSkene commented 3 years ago

Can you upload the 10 cell dataset + annotation info?

combiz commented 2 years ago

I've also encountered this issue with Error in cut.default(matrixIn[matrixIn > 0], breaks = unique(quantile(matrixIn[matrixIn > : invalid number of intervals. In my case, the solution was identified after examining representation (n cells) across each category of group identity (e.g. Leiden clusters) – providing groups with relatively few observations/cells may result in this error. Solutions may include aggregating or dropping the minority cells, re-assigning group identity (e.g. after Leiden clustering and dimensionality reduction parameter optimization), etc.
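
A minimal sketch of that kind of check in base R. The label vector `cluster` and the threshold `min_cells` are hypothetical; a sensible cut-off will depend on the dataset.

```r
# Hypothetical per-cell group labels (e.g. Leiden clusters).
cluster <- c("A", "A", "A", "A", "B", "B", "B", "C")

# Number of cells in each group.
cells_per_group <- table(cluster)

# Assumed threshold: groups with fewer cells than this are treated as too small.
min_cells <- 3
small_groups <- names(cells_per_group)[as.vector(cells_per_group) < min_cells]

# One option: drop the cells that belong to the small groups.
keep <- !(cluster %in% small_groups)
cluster_kept <- cluster[keep]
```

The same logical vector `keep` could then be used to subset the columns of the expression matrix and each annotation vector, so that they stay aligned.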