digitalcytometry / cytotrace2

CytoTRACE 2 is an interpretable AI method for predicting cellular potency and absolute developmental potential from scRNA-seq data.

error running the cytotrace2() function #26

Closed mycecilia closed 1 month ago

mycecilia commented 1 month ago

I got an error message when running the following lines:

data <- as.matrix(obj@assays$RNA$counts)
cytotrace2_result <- cytotrace2(data) 
cytotrace2: Started loading data
Dataset contains 32135 genes and 14899 cells.
cytotrace2: Running on 1 subsample(s) approximately of length 10000
cytotrace2: Started running on subsample(s). This will take a few minutes.
cytotrace2: Started preprocessing.
The function expects an input of type 'data.frame' or 'data.table'.
Attempting to convert the provided input to the required format.
0 input genes mapped to model genes.
cytotrace2: Started prediction.
This section will run using  5 / 64 core(s).
cytotrace2: Started postprocessing.
cytotrace2: Running with fast mode (subsamples are processed in parallel)
This section will run on 15 sub-sample(s) of approximately 993 cells each using 15 / 64 core(s).
---> Checking zero-variance data...
--->     Total number of variables:  993
--->     WARNING: 993 variables found with zero variance
---> Maximum number of splits: floor(n/2) = 0
---> WARNING: number of splits nSplit > 0
---> WARNING: using maximum number of splits: nSplit = 0
...
...
--->     Total number of variables:  993
--->     WARNING: 993 variables found with zero variance
---> Maximum number of splits: floor(n/2) = 0
---> WARNING: number of splits nSplit > 0
---> WARNING: using maximum number of splits: nSplit = 0
Error in names(cytotrace) <- unlist(sample_names) : 
  'names' attribute [14899] must be the same length as the vector [15]

After a little digging in the source code, it seems that fast mode generated 15 subsamples, and that this is somehow causing the error. Could you help check this bug? Thank you.

Shiyu
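
The final message in the log is R's standard error for assigning a names vector that is longer than the object it labels. A minimal sketch with toy lengths (5 names on a vector of length 3), independent of the CytoTRACE 2 internals, reproduces the same kind of message:

```r
# Toy reproduction of the length-mismatch error; the lengths here are
# illustrative and unrelated to the real dataset.
scores <- numeric(3)            # stands in for a short result vector
labels <- paste0("cell", 1:5)   # stands in for a longer vector of names

err_msg <- tryCatch(
  {
    names(scores) <- labels     # fails: 5 names for a vector of length 3
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(err_msg)
```

In the log above the same mismatch appears as 14899 names against a result of length 15.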

savagyan00 commented 1 month ago

Hi and thanks for using CytoTRACE 2,

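From the messages, it looks like none of your input genes mapped to the model features ("0 input genes mapped to model genes"). Could you please provide some more details about your input to help us understand the issue better?

  • Does your input have the correct format, such as having genes as rows and cells as columns, with row and column names set accordingly?
  • What species data does your input dataset contain? If your input is a human dataset please make sure to specify the species argument to be "human".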

Please let us know, and if none of these appears to be the root of the issue we will be happy to investigate it further.

mycecilia commented 1 month ago

  • Does your input have the correct format, such as having genes as rows and cells as columns, with row and column names set accordingly?
> data[1:5,1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
           CELL1_N2  CELL2_N2  CELL3_N6 CELL4_N4  CELL5_N3
ZH01G00010        . 0.3781146 5.4610153  1.74801 .        
ZH01G00240        . 0.6151951 0.0000000  .       0.8525815
ZH01G00750        . .         0.1654955  .       .        
ZH01G00020        . .         1.1650430  .       .        
ZH01G00420        . 0.6445649 2.1085691  .       .        

This is how my expression data look. Row names and column names are gene and cell IDs respectively.
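
The layout asked about above can be checked with a few lines of base R. This is only a sketch: `check_expression_layout` is a hypothetical helper name, not a CytoTRACE 2 function, and the toy matrix reuses a few IDs from the output above with made-up values.

```r
# Hypothetical helper (not part of CytoTRACE 2): reports missing or duplicated
# row names (gene IDs) and column names (cell IDs) of an expression matrix.
check_expression_layout <- function(expr) {
  problems <- character(0)
  if (is.null(rownames(expr))) {
    problems <- c(problems, "row names (gene IDs) are missing")
  } else if (anyDuplicated(rownames(expr)) > 0) {
    problems <- c(problems, "row names (gene IDs) contain duplicates")
  }
  if (is.null(colnames(expr))) {
    problems <- c(problems, "column names (cell IDs) are missing")
  } else if (anyDuplicated(colnames(expr)) > 0) {
    problems <- c(problems, "column names (cell IDs) contain duplicates")
  }
  problems  # an empty character vector means no problems were found
}

# Toy genes-by-cells matrix; the values are illustrative only.
toy <- matrix(
  c(0, 0.38, 5.46,
    0, 0.62, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ZH01G00010", "ZH01G00240"),
                  c("CELL1_N2", "CELL2_N2", "CELL3_N6"))
)

layout_problems <- check_expression_layout(toy)
print(layout_problems)
```

A matrix that passes this check can still fail to map to the model, because the check says nothing about which gene nomenclature the row names use.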

  • What species data does your input dataset contain? If your input is a human dataset please make sure to specify the species argument to be "human".

The data is from a crop, not a model species.

savagyan00 commented 1 month ago

Thank you for your response!

The data format you provided appears to be correct. However, the genes in your input do not overlap with the model features or the orthology mapping supported by our tool, which is why no genes were mapped. CytoTRACE 2 was developed specifically for mouse and human data, and it accepts gene names in MGI and HGNC nomenclature, respectively. It has not been trained or tested on plant data, so we cannot guarantee results for your case.

Sorry for the inconvenience. Please let us know if we can assist you further!
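
Before running a prediction, the overlap described above can be estimated with a small pre-flight check. The sketch below assumes a reference vector of gene symbols is available. `model_genes` is an illustrative stand-in made of a few mouse-style symbols, not the actual CytoTRACE 2 feature list, and `gene_overlap` is a hypothetical helper name.

```r
# Sketch: count how many input gene IDs appear in a reference gene list.
gene_overlap <- function(input_genes, reference_genes) {
  matched <- input_genes %in% reference_genes
  list(n_matched = sum(matched), fraction = mean(matched))
}

# Illustrative stand-in only; NOT the real model feature list.
model_genes <- c("Actb", "Gapdh", "Sox2", "Pou5f1")

crop_ids  <- c("ZH01G00010", "ZH01G00240", "ZH01G00750")  # IDs shown in this thread
mouse_ids <- c("Actb", "Sox2", "Nanog")                   # assumed mouse-style input

crop_overlap  <- gene_overlap(crop_ids, model_genes)
mouse_overlap <- gene_overlap(mouse_ids, model_genes)

if (crop_overlap$n_matched == 0) {
  message("No input genes match the reference list; check species and gene nomenclature.")
}
```

With zero matches, as in the log line "0 input genes mapped to model genes", the input carries no features the model recognizes, so any downstream output would not be meaningful.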