digitalcytometry / ecotyper

EcoTyper is a machine learning framework for large-scale identification of cell states and cellular ecosystems from gene expression data.

Wilcoxon-error when having only a single cell #56

Closed NoahHenrikKleinschmidt closed 1 year ago

NoahHenrikKleinschmidt commented 1 year ago

Long time no see, it's me again ☀️

This time it's not really an issue, though. Previously I ran into a bug that crashed the framework when a cell type had too few entries and the Wilcoxon test produced NaN. That bug was fixed, but I have just encountered something that seems very similar.

In this case, Cholangiocytes are reported to have only a single cell in the entire dataset, and my logs show that row_wilcoxon_twosample reports a problem again:

Running EcoTyper...

Step 1 (extract cell type specific genes)...
[1] "Cholangiocytes"
[1] "Endothelial_cell"
[ other cell types ... ]
Warning message:
Only 1 single cells are available for cell type: Cholangiocytes. At least 50 are required. Skipping this cell type from the EcoTyper analysis! 
[ the same message repeated about 10 times ... ]
row_wilcoxon_twosample: 17055 of the rows had less than 1 remaining finite "x" observation.
First occurrence at row 1 
Step 1 (extract cell type specific genes) finished successfully!

Step 2 (cell state discovery on correrlation matrices): Calculating correlation matrices...
Error: Only 1 single cells are available for cell type: Cholangiocytes. At least 50 are required. Skipping this cell type!
Execution halted
Warning message:
There are more than 2500 single cells available for cell type 'Endothelial_cell'. Subsampling to 2500 cells.

Filtering 'Monocytes' profiles for cell type specific genes...
[ more filtering messages but none for Cholangiocytes ... ]
Error in RunJobQueue() : 
  EcoTyper failed. Please check the error message above!
Execution halted

Full logs available via pastecode.io.

When I dropped the one Cholangiocyte cell, everything worked fine, so I don't think this is really an issue: a single cell probably does not affect ecotype discovery anyway. I was just curious whether this is expected EcoTyper behavior.
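
For anyone hitting the same message, the workaround above (dropping under-represented cell types before running EcoTyper) could be sketched roughly as follows. This is only an illustration and not part of EcoTyper: the helper `drop_rare_cell_types` is hypothetical, the `ID` and `CellType` column names are assumptions about the annotation table, and the threshold of 50 is taken from the warning in the log.

```python
from collections import Counter

# Threshold quoted in the EcoTyper warning ("At least 50 are required").
MIN_CELLS = 50


def drop_rare_cell_types(annotation, min_cells=MIN_CELLS):
    """Split annotation rows into kept rows and a report of dropped cell types.

    `annotation` is a list of dicts with (assumed) keys 'ID' and 'CellType'.
    Returns (kept_rows, dropped), where `dropped` maps each removed cell type
    to the number of cells it had.
    """
    counts = Counter(row["CellType"] for row in annotation)
    dropped = {ct: n for ct, n in counts.items() if n < min_cells}
    kept = [row for row in annotation if row["CellType"] not in dropped]
    return kept, dropped


# Toy annotation: 60 endothelial cells and a single cholangiocyte.
annotation = (
    [{"ID": f"endo_{i}", "CellType": "Endothelial_cell"} for i in range(60)]
    + [{"ID": "chol_0", "CellType": "Cholangiocytes"}]
)

kept, dropped = drop_rare_cell_types(annotation)
print(dropped)    # {'Cholangiocytes': 1}
print(len(kept))  # 60
```

The cells removed from the annotation would also need to be removed from the expression matrix, so that both inputs still describe the same set of cells.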

Best, Noah 🌼

BALuca commented 1 year ago

Hi Noah,

Again, apologies for the delay in answering. Having fewer than 50 cells for a cell type is not really meaningful for finding cell states and ecotypes, so EcoTyper throws an error when such cases are encountered. We have updated the code so that it gives a more informative error message.

Best, The EcoTyper team