MarioniLab / oor_benchmark

A sandbox for benchmarking detection of out-of-reference cells in single-cell genomics data
MIT License

Data Simulation Errors - Missing OOR Cells for Certain Donors #5

Open Chloe-Shen opened 5 months ago

Chloe-Shen commented 5 months ago

Description:

The data simulation process appears to contain an error: several donors labeled as 'query' do not have any out-of-reference (OOR) cells, because none of the specified cell types were detected in these donors. Please see the attached document; the affected donors are highlighted in yellow.

ng2023_dataset_group_oor_summary_colored.xlsx
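The check described above can be reproduced with a few lines of pandas. This is only a rough sketch: the table stands in for per-cell metadata (for example `adata.obs` in an AnnData workflow), and the column names `donor_id`, `dataset_group` and `cell_type`, the toy donors, and the helper `query_donors_without_oor` are all hypothetical, not part of oor_benchmark.

```python
import pandas as pd

# Hypothetical per-cell metadata; real column names in the dataset may differ.
obs = pd.DataFrame({
    "donor_id": ["D1", "D1", "D2", "D2", "D3", "D3"],
    "dataset_group": ["query", "query", "query", "query", "ctrl", "ctrl"],
    "cell_type": ["classical monocyte", "T cell", "T cell", "B cell",
                  "classical monocyte", "T cell"],
})


def query_donors_without_oor(obs, oor_cell_types, donor_col="donor_id",
                             group_col="dataset_group",
                             annotation_col="cell_type", query_label="query"):
    """Return the query donors that contain no cells of the OOR cell types."""
    # Keep only cells from donors assigned to the query group.
    query_obs = obs[obs[group_col] == query_label]
    # Count OOR cells per query donor (True counts as 1 when summed).
    is_oor = query_obs[annotation_col].isin(oor_cell_types)
    n_oor = is_oor.groupby(query_obs[donor_col]).sum()
    return sorted(n_oor.index[n_oor == 0])


missing = query_donors_without_oor(obs, ["classical monocyte"])
print(missing)
```

In this toy table, donor D2 is in the query group but has no 'classical monocyte' cells, so it is the only donor reported.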

emdann commented 5 months ago

Hi @Chloe-Shen! This is not an error: we never intended to force all donors to have OOR cells in our simulations. We think this represents a realistic scenario, where disease-associated cells might not be detected in samples/donors from which fewer cells are captured. If you want to use our code to generate a scenario where OOR cells are present in all donors, you can specify the cell annotation to use as OOR cells with the query_annotation parameter of the simulation.simulate_query_reference function.

Chloe-Shen commented 5 months ago

I mean that there should be at least one OOR cell in each 'query' donor. For example, when 'classical monocyte' is selected as the perturbed cell type, some 'query' donors have no OOR cells at all. The current code splits donors into 'ctrl' and 'query' without checking whether each donor contains any 'classical monocyte' cells (e.g., MH8919230, MH8919231, MH8919278, MH8919280).

Additional examples can be found in the attached document 'ng2023_dataset_group_oor_summary_colored.xlsx'.
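One possible way to get the behaviour requested here is to draw 'query' donors only from donors that contain the perturbed cell type. The sketch below illustrates that idea and is not the oor_benchmark implementation: the function `split_donors_with_oor`, the column names `donor_id` and `cell_type`, and the toy donors are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def split_donors_with_oor(obs, oor_cell_types, n_query,
                          donor_col="donor_id", annotation_col="cell_type",
                          seed=0):
    """Pick n_query donors that each contain at least one OOR cell.

    Returns (query_donors, ctrl_donors) as sorted lists.
    """
    # For each donor, flag whether any of its cells has an OOR annotation.
    has_oor = obs[annotation_col].isin(oor_cell_types).groupby(obs[donor_col]).any()
    eligible = sorted(has_oor.index[has_oor])
    if n_query > len(eligible):
        raise ValueError(
            f"only {len(eligible)} donors contain {oor_cell_types}, "
            f"cannot select {n_query} query donors"
        )
    # Sample query donors without replacement from the eligible donors only.
    rng = np.random.default_rng(seed)
    query = sorted(rng.choice(eligible, size=n_query, replace=False).tolist())
    # All remaining donors, with or without OOR cells, become controls.
    ctrl = sorted(set(has_oor.index) - set(query))
    return query, ctrl


# Hypothetical toy metadata: only D1 and D3 contain classical monocytes.
obs = pd.DataFrame({
    "donor_id": ["D1", "D1", "D2", "D3", "D3", "D4"],
    "cell_type": ["classical monocyte", "T cell", "T cell",
                  "classical monocyte", "B cell", "B cell"],
})
query, ctrl = split_donors_with_oor(obs, ["classical monocyte"], n_query=2)
print(query, ctrl)
```

Restricting the pool this way trades realism for a guarantee: donors with few captured cells can no longer end up in the query group without OOR cells, which is the scenario the default simulation deliberately allows.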