Open Chloe-Shen opened 5 months ago
Hi @Chloe-Shen! This is not an error: we never intended to force every donor to have OOR cells in our simulations. We think this reflects a realistic scenario, in which disease-associated cells might go undetected in samples or donors where fewer cells were captured. If you want to use our code to generate a scenario where OOR cells are present in all donors, you can specify the cell annotation to use as OOR cells via the `query_annotation` parameter of the `simulation.simulate_query_reference` function.
What I mean is that every 'query' donor should contain at least one OOR cell. For example, when 'classical monocyte' is selected as the perturbed cell type, some 'query' donors have no OOR cells at all. The current code splits donors into 'ctrl' and 'query' without checking whether a donor contains any 'classical monocyte' cells (e.g., MH8919230, MH8919231, MH8919278, MH8919280).
Additional examples can be found in the attached document 'ng2023_dataset_group_oor_summary_colored.xlsx'.
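A check along these lines can be sketched with pandas. This is only a minimal sketch, not the project's code: the helper `query_donors_without_oor`, the column names (`donor_id`, `cell_annotation`, `dataset_group`), and the toy donors D1 to D3 are all hypothetical, and it assumes the per-cell metadata is available as a DataFrame (for example, the `.obs` table of an AnnData object).

```python
import pandas as pd


def query_donors_without_oor(obs, donor_col, annotation_col, group_col,
                             oor_annotation, query_label="query"):
    """Return the 'query' donors that contain no cells of the OOR annotation.

    All column names are passed in by the caller; none are taken from the
    actual simulation code.
    """
    # Keep only cells that belong to donors assigned to the query group.
    query_obs = obs[obs[group_col] == query_label]
    # Flag cells whose annotation matches the perturbed (OOR) cell type.
    is_oor = query_obs[annotation_col] == oor_annotation
    # Count OOR cells per donor; casting to str avoids unused categories.
    counts = is_oor.groupby(query_obs[donor_col].astype(str)).sum()
    return sorted(counts[counts == 0].index)


# Toy metadata (hypothetical donors, not taken from the real dataset).
toy_obs = pd.DataFrame({
    "donor_id": ["D1", "D1", "D2", "D2", "D3"],
    "cell_annotation": ["classical monocyte", "B cell",
                        "B cell", "T cell", "B cell"],
    "dataset_group": ["query", "query", "query", "query", "ctrl"],
})

missing = query_donors_without_oor(
    toy_obs,
    donor_col="donor_id",
    annotation_col="cell_annotation",
    group_col="dataset_group",
    oor_annotation="classical monocyte",
)
print(missing)
```

Running such a check right after the donor split would list the 'query' donors that end up with zero OOR cells, which could then be reassigned or excluded if that is the desired behaviour.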
Description:
There appear to be some mistakes in the data simulation process. Specifically, several donors labeled as 'query' do not have any out-of-reference (OOR) cells. This happens because none of the specified cell types were detected in these donors. Please see the attached document; the affected donors are highlighted in yellow.
ng2023_dataset_group_oor_summary_colored.xlsx