biosurf / cyCombine

Robust Integration of Single-Cell Cytometry Datasets

salvage_problematic issue #26

Closed: david-priest closed this issue 1 year ago

david-priest commented 1 year ago

Hello again,

I have one batch (batch 1) with missing or misstained channels. I want to impute these channels, all of which are present in every other batch. However, some of the channels to be imputed in batch 1 were stained for different markers, and I want to overwrite them with the imputed values.

My approach is to run salvage_problematic once for each channel to be imputed. However, when I try this, I get the following warning:

"Warning: Be aware that a cluster contains cells primarily from the dataset you wish to impute for. As a result, imputations were not made for those cells."

I assume this happens because the different staining causes batch 1 to form its own cluster. I'm not sure I fully understand the warning, though. Shouldn't the imputation clusters be generated only from batches 2, 3, ...? Or shouldn't the clustering at least ignore the channels that are to be imputed in batch 1?

By the way, I set up the FCS files in Premessa by harmonizing the channel names across all batches, i.e., some channels in batch 1 are deliberately misnamed (given the names of the markers to be imputed) so that the files import successfully.

Have you made any progress on handling the case we discussed previously in "Increasing Panels Over Time"? Ideally, I would like to remove the channels to be imputed from batch 1 in Premessa before importing the data into cyCombine.

In the meantime, I might try merging panels and leaving a dummy non-stained channel in batch 1 so that impute_channels2 is not empty.

Kind Regards, David Priest

david-priest commented 1 year ago

I seem to be having success with the CyTOF 2 panels vignette (https://biosurf.org/cyCombine_CyTOF_2panels.html). Although batch 1 has no markers that are absent from batch 2, I kept one non-stained (but acquired) channel, 140Ce, in batch 1 so that the code runs without error.

cbligaard commented 1 year ago

Hi David,

Thanks for raising this issue and sorry about the slow response.

For the naming misalignment, our advice is to run prepare_data separately on each set of batches that share the same markers, each with its own curated panel file, as described in #24. You can then remove the misstained channels from batch 1 before merging the datasets and performing imputation. This should also remove the need for harmonization in Premessa, since you can simply rename or remove columns of the data.frame as needed.
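For readers working in Python rather than R, the same column-level cleanup can be sketched with pandas. This is a hypothetical illustration, not cyCombine code: it assumes each separately prepared set is a table with one row per cell, one column per marker, and a `batch` column, and the marker names and values are made up.

```python
import pandas as pd

# Hypothetical output of preparing each set of batches separately:
# one row per cell, one column per marker, plus a "batch" column.
set_a = pd.DataFrame({
    "CD3": [1.2, 0.4], "CD4": [2.1, 0.3], "CD8": [0.1, 1.9],
    "batch": ["1", "1"],
})
set_b = pd.DataFrame({
    "CD3": [1.1, 0.5, 1.3], "CD4": [2.0, 0.2, 1.8], "CD8": [0.2, 2.2, 0.1],
    "batch": ["2", "2", "3"],
})

# Channels that were misstained in batch 1 (hypothetical example).
misstained = ["CD8"]

# Drop the misstained channels from batch 1 instead of renaming them in Premessa.
set_a = set_a.drop(columns=misstained)

# A channel stained for a different marker could instead be renamed, e.g.:
# set_a = set_a.rename(columns={"old_name": "new_name"})  # hypothetical names

# Concatenate; channels absent from batch 1 become NaN there and are the ones to impute.
merged = pd.concat([set_a, set_b], ignore_index=True)
is_b1 = merged["batch"] == "1"
to_impute = [c for c in merged.columns if merged.loc[is_b1, c].isna().all()]
print(to_impute)  # ['CD8']
```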

You are right about the source of the warning from salvage_problematic: it occurs because cells from batch 1 are not clustered together with cells from the other batches. The algorithm clusters cells from all batches together using only the non-misstained markers, so that the misstained markers in batch 1 cells can be imputed using information from co-clustered cells in the other batches. Whether the warning appears depends on the data and the settings used. I would recommend checking the exclude parameter of salvage_problematic to make sure that no misstained channels are included in the clustering. Another solution would be to decrease xdim/ydim from their defaults, since fewer, larger clusters make it more likely that batch 1 cells share a cluster with cells from other batches.
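To make the mechanism concrete, here is a simplified Python sketch of the behavior described above. It is not cyCombine's implementation, and `kmeans` and `salvage_channel` are hypothetical helpers. Assumptions: plain k-means stands in for the grid-based clustering controlled by xdim/ydim (the cluster count `k` plays the role of xdim × ydim), imputed values are drawn at random from donor cells in the same cluster, and the 50% `majority` threshold for "primarily" is a guess. The toy data show how clustering on a misstained channel isolates batch 1, and how excluding it fixes that.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-first init (a stand-in for cyCombine's clustering)."""
    rng = np.random.default_rng(seed)
    # Start from a random cell, then repeatedly add the cell farthest from all chosen centers.
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every cell to its nearest center (squared Euclidean distance).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its cells.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def salvage_channel(x, batch, channel, problem_batch, exclude=(), k=4, majority=0.5, seed=0):
    """Hypothetical sketch: impute one channel for the cells of problem_batch.

    x: cells x markers array; batch: batch label per cell; channel: column to impute;
    exclude: extra columns kept out of the clustering (e.g. other misstained channels).
    Returns the imputed array and the list of clusters that were skipped.
    """
    x = np.asarray(x, dtype=float).copy()
    keep = [j for j in range(x.shape[1]) if j != channel and j not in exclude]
    labels = kmeans(x[:, keep], k, seed=seed)
    rng = np.random.default_rng(seed)
    is_problem = np.asarray(batch) == problem_batch
    skipped = []
    for j in np.unique(labels):
        in_cluster = labels == j
        targets = in_cluster & is_problem  # cells whose channel must be imputed
        donors = in_cluster & ~is_problem  # co-clustered cells from other batches
        if not targets.any():
            continue
        # Analogue of the warning: skip clusters made up primarily of problem_batch cells.
        if is_problem[in_cluster].mean() > majority or not donors.any():
            skipped.append(int(j))
            continue
        # Draw imputed values from the donors' observed values in the same cluster.
        x[targets, channel] = rng.choice(x[donors, channel], size=int(targets.sum()))
    return x, skipped


# Toy data: markers 0-1 separate two populations; marker 2 is misstained in batch 1
# (the channel to impute); marker 3 is another channel misstained in batch 1.
rng = np.random.default_rng(1)
rows, batch = [], []
for b in [1, 2, 3]:
    for center, true_val in [(0.0, 10.0), (5.0, 20.0)]:
        for _ in range(20):
            m2 = -1.0 if b == 1 else true_val
            m3 = 50.0 if b == 1 else 0.0
            rows.append([center + rng.normal(0, 0.3), center + rng.normal(0, 0.3), m2, m3])
            batch.append(b)
x, batch = np.array(rows), np.array(batch)

# Clustering on marker 3 isolates batch 1, so its cells are left unimputed.
bad, skipped_bad = salvage_channel(x, batch, channel=2, problem_batch=1, k=3)
# Excluding marker 3 lets batch 1 cells co-cluster with batches 2 and 3.
fixed, skipped_ok = salvage_channel(x, batch, channel=2, problem_batch=1, exclude=(3,), k=2)
print(skipped_bad)  # non-empty
print(skipped_ok)   # []
```

The farthest-first initialization is a deliberate choice for the toy example: it keeps plain k-means from starting with two centers in the same well-separated population.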

I am happy to hear that impute_across_panels works for you. Note, however, that you can set either impute_channels1 or impute_channels2 to NULL to avoid using dummy channels, as mentioned in the “Increasing Panels Over Time” topic in #15.
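A small sketch of how the two channel lists could be derived from the panels, written in Python with `None` standing in for R's `NULL`. It assumes, consistent with David's description earlier in the thread, that impute_channels1 holds markers measured only in dataset 2 (to be imputed for dataset 1) and impute_channels2 holds markers measured only in dataset 1. The helper `channels_to_impute` and the marker names are hypothetical.

```python
def channels_to_impute(panel1, panel2):
    """Hypothetical helper: list the markers each dataset is missing.

    Returns (impute_channels1, impute_channels2); an empty list becomes None,
    mirroring passing NULL instead of adding a dummy channel.
    """
    impute1 = [m for m in panel2 if m not in panel1]  # measured only in dataset 2
    impute2 = [m for m in panel1 if m not in panel2]  # measured only in dataset 1
    return impute1 or None, impute2 or None


# Batch 1 has no markers absent from batch 2, so nothing needs imputing into dataset 2.
print(channels_to_impute(["CD3", "CD4"], ["CD3", "CD4", "CD8"]))  # (['CD8'], None)
```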

I hope it helps!

Best regards, Christina