biosurf / cyCombine

Robust Integration of Single-Cell Cytometry Datasets

Issues associated with `Warning: emd: Maximum number of iterations has been reached (500)` in `cyCombine::detect_batch_effect` #55

Open denvercal1234GitHub opened 1 month ago

denvercal1234GitHub commented 1 month ago

Hi there again (apologies),

When running cyCombine::detect_batch_effect after the initial exploratory clustering of the uncorrected spectral flow data (call below), I encountered countless warnings from emd, and then it produced an error.

Likely because I set "clean_colnames = FALSE" in prepare_data()?

Thank you.

cyCombine::detect_batch_effect(F64Singlet_spectral, batch_col = 'batch', out_dir = paste0(data_dir, '/cyCombine_detect_batch_effect'), xdim = 6, ydim = 6, markers= F64Singlet_sfc_markers, seed = 434, name = 'F64Singlet_spectral_uncorrected', downsample = NULL, norm_method = "scale", label_col = "label")

Warning

Using existig cell type labels.
Warning: emd: Maximum number of iterations has been reached (500)
Warning: emd: Maximum number of iterations has been reached (500)
....

There are 0 clusters, in which a single cluster is strongly over- or underrepresented.
Making UMAP plots for up to 50,000 cells.
Error in parse(text = x, keep.source = FALSE) : 
  <text>:1:11: unexpected symbol
1: Comp-LIVE DEAD
              ^
shdam commented 1 month ago

Hmm, the error should be solved by adding "Comp-LIVE DEAD" to non_markers.
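
For context, the parse error most likely arises because `Comp-LIVE DEAD` is not a syntactically valid R name, so any step that builds an expression from the column name as text will fail. The following is a minimal base-R sketch that reproduces the message; it does not call cyCombine, and the `make.names()` step is only one possible way to obtain a safe name, shown here as an assumption rather than the package's own approach.

```r
# Column name taken from the error message in this thread.
marker <- "Comp-LIVE DEAD"

# Parsing the raw name fails: "Comp-LIVE" reads as a subtraction,
# and the following "DEAD" is then an unexpected symbol.
res <- tryCatch(
  parse(text = marker, keep.source = FALSE),
  error = function(e) "parse error"
)
print(res)

# make.names() replaces invalid characters with "." to give a syntactic name.
clean <- make.names(marker)
print(clean)

# The cleaned name parses without error.
parsed <- parse(text = clean, keep.source = FALSE)
```

Excluding the column (as suggested above) or renaming it before calling the function both avoid the non-syntactic name.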

I believe the warning is because the binSize is too low. It splits the data into bins of 0.1, which works well for transformed CyTOF values (~ 0-5), but if your values are significantly larger, increasing the binSize proportionally could help. I will look into setting the binSize dynamically when I find time.
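
As a rough illustration of "increasing the binSize proportionally", the sketch below scales the 0.1 bin width (described above as suited to values of roughly 0-5) to the span of another dataset. `proportional_bin_size` is a hypothetical helper, not a cyCombine function, and the example value ranges are made up; how the resulting value is passed on to the EMD computation is not shown here.

```r
# Hypothetical helper: keep the number of bins constant by scaling the
# reference bin width (0.1 for a span of about 5) to the observed span.
proportional_bin_size <- function(x, ref_bin = 0.1, ref_span = 5) {
  span <- diff(range(x, na.rm = TRUE))
  ref_bin * span / ref_span
}

# Transformed CyTOF-like values spanning 0-5 keep the default width.
bin_cytof <- proportional_bin_size(c(0, 5))

# Made-up values spanning 0-500 get a 100-fold larger bin width.
bin_large <- proportional_bin_size(c(0, 500))

# In both cases the span is covered by about the same number of bins.
n_bins_cytof <- 5 / bin_cytof
n_bins_large <- 500 / bin_large
print(c(bin_cytof, bin_large, n_bins_cytof, n_bins_large))
```

With the default 0.1 width, a span of 500 would instead be split into about 5,000 bins, which is consistent with the iteration limit being hit.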

Hope this helps :)

Best regards, Søren

denvercal1234GitHub commented 1 month ago

Hi Søren,

Thanks for your pointers.

Q1. The code ran, but the warning persisted, and the output below looks quite strange: it flags only 1 cluster as different, yet its percentage is quite similar across batches?

cyCombine::detect_batch_effect(F64LiveSinglet_spectral, batch_col = 'batch', out_dir = paste0(data_dir, '/cyCombine_detect_batch_effect'), xdim = 6, ydim = 6, markers= F64LiveSinglet_sfc_markers_cleaned, seed = 434, name = 'F64LiveSinglet_spectral_uncorrected', downsample = NULL, norm_method = "scale", label_col = "label")

F64LiveSinglet_sfc_markers_cleaned no longer contains Live Dead, and I removed all dashes from the marker names.

label comes from the initial clustering of the uncorrected data.

Using existig cell type labels.
Warning: emd: Maximum number of iterations has been reached (500)
Warning: emd: Maximum number of iterations has been reached (500)
....

There are 0 markers that appear to be outliers in a single batch:

There are 1 clusters, in which a single cluster is strongly over- or underrepresented.
The cluster percentages for each batch in cluster 5 are:
F64D1 = 0.24 %, F64D2 = 0.24 %, F64D3 = 0.32 %
The cluster expresses CD5, CD38, CD31, CD3, CD152, CD162, TCRab,...

Q2. Do you think the UMAP of the uncorrected data below suggests that there was not much of a batch effect by day of running? If so, would batch alignment be unnecessary here, or even detrimental?

[Image: Screenshot 2024-07-19 at 23 50 01] [Image: Screenshot 2024-07-20 at 10 54 45]

I ask because the UMAP from cyCombine::detect_batch_effect does not seem to suggest that batch alignment is needed, but when examining the gMFI of every marker for every day of running, markers 12 or 9 seem to be quite different between the 3 batches. There were differences, as shown below. The topmost black line is just the compensation controls.

Q3. When running cyCombine::detect_batch_effect, do we need to remove the cells belonging to the "anchor"/reference PBMC sample I had in each batch?

shdam commented 1 month ago

Hey, glad to see it solved the error at least.

I believe the warning is caused by a binSize that is too small. What is the range of values in your data? I haven’t played around with this option, but try scaling the data with cyCombine::normalize(norm_method = "scale") before detecting batch effects.
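
To answer the question about the value range, and to see what scaling does to it, something along these lines can be run first. This is a base-R sketch on made-up data: the data frame, the marker values, and the per-batch z-scoring are assumptions standing in for the real dataset and for what `norm_method = "scale"` is expected to do, not the package's actual code.

```r
# Made-up expression data standing in for the real dataset.
set.seed(434)
df <- data.frame(
  batch = rep(c("F64D1", "F64D2", "F64D3"), each = 100),
  CD3   = rnorm(300, mean = 2000, sd = 800),
  CD38  = rnorm(300, mean = 500,  sd = 200)
)
markers <- c("CD3", "CD38")

# Range of values per marker before scaling (rows: min, max).
raw_range <- sapply(df[markers], range)
print(raw_range)

# z-score each marker within each batch (assumed equivalent of "scale").
scaled <- df
for (m in markers) {
  scaled[[m]] <- ave(df[[m]], df$batch,
                     FUN = function(v) (v - mean(v)) / sd(v))
}

# After scaling, values fall within a few units of zero.
scaled_range <- sapply(scaled[markers], range)
print(scaled_range)
```

If the raw range spans hundreds or thousands of units, a fixed bin width of 0.1 produces a very large number of bins, whereas the scaled values are much closer to the range that width was chosen for.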

Visually, it seems there are some minor batch effects that you would benefit from correcting.

Q3. You can argue for either. The anchor samples also inform about batch effects, but including them is not technically necessary.

I hope this answer is helpful :)

Best regards, Søren

denvercal1234GitHub commented 1 month ago

Thank you very much Søren for your continued help with this. Hopefully it will also be helpful to other users of cyCombine.

Might I quickly confirm that I should run cyCombine::normalize(norm_method = "scale") before running cyCombine::detect_batch_effect(norm_method = "scale")? I saw that cyCombine::detect_batch_effect already has a norm_method argument; does this double-scale the data?

shdam commented 1 month ago

Hey,

Yes, you could do something like:

F64LiveSinglet_spectral |>
   cyCombine::normalize(norm_method = "scale") |>
   cyCombine::detect_batch_effect(…)

The normalization inside detect_batch_effect is applied only before clustering, which is skipped when you supply labels. You are thus not double-scaling, and specifying norm_method there is redundant; the same goes for xdim/ydim.
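
As a side note on the double-scaling concern: if "scale" is a plain z-score (an assumption here, not confirmed in this thread), applying it twice to the same group of values would leave them essentially unchanged, because the first pass already gives mean 0 and standard deviation 1. A small base-R check on made-up values:

```r
# Made-up values on a large scale.
set.seed(1)
x <- rnorm(200, mean = 1500, sd = 400)

# Plain z-score, assumed to be what "scale" does within a group.
z <- function(v) (v - mean(v)) / sd(v)

once  <- z(x)
twice <- z(z(x))

# The two results agree up to floating-point tolerance.
same <- isTRUE(all.equal(once, twice))
print(same)
```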

Best regards, Søren