RGLab / openCyto

A package that provides data analysis pipeline for flow cytometry.
GNU Affero General Public License v3.0
75 stars 29 forks source link

Improvements to openCyto Clustering Framework #236

Closed DillonHammill closed 1 year ago

DillonHammill commented 2 years ago

Hi @mikejiang,

Thanks again for adding support for empty dims in the gatingTemplate. I have put together this PR to include the additional changes that I have proposed or changes necessary to get the clustering gating to work.

The changes are as follows:

  1. Added an error to the gatingTemplate constructor when an empty gatingTemplate is supplied either as a CSV or data.table.
  2. Terminate gating process after the stop.at node has been added. As I mentioned previously, the current implementation terminates the gating prior to the stop.at node which means that there are others gates applied which may be undesirable.
  3. Added a minor improvement to .preprocess_row() to ensure that NA dims are handled correctly, since dims = NA is the default in gs_add_gating_method(). This allows the gating process to proceed if dims are omitted in gs_add_gating_method().
  4. I have fixed the handling of groupBy = NA and collpaseDataForGating = TRUE to place all samples in the same group - at the moment each sample is placed in its own group. In order to do this, I have given the group containing all the samples the name "all" in both the gating and preprocessing methods. Happy to change this name to whatever you like.
  5. I ran into an issue with .gating_adaptor() and the clustering methods with collapseDataForGating = TRUE. The .gating_adaptor() repeats the gate for each sample, which is not desirable for when using logical or factor filters on the collapsed data since the number of events will vary in each sample (we run into index out of bounds errors since the factor length is the sum of all the events across samples). It is a bit of work to split these gates into separate gates within .gating_adaptor(), so instead I split these gates per sample within my gating function and return a list of filterResults one per each sample. I have have updated the .gating_adaptor() to prevent duplication of filters if they are supplied in a named list per sample.

I ran all the tests locally and everything passes after these changes, I have also confirmed that everything is working within CytoExploreR as well.

Let me know if there are any changes you would like me to make.

Thanks again for your help.

Dillon

DillonHammill commented 2 years ago

@mikejiang, I also added .preprocess_row() to list of CytoExploreR exports so that I can use it instead of .preprocess_csv() when required, so I can bypass undesirable checks.

DillonHammill commented 2 years ago

Hi @mikejiang, please can you review this PR when you get some time so I can make the changes that you require. I am preparing for the release of CytoExploreR version 2.0.0 at the end of the month so it would be great if these changes were available on BioC by then. I have been testing my updated fork for a while now and I haven't run into any issues.