angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Allow for fewer than specified SOM clusters to be contained in Pixie average file #1036

Closed alex-l-kong closed 1 year ago

alex-l-kong commented 1 year ago

Is your feature request related to a problem? Please describe.

The SOM averaging step currently raises an error if the averaged file does not contain every SOM cluster. This was originally done to encourage users with very few pixels/FOVs to increase their subset. However, with high num_passes values, even users with a lot of pixels may find that the SOM merges clusters it considers redundant, dropping the total below the specified number (default 100). That case should not be treated as an error.

Describe the solution you'd like

We should provide a more flexible approach when it comes to creating the SOM average file.

num_fovs_subset should still be provided for speed, but the averaging step should not error out if the final averaged dataset contains fewer than the specified number of clusters (100 by default). Instead, it should warn the user that certain SOM clusters were dropped and list which ones. That way, the user can increase num_fovs_subset, resample the data entirely, or continue if they're OK with the result.
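
A minimal sketch of the warn-instead-of-error behavior described above, using only the standard library. This is not the existing ark-analysis code: the function name `warn_on_missing_som_clusters` and its parameters are hypothetical, and it assumes SOM cluster IDs are the integers 1 through the specified cluster count.

```python
import warnings


def warn_on_missing_som_clusters(present_clusters, expected_clusters=100):
    """Warn (rather than raise) when some SOM clusters are absent.

    Hypothetical helper, not part of ark-analysis.

    present_clusters: iterable of cluster IDs found in the averaged data
    expected_clusters: number of SOM clusters requested
        (assumes IDs run from 1 to expected_clusters)

    Returns the sorted list of missing cluster IDs.
    """
    expected = set(range(1, expected_clusters + 1))
    present = {int(c) for c in present_clusters}
    missing = sorted(expected - present)

    if missing:
        # Report which clusters were dropped so the user can decide
        # whether to increase num_fovs_subset, resample, or continue.
        warnings.warn(
            f"{len(missing)} of {expected_clusters} SOM clusters are missing "
            f"from the averaged data: {', '.join(map(str, missing))}. "
            "Consider increasing num_fovs_subset or resampling the data."
        )

    return missing
```

In practice, `present_clusters` would be the unique values of the cluster ID column in the averaged table. Returning the missing IDs lets the caller log them or act on them without parsing the warning text.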

alex-l-kong commented 1 year ago

It looks like there may be a few confounding issues here. We'll close this for now and reopen it if the problem comes up again after those issues have been addressed.