angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

General CI Updates #1114

Closed srivarra closed 4 months ago

srivarra commented 5 months ago


What is the purpose of this PR?

Fixes several issues plaguing CI.

  1. Fixes the example dataset not being found across macOS and Windows GitHub Actions Runners.
  2. Adds the EZ Segmentation dataset to the .github/scripts/get_example_dataset.py download script.
  3. Updates outdated GitHub Actions workflows.

How did you implement your changes

Example Dataset

Simplified the CI download script.

conftest.py now reads the GITHUB_WORKSPACE environment variable directly. The variable is set in CI, which makes the path OS agnostic.
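
The idea above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `dataset_cache_dir`, the `data/example_dataset` subdirectory, and the local fallback are all assumptions.

```python
import os
import pathlib
from typing import Mapping, Optional


def dataset_cache_dir(env: Optional[Mapping[str, str]] = None) -> pathlib.Path:
    """Return the example dataset directory (hypothetical helper).

    In CI, GITHUB_WORKSPACE points at the checked-out repository on every
    runner OS, so joining onto it with pathlib avoids hard-coded separators.
    """
    env = os.environ if env is None else env
    workspace = env.get("GITHUB_WORKSPACE")
    if workspace:
        base = pathlib.Path(workspace)
    else:
        # Assumed fallback for local runs: the current working directory.
        base = pathlib.Path.cwd()
    # "data/example_dataset" is an assumed location, not taken from the PR.
    return base / "data" / "example_dataset"
```

Passing the environment in as a mapping keeps the helper easy to test without mutating `os.environ`.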

Modified the example_dataset.ExampleDataset class to:

For example:

'/Users/user/.cache/huggingface/datasets/downloads/extracted/<hash>'
'pathlib.Path(self.dataset_cache) / downloads/extracted/<hash>/<feature_name>'
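
As a rough sketch of the path layout in the example above: the function name `feature_path` and the placeholder values are hypothetical, and a real hash comes from the HuggingFace cache rather than from user input.

```python
import pathlib


def feature_path(dataset_cache: str, extracted_hash: str, feature_name: str) -> pathlib.Path:
    """Join the cache root, the extraction hash, and a feature name (hypothetical helper)."""
    return (
        pathlib.Path(dataset_cache)
        / "downloads"
        / "extracted"
        / extracted_hash
        / feature_name
    )


# Placeholder values for illustration only.
print(feature_path("/tmp/hf_cache", "abc123", "image_data").as_posix())
```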

CI

Dependencies

Updated the following:

The valid dataset configs are now gathered from the HuggingFace repo itself, and .github/scripts/get_example_dataset.py is simplified.
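
One way to look up configs from the Hub instead of hard-coding them is `datasets.get_dataset_config_names`. The sketch below is an assumption about how such validation could look, not the script's actual contents. `validate_config` and its `fetch` parameter are hypothetical names, and the repo ID is left to the caller.

```python
from typing import Callable, List


def fetch_config_names(repo_id: str) -> List[str]:
    """Ask the HuggingFace Hub which configs a dataset repo defines."""
    # Imported lazily so the validation logic below works without `datasets`.
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)


def validate_config(
    name: str,
    repo_id: str,
    fetch: Callable[[str], List[str]] = fetch_config_names,
) -> str:
    """Return `name` if the repo defines it, else raise ValueError (hypothetical helper)."""
    valid = fetch(repo_id)
    if name not in valid:
        raise ValueError(
            f"Unknown dataset config {name!r}; valid configs: {sorted(valid)}"
        )
    return name
```

Injecting `fetch` lets tests substitute a stub so CI does not need network access just to exercise the validation path.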

Pixel Clustering

Adds natsort calls throughout the Pixie pipeline to avoid issues with channel ordering. Removed the unused channels parameter from pixel_som_clustering.py::cluster_pixels and reflected the change in Notebook 2.
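
To illustrate why channel ordering can go wrong, the sketch below compares plain lexicographic sorting with a natural sort. The PR uses natsort; here a small standard-library key function is swapped in so the example runs without extra dependencies, and `natsort.natsorted(channels)` should give the same order for names like these. The channel list is an arbitrary sample.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[str, int]]:
    """Split a name into text and digit runs so digits compare numerically."""
    # re.split with a capture group alternates text, digits, text, ...
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def natural_sorted(names: List[str]) -> List[str]:
    """Stand-in for natsort.natsorted using only the standard library."""
    return sorted(names, key=natural_key)


channels = ["CD8", "CD45", "CD4", "CD31", "CD3", "CD20"]

# Lexicographic order compares character by character, so "CD20" precedes "CD3".
print(sorted(channels))
# Natural order compares the numeric suffixes as integers.
print(natural_sorted(channels))
```

If two pipeline stages sort the same channel list with different strategies, columns can end up misaligned, which is the class of bug a consistent natural sort avoids.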

Misc

Adjusted the runtime Protocol definition for ClusterClassTemplate to be syntactically correct. Removed a few cibuildwheel flags that are not needed.
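
For reference, a syntactically valid runtime-checkable Protocol looks roughly like the sketch below. The class name `ClusterTemplateLike` and its `train` and `assign` methods are hypothetical stand-ins; the real members of ClusterClassTemplate are not shown in this PR description.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ClusterTemplateLike(Protocol):
    """Hypothetical protocol; member names are illustrative only."""

    def train(self) -> None: ...

    def assign(self) -> None: ...


class DummyCluster:
    """Satisfies the protocol structurally, without inheriting from it."""

    def train(self) -> None:
        pass

    def assign(self) -> None:
        pass


class NotACluster:
    """Lacks the required methods."""


# isinstance only checks that the named members exist, not their signatures.
print(isinstance(DummyCluster(), ClusterTemplateLike))
print(isinstance(NotACluster(), ClusterTemplateLike))
```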

Remaining issues

CI / Dependencies

Dataset

The HuggingFace API has improved its general dataset workflow in several ways. We should review the new features to see which ones would make maintaining the example dataset easier.

camisowers commented 4 months ago

Why does updating to alpineer v0.1.12 not cause issues the same way it did in toffy? https://github.com/angelolab/alpineer/issues/43

srivarra commented 4 months ago

@camisowers I'm unable to replicate that issue occurring in Toffy. For example, I ran the following on the example data:

io_utils.list_files(dir_name = os.path.join(base_dir, "image_data", "fov0"), substrs=".tiff")

And I got the expected output:

Output

```python
['CD14.tiff', 'H3K27me3.tiff', 'HLADR.tiff', 'Ki67.tiff', 'Collagen1.tiff', 'CD45.tiff', 'GLUT1.tiff', 'CK17.tiff', 'CD68.tiff', 'CD163.tiff', 'Fibronectin.tiff', 'Vim.tiff', 'CD8.tiff', 'CD4.tiff', 'H3K9ac.tiff', 'ECAD.tiff', 'SMA.tiff', 'CD31.tiff', 'IDO.tiff', 'CD20.tiff', 'PD1.tiff', 'CD3.tiff']
```

Is there a specific combination of arguments which causes trouble?