broadinstitute / cellpainting-gallery

Cell Painting Gallery
https://broadinstitute.github.io/cellpainting-gallery/
MIT License

Stable items on S3 for software testing purposes #80

Closed by gwaybio 7 months ago

gwaybio commented 7 months ago

Hi All! Responses to prompts below:

Is your question related to a problem? Please describe.

We had an issue with pycytominer tests failing, which appears to be due to an updated path in the JUMP S3 bucket. See https://github.com/cytomining/pycytominer/pull/374#issuecomment-1984011446 for complete details.

Describe the solution you'd like

It would be ideal if these paths were stable so that we could build tests against them. If paths may change in the future, would it be possible to keep a dedicated folder of test files that will not change?

Describe alternatives you've considered

As I describe in cytomining/pycytominer#374, we could delete the S3 integration with cyto_utils.cell_locations.

ErinWeisbart commented 7 months ago

Hi Greg! It looks like you're pointing your tests to s3://cellpainting-gallery/test-cpg0016-jump/? This is not a stable folder.

In general, all S3 paths to objects within prefixes named in our README can be considered stable. In your example this would be s3://cellpainting-gallery/cpg0016-jump/.

(This comes with the caveat that right now there are some things moving around in JUMP as we prepare for final data release and several publications, but this is an exception to the rule.)
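For anyone wiring this rule into a test suite, here is a minimal sketch of a guard that checks whether an S3 URI sits under a stable top-level prefix. The function name `is_stable_gallery_path` and the `STABLE_PREFIXES` allow-list are hypothetical; the authoritative list is whatever prefixes the gallery README names, and only `cpg0016-jump` is taken from this thread.

```python
from urllib.parse import urlparse

# Hypothetical allow-list for illustration only. The authoritative list is
# the set of prefixes named in the cellpainting-gallery README.
STABLE_PREFIXES = frozenset({"cpg0016-jump"})


def is_stable_gallery_path(uri, stable_prefixes=STABLE_PREFIXES,
                           bucket="cellpainting-gallery"):
    """Return True if `uri` is an s3:// URI in `bucket` whose top-level
    prefix is one of `stable_prefixes`."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or parsed.netloc != bucket:
        return False
    # The first path component is the top-level prefix in the bucket.
    top_level = parsed.path.lstrip("/").split("/", 1)[0]
    return top_level in stable_prefixes


print(is_stable_gallery_path("s3://cellpainting-gallery/cpg0016-jump/"))
print(is_stable_gallery_path("s3://cellpainting-gallery/test-cpg0016-jump/"))
```

A check like this could run at the top of an integration test so that a fixture pointing at a `test-` prefix fails early with a clear message rather than with a missing-object error.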

gwaybio commented 7 months ago

Great, thanks Erin, this is super helpful! Pinging @d33bs so he is aware of this insight.

It seems like an easy option is to update the s3 path (remove test-) and brace for potential additional shake-ups prior to final data release/publications.
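As a rough illustration of that path update, the sketch below strips a leading `test-` from the top-level prefix of an S3 URI. The helper name `drop_test_prefix` is hypothetical and is not part of pycytominer; it assumes the only change needed is removing `test-` from the first path component.

```python
def drop_test_prefix(uri: str) -> str:
    """Remove a leading 'test-' from the top-level prefix of an S3 URI.

    URIs without a 'test-' top-level prefix are returned unchanged.
    """
    scheme, sep, rest = uri.partition("://")
    bucket, slash, key = rest.partition("/")
    if key.startswith("test-"):
        key = key[len("test-"):]
    return f"{scheme}{sep}{bucket}{slash}{key}"


print(drop_test_prefix("s3://cellpainting-gallery/test-cpg0016-jump/"))
```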

@ErinWeisbart - after this shake-up, do you anticipate additional changes to files? Might it still be worth creating something in the gallery dedicated to testing purposes?

ErinWeisbart commented 7 months ago

No, we anticipate things staying stable. That being said, given its scale and complexity, JUMP is the dataset most likely to be affected if there are future shake-ups beyond the ones already in progress, so I would suggest pointing to any other dataset in the gallery for tests.

gwaybio commented 7 months ago

> so I would suggest pointing to any other dataset in the gallery for tests.

Oh great point - @d33bs, I think we have enough info to proceed on this fix.

Thanks again Erin!