Closed d33bs closed 3 months ago
This turned out to be a cloudpathlib
caching issue, which appears to have changed enough in recent versions enough that the garbage collector removes cached files before they can be referenced by CytoTable. I believe this was a bug present in the code prior to the updates, rather than being caused by the dependencies themselves (though cloudpathlib
or related dependencies may have impacted the results).
On resolving the caching issue, there appears to be a credentialing issue from botocore
.
Looking further into this I feel that there's likely something happening which doesn't play nice between moto
and cloudpathlib
. There are alternatives such as localstack
which we could explore, but generally I think we're following a path towards a possible anti-pattern with mocked resources which is consuming too much time for the focus of the project. A reference from Software Engineering at Google in Chapter 13 under the "Real Implementations" section somewhat backs this up:
Although test doubles can be invaluable testing tools, our first choice for tests is to use the real implementations of the system under test’s dependencies; that is, the same implementations that are used in production code. Tests have higher fidelity when they execute code as it will be executed in production, and using real implementations helps accomplish this. ... Using real implementations for dependencies makes the system under test more realistic given that all code in these real implementations will be executed in the test.
I chatted with @gwaybio about this briefly, and based on our discussion feel that removing moto
to rely on a real-world test would have better outcomes. This would allow us to avoid a mocked resource dependency for testing (and related issues), as well as provide us an avenue for a larger data test (data which we don't want to store in the repository)(possibly addressing https://github.com/cytomining/CytoTable/issues/193). Likely it'd be good to follow the patterns and suggestions on a testing resource from cytomining/pycytominer#375 .
Rough steps:
cellpainting-gallery
or similar S3 resource.
An error occurred with two of the tests during a scheduled and un-locked dependency test earlier today. The only failing test in these cases is:
tests/test_convert_threaded.py::test_convert_s3_path_csv
. This seems to happen due to a missing file from the mockedmoto
bucket to a local file, resulting in a failed data query from DuckDB.Reference to the job: https://github.com/cytomining/CytoTable/actions/runs/8812487779/job/24204265980