OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
95 stars 73 forks source link

Unpin netcdf4 and pandas to check what is breaking [all tests ci] #1111

Closed ocefpaf closed 1 year ago

ocefpaf commented 1 year ago

I'd like to run this locally but I don't have access to the test data drive to run the full test suite.

ocefpaf commented 1 year ago

@lsetiawan I need your help understanding the CIs here. It seems that the tests did not run, is that correct?

emiliom commented 1 year ago

Thanks for chiming in @ocefpaf ! First, our development branch is dev; I suggest running tests there rather than on main. In dev we already removed the Pandas pinning, and all is good there. But we are running into a number of issues with netCDF4. #988 describes one of them, but lately we've been running into more that we haven't documented in a GitHub issue yet.

The test data are packaged in a Docker image that's rebuilt daily. The tests can be run locally with these instructions, which first deploy two docker images that include the test data.

lsetiawan commented 1 year ago

The current failure seen in CI is exactly this one: https://github.com/Unidata/netcdf4-python/issues/1175

It is the reason we pinned 😬

ocefpaf commented 1 year ago

@lsetiawan I could not find any other issue besides the variable length compression. If so, let me know. I guess you have everything you need to bring this one home. I'll close this b/c you know better the internals of the package to find the strings and other vlen, to remove compression from them.

PS: As we discussed on slack, https://github.com/Unidata/netcdf-c/pull/2716 will "fix this" but it is better to explicitly disable compression on variables that won't be compressed IMO. Also, that would fix this right now instead of whenever they make a new release ;-p

ocefpaf commented 1 year ago

The current failure seen in CI is exactly this one: Unidata/netcdf4-python#1175

This one https://github.com/Unidata/netcdf4-python/issues/1205 probably describes is better and has the links to the issue upstream in netcdf-c.

ocefpaf commented 1 year ago

Thanks for chiming in @ocefpaf ! First, our development branch is dev; I suggest running tests there rather than on main. In dev we already removed the Pandas pinning, and all is good there. But we are running into a number of issues with netCDF4. #988 describes one of them, but lately we've been running into more that we haven't documented in a GitHub issue yet.

The test data are packaged in a Docker image that's rebuilt daily. The tests can be run locally with these instructions, which first deploy two docker images that include the test data.

I don't have any stake on this but having non-standard branches, convoluted way to obtain data for running tests, and title coded messages to activate them, makes this less open to contributions. I would suggest to have at least a subset of the tests to run without jumping through hoops. In the long term, have the test data automatically fetched with pooch would be even better.

Anyway, thanks to @lsetiawan's guidance we figured out the problem and a fix is on the way!