UXARRAY / uxarray

Xarray extension for unstructured climate and global weather data analysis and visualization.
https://uxarray.readthedocs.io/
Apache License 2.0
148 stars 31 forks source link

Hardcoded File Path Causing Test Failure #860

Open amberchen122 opened 1 month ago

amberchen122 commented 1 month ago

Description The hard coded file path in test/test_api.py::TestAPI::test_open_mf_dataset is unsafe. I added a test file to test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc from Paul, and it failed.

Details

The test test/test_api.py::TestAPI::test_open_mf_dataset is failing after I added test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc.

I think the reason is because of this hardcoded file path incorrectly matching outCSne30_test2.nc.

Error Message

FAILED test/test_api.py::TestAPI::test_open_mf_dataset - ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
philipc2 commented 1 month ago

Yeah this should really be a list of paths since that file path will assume any paths that match the outCSne30_ prefix with a .nc ending. We should have it point to the two data files as seen below.

image


dsfiles_mf_ne30  = [current_path / "meshfiles" / "ugrid" / "outCSne30" / "outCSne30_var2.nc",
                    current_path / "meshfiles" / "ugrid" / "outCSne30" / "outCSne30_vortex.nc"]
erogluorhan commented 1 month ago

Even though explicitly listing the file names to be opened with open_mfdatasets can be an alternative solution, I don't think giving file name patterns as in here is the actual reason of the problem.

That is common practice to have multiple data files from the same workflow in the same folder, e.g. separate files for different time stamps in a time-series analysis or separate files for different variables in a big simulation. That said, if we think all of the files under the test/meshfiles/ugrid/outCSne30/ directory are somehow related to each other in such a way, let us keep storing them there; otherwise, we need to consider oving some of them into a different folder. If the former, then I'd expect a code something like this should still work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files
amberchen122 commented 1 month ago

Thank you for your input! I will move the test files needed for zonal-mean testing (test/meshfiles/ugrid/outCSne30_zonal_test/outCSne30_test2.nc and test/meshfiles/ugrid/outCSne30_zonal_test/outCSne30_test2.nc) to a new folder: test/meshfiles/ugrid/outCSne30_zonal_test.

Does this sound like a good solution for the zonal-mean PR failing the API test?

erogluorhan commented 1 month ago

Before that, please determine if those newer zonal test files are very close in format/content with the original outCSne30_var2.nc and outCSne30_vortex.nc files. If that's the case, I'd instead suggest that you do the following code change in the failing tests, and it should work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

If that's not the case though, then your suggestion looks good.

amberchen122 commented 1 month ago

Before that, please determine if those newer zonal test files are very close in format/content with the original outCSne30_var2.nc and outCSne30_vortex.nc files. If that's the case, I'd instead suggest that you do the following code change in the failing tests, and it should work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

If that's not the case though, then your suggestion looks good.

Thank you for your suggestion. I believe the zonal test files are in close format with the original test files. I have updated the test_open_mf_dataset as you recommended.

The new uxds_mf_ne30.uxgrid._ds.data_vars has one more dimension edge_node_connectivity:

test/test_api.py uxds_mf_ne30.uxgrid._ds.data_vars: grid_topology int32 4B -2147483647 face_node_connectivity (n_face, n_max_face_nodes) int64 173kB 0 8 ... 6 298 node_lon (n_node) float64 43kB -45.0 45.0 ... 138.0 135.0 node_lat (n_node) float64 43kB ... edge_node_connectivity (n_edge, two) int64 173kB 0 8 0 ... 5400 5400 5401

Therefore, I also modified the test constants in test/constants.py. Could you please verify if the new test parameters look good? The modifications are in the zonal-mean branch.

If this solution to the test_api.py bug is acceptable, then PR#785 is also ready for review. :)

erogluorhan commented 1 month ago

Yes, this looks like a good solution to me; thanks a lot! Hey @philipc2 , do you agree with me that this will be a better solution forward?