Closed kvantricht closed 2 months ago
FYI there's something more to be fixed about this. There's a failing test revealing that combine_predictions
no longer works. It appears that this worked only because lat
and lon
were treated in another way. I am not yet sure how to fix this.
Thanks @kvantricht ! I will take a look
@gabrieltseng whenever we make this fix, it's very important to crosscheck compatibility with https://github.com/WorldCereal/presto-worldcereal/blob/9aed8f4a0d9f1de97fb1e4594110a3a5b1c5ff3c/presto/dataset.py#L332
We're suffering already for 3 days in the inference pipeline figuring out what's going on and one of the issues we face is inconsistency between the lat/lon values and the EO data values, because np.swapaxes
works differently from np.transpose
. It's crucial that throughout all inputs the flattening of the images has been done consistently.
@gabrieltseng i'm working on this one, because it's the cause of failing tests in my updated inference dataset. What I'm wondering: why in combine_predictions
are we transforming back to Pandas dataframe? It's a mess now that lat/lon are back in good shape.
@gabrieltseng i'm working on this one, because it's the cause of failing tests in my updated inference dataset. What I'm wondering: why in
combine_predictions
are we transforming back to Pandas dataframe? It's a mess now that lat/lon are back in good shape.
when doing a search for when this combine_predictions
is actually used, I can only find it in tests, where still the dataframe is again reassembled to an xarray.DataArray(). So in reality, the DataFrame seems never to be used by itself. So I'm going to try for now to avoid that step which is going to be easier.
@gabrieltseng i'm working on this one, because it's the cause of failing tests in my updated inference dataset. What I'm wondering: why in
combine_predictions
are we transforming back to Pandas dataframe? It's a mess now that lat/lon are back in good shape.when doing a search for when this
combine_predictions
is actually used, I can only find it in tests, where still the dataframe is again reassembled to an xarray.DataArray(). So in reality, the DataFrame seems never to be used by itself. So I'm going to try for now to avoid that step which is going to be easier.
The only test that in that case no longer makes a lot of sense as it is now is this one: https://github.com/WorldCereal/presto-worldcereal/blob/main/tests/test_dataset.py#L69
This test was taken from openmapflow. Also here I don't fully get what kind of inference is supposed to be tested here. If it's a true flattened 2D inference test, and if flat_lat
and flat_lon
are size 5, you'd in fact need 25 predictions belonging to the grid constructed by flat_lat
and flat_lon
by using a meshgrid
.
I've already adapted comlbine_predictions
to work with the gridded data as xr.DataArray
so it no longer returns a DataFrame. The other tests are working. The above would need to be adapted to test the DataArray instead of a derived DataFrame.
What I'd need from @gabrieltseng is some insights into whether or not I'm overlooking something on the need of working with a dataframe instead. Will be pushing some changes to the branch to make the discussion more concrete.
So the proposed changes including the fixes of the test are in another PR, addressed specifically in 817da45b0acad262883fb001722c5a053ff25600
Fix has been applied as part of https://github.com/WorldCereal/presto-worldcereal/pull/103. No more need for this PR.
@gabrieltseng the inference dataset had a bug in latlon handling. Only "worked" for square patches but even then contained a bug. This should go through a meshgrid. Made a fix for this. I'm not really familiar with
einops.rearrange
so please check if I did this the right way. One approved, we should be ok to move this right intomain
.