OceanParcels / Parcels

Main code for Parcels (Probably A Really Computationally Efficient Lagrangian Simulator)
https://www.oceanparcels.org
MIT License

Terrain following & consistency #781

Closed · miquelmassot closed this issue 4 years ago

miquelmassot commented 4 years ago

Dear maintainers,

We are using OceanParcels to follow the seafloor bathymetry at a constant altitude by feeding two netCDF files, one from Copernicus (global-analysis-forecast-phy-001-024) and a bathymetry file (GEBCO or similar), into a FieldSet:

    from parcels import FieldSet

    filenames = {'U': config['OceanCurrent_filenames'],
                 'V': config['OceanCurrent_filenames'],
                 'Alt': config['Bathymetry_filenames']}
    variables = {'U': 'uo',
                 'V': 'vo',
                 'Alt': 'elevation'}
    dimensions = {'U': {'lat': 'latitude', 'lon': 'longitude', 'depth': 'depth', 'time': 'time'},
                  'V': {'lat': 'latitude', 'lon': 'longitude', 'depth': 'depth', 'time': 'time'},
                  'Alt': {'lat': 'lat', 'lon': 'lon'}}
    fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                    deferred_load=True, field_chunksize='auto')
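
The fixed altitude that the kernel below reads from fieldset.altitude is added as a FieldSet constant, roughly like this (the exact config key is a placeholder):

    fieldset.add_constant('altitude', config['Altitude'])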

With this we are able to set our JITParticle subclass to the desired depth, while at the same time storing the u and v velocities interpolated at the current (lat, lon, depth, time). We have used altitude values from 4 to 40 metres, with the same results.

    # Define the new Kernel that mimics DriftCam vertical movement
    def TerrainFollowingVerticalMovement(particle, fieldset, time):
        particle.depth = -fieldset.Alt[0, 0, particle.lat, particle.lon] + fieldset.altitude
        particle.velocity_U = fieldset.U[
            time, particle.depth, particle.lat, particle.lon]
        particle.velocity_V = fieldset.V[
            time, particle.depth, particle.lat, particle.lon]
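
For completeness, the particle class and the execution call look roughly like this (the class name, start position and durations are placeholders, not our exact script):

    from datetime import timedelta
    from parcels import AdvectionRK4, JITParticle, ParticleSet, Variable

    class DriftCamParticle(JITParticle):
        # extra Variables so that the sampled currents are written to the output
        velocity_U = Variable('velocity_U', initial=0.)
        velocity_V = Variable('velocity_V', initial=0.)

    pset = ParticleSet(fieldset=fieldset, pclass=DriftCamParticle,
                       lon=[start_lon], lat=[start_lat], depth=[0.])
    kernels = pset.Kernel(AdvectionRK4) + pset.Kernel(TerrainFollowingVerticalMovement)
    pset.execute(kernels, runtime=timedelta(days=30), dt=timedelta(minutes=10))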

To our surprise, when running this model backwards and forwards from a starting point, the trajectories and the velocities do not match. We are using the same netCDF files and the same timestamps/timesteps. We are starting to wonder whether this is related to how the interpolation is done, or to how current extrapolation/interpolation behaves close to the seafloor.

[Figures: 1 - Lat/Lon, 2 - U speed, 3 - V speed, 4 - Depth]

When fixing the depth (500 m) in the simulation, the forward and backward simulations agree.

[Figure 5 - Agreement at constant depth]

We believe this has to do with how the seafloor currents are extrapolated/read. Looking at the netCDF files, we see that there is a gap of about 200 m between the last current measurement and the seafloor, which means that when we interpolate the current at the seafloor we are in fact extrapolating. Is there something we can do on our side? How can we clamp it to the last available current value? Could you explain a bit more how this interpolation/extrapolation works?
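
To make the question concrete, what we have in mind is something along these lines, where the depth at which the currents are sampled is clamped to the deepest level that still has valid data (max_current_depth and its value are placeholders we would have to supply ourselves; this is a sketch, not what we currently run):

    # hypothetical clamp on the sampling depth, not part of our current setup
    fieldset.add_constant('max_current_depth', 450.)

    def ClampedCurrentSampling(particle, fieldset, time):
        sample_depth = particle.depth
        if sample_depth > fieldset.max_current_depth:
            sample_depth = fieldset.max_current_depth
        particle.velocity_U = fieldset.U[time, sample_depth, particle.lat, particle.lon]
        particle.velocity_V = fieldset.V[time, sample_depth, particle.lat, particle.lon]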

Kind regards and cheers for this great piece of software.

erikvansebille commented 4 years ago

Hi @miquelmassot, thanks for reporting this! Which version are you working with now? The latest release from Anaconda, or a more recent version from GitHub?

Because if the latter is the case, it might be that you stumbled upon the same bug that we just discovered today and which we've described in #782.

In short: there is a problem with how the most recent version of Parcels handles FieldSets that contain Fields on different Grids, which applies to your setup, since Alt is a 2D Field while U and V are 4D.

So if you are indeed working on the latest master, could you check out v2.1.4 and run with that? I'd be keen to know if that fixes the problem!

miquelmassot commented 4 years ago

Hi @erikvansebille, the results I uploaded were done with v2.1.4, and not the master branch.

miquelmassot commented 4 years ago

I have just tried the master branch at

    $ git describe --tags
    v2.1.4-120-gb093a46

and got the same results. By the way, the code execution was much slower. Is there any test I can run?

erikvansebille commented 4 years ago

Hi @miquelmassot, thanks for trying with the master branch. That it is much slower is probably because of #782. Could you also try running with the latest soa_multigrid_indexing branch, where this bug is now fixed? You should not see the slowdown with that branch!

As for your original Issue, could you send me the full code? To be honest, I don't quite understand what you mean by 'half in forward mode' and 'half in backward mode': from where do you start then? It's known that, due to the accumulation of small errors, a particle simulated forward for some time DeltaT and then backwards again to time 0 will never return to exactly the same point; that is true for every RK4 integration. Is that what you're asking?
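
A toy illustration of what I mean, in plain Python rather than Parcels: even for a perfectly smooth rotating flow, stepping RK4 forward and then backward with the same timestep does not bring you back exactly to the starting point.

    def velocity(x, y):
        # solid-body rotation around the origin
        return -y, x

    def rk4_step(x, y, dt):
        k1x, k1y = velocity(x, y)
        k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
        k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
        k4x, k4y = velocity(x + dt * k3x, y + dt * k3y)
        return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
                y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

    x, y = 1.0, 0.0
    for _ in range(1000):      # forward in time
        x, y = rk4_step(x, y, 0.1)
    for _ in range(1000):      # backward again with -dt
        x, y = rk4_step(x, y, -0.1)
    print(x - 1.0, y)          # small but nonzero mismatch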

miquelmassot commented 4 years ago

That is indeed a possibility. I have given you access to a private repository: https://github.com/miquelmassot/driftcam_planner/. Let me roughly explain what we are doing: we are trying to predict where to deploy an asset in the ocean. The asset goes through four phases:

  1. It is deployed at a point on the surface and sinks to a defined altitude above the seafloor.
  2. It drifts with the underwater currents while controlling its altitude.
  3. After a predefined number of days, it returns to the surface.
  4. It drifts on the surface, waiting for someone to pick it up and download the data.

We coded this with OceanParcels by giving the software a "middle point" we want to pass through and running the simulation in reverse for half of the time; this gives us the surface time and (lat, lon). From that time and point we then run the simulation forwards and compare the two legs, roughly as in the sketch below.
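
In Parcels terms the two legs look roughly like this (positions, times and durations are placeholders; backward integration is done with a negative dt):

    # DriftCamParticle, kernels and fieldset as in the earlier snippet
    from datetime import timedelta
    from parcels import ParticleSet

    # leg 1: backwards from the mid-point for half the mission duration
    pset_back = ParticleSet(fieldset=fieldset, pclass=DriftCamParticle,
                            lon=[mid_lon], lat=[mid_lat], time=mid_time)
    pset_back.execute(kernels, runtime=timedelta(days=15), dt=-timedelta(minutes=10))

    # leg 2: forwards again, from the surface point and time found in leg 1
    pset_fwd = ParticleSet(fieldset=fieldset, pclass=DriftCamParticle,
                           lon=[surface_lon], lat=[surface_lat], time=surface_time)
    pset_fwd.execute(kernels, runtime=timedelta(days=30), dt=timedelta(minutes=10))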

We would expect some divergence, but we are currently seeing differences of kilometres over a 30-day transect. Furthermore, the data in the plots show that for practically the same lat, lon and time the seafloor currents change drastically, making the three legs completely different; look, for instance, at the V velocity in the plots I uploaded. In comparison, when we keep the depth constant instead, all numbers agree. Does it have something to do with the interpolation? To me it does not look like integration error, as that would show up in both cases. What do you think?

erikvansebille commented 4 years ago

Thank you for this explanation; it helps a lot in understanding your setup. Your observation that the fixed-depth simulation does not show such large differences does indeed point to something 'strange' in the terrain-following simulation.

This is not easy to debug though, especially because there isn't an obvious 'error' or something that we can home in on.

By the way, looking at your original plots again, I see some strange collection of purple points on the left side of the graph. See the red circle in the below figure:

[Screenshot: Screen Shot 2020-04-02 at 14 55 05]

Might this not explain the difference? Do you know what's going on there?

miquelmassot commented 4 years ago

Thanks @erikvansebille. This is in line with #779: the reverse simulation continues until the runtime is reached. Have a look at the fixed-depth example too and see that the differences between the U/V currents, and therefore between the lat/lon positions, are minimal. Do you think it is related to seafloor current extrapolation? It is when the altitude is fixed that these differences occur.

Fixed depth: https://github.com/miquelmassot/driftcam_planner/blob/master/Example%202%20-%20Fixed%20depth.ipynb
Fixed altitude: https://github.com/miquelmassot/driftcam_planner/blob/master/Example%201%20-%20Fixed%20altitude.ipynb

erikvansebille commented 4 years ago

Yes, it could indeed be related to the interpolation, although I don't yet see what exactly would be wrong in it. Because the errors are not enormous (particles don't start flying all around), I'll have to think about how exactly to approach debugging this...

By the way, #779 is now implemented and we should merge it into master within the coming hour or so. So perhaps you can then rerun with return ErrorCode.StopExecution in your Kernel?
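
That would look something like this (the condition for when to stop is just indicative, e.g. stopping once the particle has surfaced):

    from parcels import ErrorCode

    def StopWhenDone(particle, fieldset, time):
        if particle.depth < 1.:
            return ErrorCode.StopExecution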

erikvansebille commented 4 years ago

Can I ask you to try three different experiments, to home in on what could be causing this?

  1. Run the same simulation but without time-varying fields. The easiest way to do this is to load only the first snapshot of the Fields and to set allow_time_extrapolation=True when you create your FieldSet. This helps us investigate whether the time interpolation is the problem.
  2. Run the experiment for a shorter period of time, i.e. only while the particle is above the ridge, so without the sink and rise phases. This helps us investigate whether it really is the part on the ridge that is causing the difference.
  3. A combination of 1) and 2), so both no time variability and a shorter simulation.

miquelmassot commented 4 years ago

Thanks @erikvansebille, I'll prepare these experiments and get back to you.

miquelmassot commented 4 years ago

I tried loading the fieldset with just one Time slice with the following, but I do not see any difference.

    indices_extract = {}
    indices_extract['time'] = [0]

    fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                    deferred_load=True, field_chunksize='auto',
                                    indices=indices_extract,
                                    allow_time_extrapolation=True)

Could you please point me to the correct way of using just one snapshot?

erikvansebille commented 4 years ago

> Could you please point me to the correct way of using just one snapshot?

Normally, I'd give the FieldSet creation only one netcdf file, assuming that each snapshot is in a separate netcdf file. If there are multiple times in one netcdf file, perhaps remove time from the dimensions dictionary? Does that work?
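
For the Copernicus files that would mean something like this (untested):

    dimensions = {'U': {'lat': 'latitude', 'lon': 'longitude', 'depth': 'depth'},
                  'V': {'lat': 'latitude', 'lon': 'longitude', 'depth': 'depth'},
                  'Alt': {'lat': 'lat', 'lon': 'lon'}}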

miquelmassot commented 4 years ago

I tried removing time from the dimensions dict and got an assertion error:

AssertionError: Field U expecting a data shape of a [tdim, zdim, ydim, xdim]. Flag transpose=True could help to reorder the data.

I also tried with the transpose flag, but it triggered the same assertion.

miquelmassot commented 4 years ago

I also tried the following (with time in the dimensions dictionary):

    timestamps = [[0]]
    fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                    deferred_load=True, field_chunksize='auto',
                                    timestamps=timestamps,
                                    allow_time_extrapolation=True)

and got the same shape assertion error.

erikvansebille commented 4 years ago

Here's a neat workaround that may work: use Parcels' capability to read xarray Datasets directly. Open the netcdf files as an xarray Dataset, select only the first time slice, and do not forget to also remove time from the dimensions dictionary.

Something like

    import xarray as xr

    ds = xr.open_mfdataset([filenames['U'], filenames['V'], filenames['Alt']],
                           combine='by_coords')
    ds = ds.isel(time=0)
    fieldset = FieldSet.from_xarray_dataset(ds, variables, dimensions,
                                            allow_time_extrapolation=True)

erikvansebille commented 4 years ago

I see you've opened a new Issue at #830, @miquelmassot. Has this problem been solved now? In that case, I propose we close this Issue; feel free to reopen if there's still work to do.

miquelmassot commented 4 years ago

I am afraid it is not yet solved. The issue for me is that I am using two netCDF files (bathymetry and ocean currents) gridded at different resolutions. I tried loading them into an xarray Dataset, but combining by_coords did not work, and I have not had time to look into it yet. Is there any other workaround you can suggest so that I can test it?