OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0

Defaults are overridden inside LagrangianArray.__init__() #1429

Open lencart opened 4 weeks ago


Framing

Loading the variable defaults in LagrangianArray.__init__() (elements.py, line 139) gives the LagrangianArray object its properties with the correct values. Right after that (line 141), these are overridden by the values supplied through the kwargs of __init__(). The problem is that all variables seem to be passed in kwargs with an empty list as the value.

This matters because line 141 tries to create empty numpy arrays simply by calling the variable's dtype on the value, i.e. dtype([]). That works if dtype is a numpy scalar type such as numpy.float32 or numpy.int32, which returns an empty array of that type. It doesn't work for types that are usually stored as numpy.dtype('O') (object arrays).
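A minimal illustration of the difference, using a hypothetical Enum-typed property (`Habitat` is made up for this example and is not part of OpenDrift). A numpy scalar type called on an empty list returns an empty array, while an Enum class treats the list as one member value to look up and rejects it; an empty object array has to be requested explicitly.

```python
import enum

import numpy as np


class Habitat(enum.Enum):
    """Hypothetical Enum used as a particle property type."""
    PELAGIC = 1
    BENTHIC = 2


# A numpy scalar type accepts a sequence and returns an array of that type.
empty_float = np.float32([])
print(repr(empty_float))

# An Enum class interprets its argument as a single member value, so [] is rejected.
try:
    Habitat([])
    enum_accepts_empty_list = True
except (ValueError, TypeError) as err:
    enum_accepts_empty_list = False
    print("Habitat([]) failed:", err)

# An empty object array has to be created explicitly instead.
empty_enum = np.array([], dtype=object)
print(empty_enum.dtype, empty_enum.shape)
```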

On line 139, the defaults are loaded from the particle class of the respective model:

##### AFTER Line 139 #####
ID -1
status 0
moving 1
age_seconds 0.0
origin_marker 0
z 0.0
wind_drift_factor 0.02
current_drift_factor 1.0
terminal_velocity 0.0
neutral_buoyancy_salinity 31.25
diameter nan

But right after that, the loop over kwargs overrides them:

##### AFTER Line 141 #####
kwargs['ID'] ==[], of type <class 'list'>
ID []
kwargs['status'] ==[], of type <class 'list'>
status []
kwargs['moving'] ==[], of type <class 'list'>
moving []
kwargs['age_seconds'] ==[], of type <class 'list'>
age_seconds []
kwargs['origin_marker'] ==[], of type <class 'list'>
origin_marker []
kwargs['lon'] ==[], of type <class 'list'>
lon []
kwargs['lat'] ==[], of type <class 'list'>
lat []
kwargs['z'] ==[], of type <class 'list'>
z []
kwargs['wind_drift_factor'] ==[], of type <class 'list'>
wind_drift_factor []
kwargs['current_drift_factor'] ==[], of type <class 'list'>
current_drift_factor []
kwargs['terminal_velocity'] ==[], of type <class 'list'>
terminal_velocity []
kwargs['neutral_buoyancy_salinity'] ==[], of type <class 'list'>
neutral_buoyancy_salinity []
kwargs['diameter'] ==[], of type <class 'list'>
diameter []

This causes the elements to lose their default values after seeding.
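The override can be reproduced outside OpenDrift with a stripped-down class that mirrors the two loops shown in the diff further down (set defaults, then override from kwargs). `MiniArray` and its variable table are hypothetical; only the loop structure is taken from elements.py. Passing every variable as an empty list, as in the kwargs dump, replaces each scalar default with an empty array.

```python
import numpy as np


class MiniArray:
    """Hypothetical, stripped-down stand-in for the two loops in LagrangianArray.__init__()."""

    variables = {
        "lon": {"dtype": np.float32},  # no default
        "z": {"dtype": np.float32, "default": 0.0},
        "wind_drift_factor": {"dtype": np.float32, "default": 0.02},
    }

    def __init__(self, **kwargs):
        # First loop: set the scalar defaults (the equivalent of line 139).
        for name, spec in self.variables.items():
            if "default" in spec:
                setattr(self, name, spec["default"])
        # Second loop: override with whatever was passed in (the equivalent of line 141).
        for name, value in kwargs.items():
            setattr(self, name, self.variables[name]["dtype"](value))


# Every variable is passed as an empty list, as observed in the kwargs dump.
elements = MiniArray(lon=[], z=[], wind_drift_factor=[])
print(elements.z, elements.wind_drift_factor)  # both defaults replaced by empty arrays
```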

Below you can see that all variables have lost their default values, except diameter, which in this case is set explicitly inside this particular model's __init__() method:

ID: []
status: []
moving: []
age_seconds: []
origin_marker: []
lon: []
lat: []
z: []
wind_drift_factor: []
current_drift_factor: []
terminal_velocity: []
neutral_buoyancy_salinity: []
diameter: 0.00014  #This one is set with config

However, the defaults are somehow restored when the model starts the run.

Results for 2 particles after the model has finished show default values for the properties that didn't change during the run:

In [4]: o.elements
Out[4]: 
ID: [1 2]
status: [0. 0.]
moving: [1. 1.]
age_seconds: [3600. 3600.]
origin_marker: [0. 0.]
lon: [-5.65913965 -5.65913965]
lat: [54.71570588 54.71570588]
z: [-0.0108496 -0.0108496]
wind_drift_factor: [0.02 0.02]
current_drift_factor: [1. 1.]
terminal_velocity: [0. 0.]
neutral_buoyancy_salinity: [31.25 31.25]
diameter: [0.00014044 0.00014044]

Possible remedy that didn't work

One workaround could be to skip a kwargs entry when the property already exists (i.e. it has a default value) and the supplied value is an empty list:

@@ -138,8 +138,13 @@ class LagrangianArray:
         for default_variable in default_values.keys():  # set default values
             setattr(self, default_variable, default_values[default_variable])
         for input_variable in kwargs.keys():  # override with input values
+            if hasattr(self, input_variable) and kwargs[input_variable] == []:
+                continue
             setattr(self, input_variable, self.variables[input_variable]
                     ['dtype'](kwargs[input_variable]))

That works for model creation and seeding, preserving the defaults. However, it breaks the environment lookup when the run starts, as shown below.
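One possible reading of why the patch breaks the run (an assumption, not verified against the OpenDrift code): lon and lat have no defaults in the dump above, so they still become empty arrays, while z keeps its scalar default. An "empty" element container then holds arrays of shape (0,) next to a scalar of shape (), which would match the z: () printed in the failing run. `PatchedMiniArray` is a hypothetical stand-in that applies the same skip condition as the patch.

```python
import numpy as np


class PatchedMiniArray:
    """Hypothetical stand-in for LagrangianArray.__init__() with the patch applied."""

    variables = {
        "lon": {"dtype": np.float32},  # no default, like lon/lat in the defaults dump
        "lat": {"dtype": np.float32},
        "z": {"dtype": np.float32, "default": 0.0},
    }

    def __init__(self, **kwargs):
        for name, spec in self.variables.items():
            if "default" in spec:
                setattr(self, name, spec["default"])
        for name, value in kwargs.items():
            # The patch: keep an existing default when the supplied value is an empty list.
            if hasattr(self, name) and isinstance(value, list) and not value:
                continue
            setattr(self, name, self.variables[name]["dtype"](value))


empty = PatchedMiniArray(lon=[], lat=[], z=[])
print(np.shape(empty.lon), np.shape(empty.lat), np.shape(empty.z))  # (0,) (0,) ()
```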

Base conditions

For a run without the change, these are, at preparation, the shapes of the element coordinates and the contents of the missing_indices array used to request the environment variables:

14:24:50 DEBUG   opendrift.models.basemodel.environment:607: ----------------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:608: Variable group ['land_binary_mask']
14:24:50 DEBUG   opendrift.models.basemodel.environment:609: ----------------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:613: Calling reader global_landmask
14:24:50 DEBUG   opendrift.models.basemodel.environment:614: ----------------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:630: Data needed for 2 elements
################
missing_indices [0 1]
#### SHAPES ####
lon:  (2,)
lat  (2,)
z:  (2,)

And at the first (and subsequent) time steps:

14:24:50 DEBUG   opendrift.models.basemodel.environment:784: Finished processing all variable groups
14:24:50 DEBUG   opendrift.models.basemodel.environment:909: ------------ SUMMARY -------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:911:     land_binary_mask: 0 (min) 0 (max)
14:24:50 DEBUG   opendrift.models.basemodel.environment:913: ---------------------------------
14:24:50 INFO    opendrift.models.basemodel:947: All points are in ocean
14:24:50 DEBUG   opendrift.models.basemodel:891: to be seeded: 2, already seeded 0
14:24:50 DEBUG   opendrift.models.basemodel:909: Released 2 new elements.
14:24:50 WARNING opendrift.models.basemodel:730: Seafloor check not being run because environment is missing. This will happen the first time the function is run but if it happens subsequently there is probably a problem.
14:24:50 DEBUG   opendrift.models.basemodel:2037: ======================================================================
14:24:50 INFO    opendrift.models.basemodel:2038: 2024-10-23 01:00:00 - step 1 of 2 - 2 active elements (0 deactivated)
14:24:50 DEBUG   opendrift.models.basemodel:2044: 0 elements scheduled.
14:24:50 DEBUG   opendrift.models.basemodel:2046: ======================================================================
14:24:50 DEBUG   opendrift.models.basemodel:2055:       latitude =  54.7181
14:24:50 DEBUG   opendrift.models.basemodel:2060:       longitude = -5.6536
14:24:50 DEBUG   opendrift.models.basemodel:2065:       z = 0.0
14:24:50 DEBUG   opendrift.models.basemodel:2068: ---------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:607: ----------------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:608: Variable group ['sea_floor_depth_below_sea_level', 'sea_surface_height', 'x_sea_water_velocity', 'y_sea_water_velocity', 'sea_water_temperature', 'sea_water_salinity', 'surface_downward_x_stress', 'surface_downward_y_stress']
14:24:50 DEBUG   opendrift.models.basemodel.environment:609: ----------------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:613: Calling reader roms native
14:24:50 DEBUG   opendrift.models.basemodel.environment:614: ----------------------------------------
14:24:50 DEBUG   opendrift.models.basemodel.environment:630: Data needed for 2 elements
################
missing_indices [0 1]
#### SHAPES ####
lon:  (2,)
lat  (2,)
z:  (2,)

With the kwargs override bypassed for existing properties

At preparation, the shapes and debug log are exactly as in the base case, but during the first time step the shapes go wrong:

4:22:00 DEBUG   opendrift.models.basemodel.environment:784: Finished processing all variable groups
14:22:00 DEBUG   opendrift.models.basemodel.environment:909: ------------ SUMMARY -------------
14:22:00 DEBUG   opendrift.models.basemodel.environment:911:     land_binary_mask: 0 (min) 0 (max)
14:22:00 DEBUG   opendrift.models.basemodel.environment:913: ---------------------------------
14:22:00 INFO    opendrift.models.basemodel:947: All points are in ocean
14:22:00 DEBUG   opendrift.models.basemodel:891: to be seeded: 2, already seeded 2
14:22:00 DEBUG   opendrift.models.basemodel:909: Released 2 new elements.
14:22:00 WARNING opendrift.models.basemodel:730: Seafloor check not being run because environment is missing. This will happen the first time the function is run but if it happens subsequently there is probably a problem.
14:22:00 DEBUG   opendrift.models.basemodel:2037: ======================================================================
14:22:00 INFO    opendrift.models.basemodel:2038: 2024-10-23 01:00:00 - step 1 of 2 - 3 active elements (1 deactivated)
14:22:00 DEBUG   opendrift.models.basemodel:2044: 1 elements scheduled.
14:22:00 DEBUG   opendrift.models.basemodel:2046: ======================================================================
14:22:00 DEBUG   opendrift.models.basemodel:2055:       latitude =  54.718101501464844
14:22:00 DEBUG   opendrift.models.basemodel:2060:       longitude = -5.653600215911865
14:22:00 DEBUG   opendrift.models.basemodel:2065:       z = 0.0
14:22:00 DEBUG   opendrift.models.basemodel:2068: ---------------------------------
14:22:00 DEBUG   opendrift.models.basemodel.environment:607: ----------------------------------------
14:22:00 DEBUG   opendrift.models.basemodel.environment:608: Variable group ['sea_floor_depth_below_sea_level', 'sea_surface_height', 'x_sea_water_velocity', 'y_sea_water_velocity', 'sea_water_temperature', 'sea_water_salinity', 'surface_downward_x_stress', 'surface_downward_y_stress']
14:22:00 DEBUG   opendrift.models.basemodel.environment:609: ----------------------------------------
14:22:00 DEBUG   opendrift.models.basemodel.environment:613: Calling reader roms native
14:22:00 DEBUG   opendrift.models.basemodel.environment:614: ----------------------------------------
14:22:00 DEBUG   opendrift.models.basemodel.environment:630: Data needed for 2 elements
################
missing_indices [0 1]
#### SHAPES ####
lon:  (2,)
lat  (2,)
z:  ()
14:22:00 INFO    opendrift.models.basemodel.environment:671: ========================
14:22:00 ERROR   opendrift.models.basemodel.environment:672: invalid index to scalar variable.
Traceback (most recent call last):
  File "/proj/projects/opendrift-dev/code/opendrift/models/basemodel/environment.py", line 651, in get_environment
    z=z[missing_indices], rotate_to_proj=self.proj_latlon)
IndexError: invalid index to scalar variable.

In addition to the 0-dimensional z variable, there are also 3 active elements (1 deactivated) instead of 2, even though I seeded only 2 (as the base case correctly reports).
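The IndexError in the traceback is what numpy raises when a scalar is indexed with an index array. A minimal reproduction, assuming z ends up as a numpy scalar (as the "invalid index to scalar variable" message suggests) while lon is a proper 1-D array; the coordinate values are only illustrative.

```python
import numpy as np

missing_indices = np.array([0, 1])
lon = np.array([-5.6536, -5.6536], dtype=np.float32)
z = np.float32(0.0)  # a scalar default that was never turned into an array

print(lon[missing_indices])  # indexing a 1-D array with missing_indices works

try:
    z[missing_indices]
    z_failed = False
except IndexError as err:
    z_failed = True
    print("IndexError:", err)
```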

Bottom line

In the current state, LagrangianArray variables can only have numpy-native types. In my case, an array of Enum members (which results in an object-dtype array) is the best solution, since I need the particle property to take one of a fixed, immutable set of semantically correct values.

I can use a workaround, but I think the unexpected dependency between preloading defaults and environment initialization would benefit from being clarified and separated.
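A minimal sketch of a dtype-agnostic conversion that would let an Enum-typed variable pass through the kwargs loop. `to_variable_array` and `Habitat` are hypothetical names, not part of OpenDrift; the sketch assumes numpy-native variables are declared with numpy scalar types such as numpy.float32 and keeps the existing dtype(value) behaviour for those, only changing what happens for other types.

```python
import enum

import numpy as np


class Habitat(enum.Enum):
    """Hypothetical Enum-typed particle property."""
    PELAGIC = 1
    BENTHIC = 2


def to_variable_array(dtype, value):
    """Hypothetical replacement for dtype(value) in the kwargs loop."""
    if isinstance(dtype, type) and issubclass(dtype, np.generic):
        # numpy scalar types already turn lists into arrays: keep the current behaviour.
        return dtype(value)
    if isinstance(value, (list, tuple, np.ndarray)):
        # Other types (e.g. Enum classes): convert each item, store as an object array.
        return np.array([dtype(item) for item in value], dtype=object)
    # A single value stays a scalar, like the numpy-native defaults.
    return dtype(value)


print(to_variable_array(np.float32, []).shape)  # (0,)
print(to_variable_array(Habitat, []).dtype)     # object
print(to_variable_array(Habitat, [1, Habitat.BENTHIC]))
```

Calling an Enum class on one of its own members returns that member, so already-converted values pass through unchanged. This only addresses the empty-list conversion; it does not decide whether an empty object array should later be filled from the default at seeding time.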