NOAA-OWP / lstm


Set_value functionality is mostly commented out. #43

Closed jmframe closed 3 weeks ago

jmframe commented 6 months ago


Current behavior

The set_value function, starting on line 792, copies the input to an internal array (down to line 805). So far, so good. But the rest of the code that would actually set the value is commented out, so the current code cannot actually apply input values to the model.

Expected behavior

There should be some mechanism that sets the input value in the model, which is what the commented-out code on lines 807-830 appears to do. It looks like there has been some discussion of the correct mechanism for setting input values, but any consensus may be obscured by the commented-out code.

Steps to replicate behavior (include URLs)

  1. Uncomment lines 807-830.
  2. Replace the item "value" (which doesn't exist) on lines 812-830 with the "internal_array" variable.
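The steps above can be sketched as a minimal, self-contained example (the class and variable names here are illustrative, not the actual lstm_bmi API): after copying into the internal array, the value also has to be stored somewhere the model can read it.

```python
import numpy as np

# Minimal sketch of the proposed fix (names are illustrative, not the real
# lstm_bmi API): after the in-place copy, persist the value on the object.
class SketchBmi:
    def __init__(self):
        self._values = {"land_surface_air__temperature": np.zeros(1)}

    def get_value_ptr(self, var_name):
        # Return a reference to the backing array, not a copy.
        return self._values[var_name]

    def set_value(self, var_name, values):
        internal_array = self.get_value_ptr(var_name)
        internal_array[:] = values  # in-place copy into the backing array
        # The step the commented-out code was meant to perform: store the
        # value as an attribute so the rest of the model can read it.
        setattr(self, var_name, internal_array)

bmi = SketchBmi()
bmi.set_value("land_surface_air__temperature", np.array([273.15]))
```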

Screenshots

*(screenshot: LSTM_Set_value_issue_20240413)*

SnowHydrology commented 6 months ago

@jmframe, I am no Python expert, but the current implementation of set_value is close to what's prescribed by CSDMS, which is more parsimonious than the commented lines. See my functioning snowBMI code here (based on the CSDMS heat example).

In other words, it's worth a closer look.

jmframe commented 6 months ago

Thanks @SnowHydrology. The current code, as it runs without the commented-out section, does not work: "internal_array" is a dead end. At some point the incoming values need to be stored somewhere within the model class. I'm not picky about how that happens, but it HAS to happen somehow. Below is a screenshot comparing the two implementations. On the left is the current code; as you can see, there is no response from the LSTM, because no input values are being set on the model. On the right, the input values from "internal_array" are saved in the model class. Although maybe I am missing something?

*(screenshot)*

```python
def set_value(self, var_name: str, values: np.ndarray):

    internal_array = self.get_value_ptr(var_name)
    #self.get_value_ptr(var_name)[:] = values
    internal_array[:] = values

    short_name = self._var_name_map_long_first[ var_name ]

    # Better approach, assuming type is "ndarray" (SDP)
    # setattr( self, short_name, values )

    if (internal_array.ndim > 0):
        setattr( self, short_name, internal_array[0])
    else:
        setattr( self, short_name, internal_array )

    try:
        #NJF From NGEN, `value` is a singleton array
        setattr( self, var_name, internal_array[0] )

        # jmframe: this next line is basically a duplicate.
        # I guess we should stick with the attribute names instead of a dictionary approach.
        self._values[var_name] = internal_array[0]
    # JLG 03242022: this isn't really an "error" block as standalone considers value as scalar?
    except TypeError:
        setattr( self, var_name, internal_array )

        # jmframe: this next line is basically a duplicate.
        # I guess we should stick with the attribute names instead of a dictionary approach.
        self._values[var_name] = internal_array
```

SnowHydrology commented 6 months ago

@jmframe, I should have made my answer less parsimonious. A better way to say what I was thinking is that, not being a Python expert, I don't know why the snowBMI and heat set_value implementations work and the LSTM implementation does not. If you look at the snowBMI example screenshot below, it does appear that set_value works as expected despite relying on two lines of code that are similar to the two lines in the LSTM set_value function.

*(screenshot)*

Do you know why this works:

```python
val = self.get_value_ptr(var_name)
val[:] = src.reshape(val.shape)
```

but this does not?:

```python
internal_array = self.get_value_ptr(var_name)
internal_array[:] = values
```
jmframe commented 6 months ago

Yeah, this is interesting. I am taking a look now...

jmframe commented 6 months ago

I had to ask ChatGPT, and here is the explanation:

- **Object references:** When `val = self.get_value_ptr(var_name)` is called, `val` is assigned a reference to the numpy array stored in the `_values` dictionary for the specified `var_name`. It does not create a new array or data structure but merely points to the existing array already managed by the SnowBmi class.

- **In-place modification:** The operation `val[:] = src.reshape(val.shape)` modifies the contents of the array `val` references in place. This means that the original data in the array is overwritten with new data from `src`. Since `val` is a reference to the array stored in the `_values` dictionary, this modification affects the array directly in the dictionary.

- **Persistence of changes:** Because `val` directly references the array in the `_values` dictionary and the changes are made in place to this array, the modifications persist beyond the scope of the set_value method. There is no need to return `val` or reassign it back to the dictionary, because the dictionary already holds a reference to the array that was modified.

- **Memory and reference management:** In Python, when you deal with mutable objects like lists, dictionaries, or numpy arrays, any changes made to the object through one reference are visible through all other references to the same object. Therefore, even though `val` is a local variable within the scope of the set_value method, it refers to a mutable object that exists in the broader scope of the class (the `self._values` dictionary).

- **No need to attach `val` to `self`:** There's no need to attach `val` to the class instance, because it's not a new or separate entity that needs persistence on its own; it's merely a conduit to access and modify the already persistent data managed by the class.

This is a common pattern in Python for managing and manipulating shared, mutable data structures: modifications made through any reference to a mutable object affect the underlying object, and thus, all references to this object will reflect these changes. This feature is particularly useful in data-heavy applications where passing around copies of large datasets would be inefficient both in terms of memory and computation.
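The reference semantics described above can be demonstrated in a few lines: writing through one reference with `[:]` mutates the same array the dictionary holds, while rebinding the local name does not.

```python
import numpy as np

# Writing through a reference with [:] mutates the shared array;
# rebinding the name does not affect the dictionary's copy.
values = {"snowpack__liquid-equivalent_depth": np.zeros(3)}

val = values["snowpack__liquid-equivalent_depth"]  # a reference, not a copy
val[:] = np.array([1.0, 2.0, 3.0])                 # in-place overwrite

print(values["snowpack__liquid-equivalent_depth"])  # [1. 2. 3.] -- dict sees it

val = np.array([9.0, 9.0, 9.0])                    # rebinding: dict unaffected
print(values["snowpack__liquid-equivalent_depth"])  # still [1. 2. 3.]
```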

jmframe commented 6 months ago

Okay, I think it comes down to this initialization of the _values, from Snow_bmi:

```python
self._values = {"atmosphere_water__precipitation_leq-volume_flux": self._model.ppt_mm,
                "land_surface_air__temperature": self._model.tair_c,
                "snowpack__liquid-equivalent_depth": self._model.swe_mm,
                "snowpack__melt_volume_flux": self._model.melt_mm}
```

We need to do the same thing in lstm_bmi. Currently the _values entries are set to integers; we need to change them to reference arrays held by the model object itself.
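A rough sketch of that pattern, with invented model and attribute names (not the actual lstm_bmi code): `_values` holds references to arrays owned by the model, so the heat-style two-line set_value reaches the model's memory directly.

```python
import numpy as np

# Hypothetical sketch (invented names): _values references arrays owned by
# the model, so the two-line heat-style set_value updates the model in place.
class ToyLstmStates:
    def __init__(self):
        self.precip_mm = np.zeros(1)
        self.tair_c = np.zeros(1)

class ToyBmi:
    def __init__(self):
        self._model = ToyLstmStates()
        self._values = {
            "atmosphere_water__precipitation_leq-volume_flux": self._model.precip_mm,
            "land_surface_air__temperature": self._model.tair_c,
        }

    def get_value_ptr(self, var_name):
        return self._values[var_name]

    def set_value(self, var_name, src):
        val = self.get_value_ptr(var_name)
        val[:] = src.reshape(val.shape)

bmi = ToyBmi()
bmi.set_value("land_surface_air__temperature", np.array([12.5]))
print(bmi._model.tair_c)  # [12.5] -- the model object sees the update
```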

jmframe commented 6 months ago

@SnowHydrology, I think going with the heat model example functionality is going to get pretty messy. Because for the LSTM, we need to pass a torch.tensor. In the heat, and your snow, examples, you are setting the values directly to _model, which then uses those values directly. For the LSTM, we pass in the input_tensor:

```python
self.lstm_output, self.h_t, self.c_t = self.lstm.forward(self.input_tensor, self.h_t, self.c_t)
```

So, what we would have to do to follow the heat example is change the values in a self.lstm._input_tensor, which I guess is possible. It is going to take quite a bit of work to make this happen, but certainly doable. I'll try to get that settled.

SnowHydrology commented 6 months ago

Thanks for this info @jmframe. It was helpful for me to understand.

Do you think it's worth applying a different implementation of set_value than what is in the heat example to be more easily compatible with LSTM while still being BMI compliant?

jmframe commented 6 months ago

I'm not sure what the best path forward is. I think I need to dig a bit more into what is actually required from these BMI wrappers in order to run in NextGen. I'm going to try getting this set_value function working as a starting point. My guess is that most of the BMI functions are similarly not implemented in the way that is expected.

jmframe commented 6 months ago

@SnowHydrology,

I'm starting to think that it just doesn't make sense to modify the LSTM model code for the sake of shortening set_value. All we would do is move all the code, including the "create_scaled_input_tensor" function, to the LSTM model formulation code. I don't particularly think that all the data pre-processing should be moved to the model formulation. To me, this is much better handled by the BMI.

Or, if it is necessary to clean up the BMI code, then I think what I would do is have an interface between the BMI and the model. I know that sounds ridiculous, but it might end up being quite clean. So we can have a clean BMI going into a messy data/model/run interface going into a clean model. This is kind of what might be happening when I get the whole NeuralHydrology package wrapped in BMI.

SnowHydrology commented 6 months ago

I see what you're saying. BMI2, the Basic Model Interface Interface?

*(screenshot)*

hellkite500 commented 6 months ago

I'll concur that set_value currently doesn't work to actually set any persistent value. It looks like this code has evolved through some back and forth using various mechanisms for holding these states, but 91ad64c37f2318eb6719a658e8370639d6bddc12 looks to have removed the functionality that saved the new values in the BMI object's attribute list.

If a backing model, such as the LSTM, stores data in a compatible way, I think aliasing directly to that memory is advisable as an input/output variable that can be get/set via BMI. If there is some incompatibility between the data provided to set_value or required to return by get_value, then the BMI object can hold intermediate states and proxy the transformations as required.

SnowHydrology commented 6 months ago

@hellkite500 and @jmframe, can it be as easy as just modifying the get_value and set_value functions to convert the passed numpy arrays to tensors and vice-versa? Or do we need to find a more generalizable solution, knowing that folks will likely want to implement other ML models (e.g. differentiable models)?
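One possible mechanism for that conversion, sketched below under the assumption that PyTorch is available and both sides stay on CPU with matching dtypes: `torch.from_numpy` aliases the numpy buffer rather than copying it, so a BMI-side in-place write is immediately visible to the tensor the model consumes, and vice versa.

```python
import numpy as np
import torch

# Sketch of the numpy<->tensor aliasing idea: torch.from_numpy shares memory
# with the numpy array, so no explicit copy is needed in set_value/get_value.
backing = np.zeros(3, dtype=np.float32)   # array the BMI writes into
tensor = torch.from_numpy(backing)        # tensor the model would consume

backing[:] = [0.5, 1.5, 2.5]              # BMI-style in-place set_value
print(tensor)                             # tensor reflects the update

tensor[0] = 9.0                           # model-side write ...
print(backing[0])                         # ... visible through numpy: 9.0
```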

jmframe commented 6 months ago

@hellkite500 and @SnowHydrology the first problem I see with the LSTM is that the function is "set_value" (singular), not "set_values" (plural). We then have to take the incoming values and process them for the model. In this case, it is a simple normalization. So we could normalize each value during the set_value function, instead of what we do here, which is normalize them in the update function. Either way, this code has to happen somewhere. I'm not convinced one order of operations is better than the other.

@SnowHydrology, for differentiable models, for instance, the first step is to run an LSTM to make the predictions of the differentiable conceptual model's parameter values, so yes, finding a general solution is probably a really good idea. If we "solve" the LSTM issue, then that will likely transfer directly to any other differentiable, tensor-using, model.

hellkite500 commented 6 months ago

> So we can normalize each value during the set_value function, instead of what we do here, which is normalize them in the update function

If the model requires several input values to be changed before it can properly update, then doing those steps as part of the update function makes sense. The expectation from BMI is that all input values are set prior to a call to update.

If the input has more than one "related" value, then it should probably be described as a non scalar type, or a 1D scalar where multiple values are a single conceptual input.

I really don't think the solution here is one of complexity, but rather of simplifying and reducing complexity with the help of perhaps a little bit of abstraction. Happy to put together some code to test out and iterate on, but I would want to make sure I'm understanding the LSTM input/update steps and requirements appropriately.
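The calling order described above can be illustrated with a toy BMI (names invented, not the LSTM code): the driver sets every input for the timestep first, then calls update(), which is where cross-variable preprocessing such as normalization belongs.

```python
import numpy as np

# Toy illustration (invented names) of the BMI calling order: all inputs are
# set first, then update() performs the per-step processing.
class ToyBmi:
    def __init__(self):
        self._inputs = {"precip": np.zeros(1), "temperature": np.zeros(1)}
        self.output = None

    def set_value(self, name, src):
        self._inputs[name][:] = src       # just store; no processing yet

    def update(self):
        # Cross-variable preprocessing (e.g. normalization) happens here,
        # once every input for the timestep has been set.
        self.output = float(self._inputs["precip"][0] + self._inputs["temperature"][0])

bmi = ToyBmi()
bmi.set_value("precip", np.array([2.0]))
bmi.set_value("temperature", np.array([3.0]))
bmi.update()
print(bmi.output)  # 5.0
```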

jmframe commented 6 months ago

@hellkite500 sounds good! Let me know if you want to chat about it. Would be a good idea to think of this in the context of your bmi_pytorch code. I'm down to try to get this right, I just want to know what IS right. For now, I think I'll make that intermediate step between BMI and the LSTM, the BMI interface, if you will, then we can go from there.

SnowHydrology commented 6 months ago

@jmframe, you want to take a crack at this and then @madMatchstick, @hellkite500, and I can review the PR? That can be a nice test of informal governance of community contributions.

hellkite500 commented 6 months ago

@SnowHydrology The reason this works

```python
val = self.get_value_ptr(var_name)
val[:] = src.reshape(val.shape)
```

and this doesn't

```python
internal_array = self.get_value_ptr(var_name)
internal_array[:] = values
```

is most likely the shape of the array. In the first block, the incoming data (`src`) is a flattened array; it is reshaped to the destination array's dimensions (`val.shape`), and then each element of the reshaped array is copied into the memory that `val` references (see @jmframe's GPT explanation of what `[:]` is doing in this case: an element-wise copy). In the second block, if `internal_array` isn't a 1-D array of the same size as `values`, you are going to get an error trying to do that element-wise copy.
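The shape point above is easy to demonstrate: an element-wise copy through `[:]` only succeeds when the source can broadcast to the destination's shape.

```python
import numpy as np

# An element-wise copy through [:] requires the source to broadcast to the
# destination's shape; reshaping a flattened array makes the shapes match.
dest = np.zeros((2, 2))
flat = np.array([1.0, 2.0, 3.0, 4.0])   # flattened input, as a driver might pass

dest[:] = flat.reshape(dest.shape)      # works: shapes match after reshape
print(dest.tolist())                    # [[1.0, 2.0], [3.0, 4.0]]

try:
    dest[:] = np.array([1.0, 2.0, 3.0])  # size mismatch: broadcast fails
except ValueError as err:
    print("ValueError:", err)
```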

jmframe commented 3 weeks ago

Thanks for the fixes @madMatchstick and @aaraney. I'm closing this issue.