Open jsadler2 opened 3 years ago
Hmm.. this is during prediction or training? would this help:
model.rnn_layer.build(input_shape=x_data.shape)
I think right after this line.
This was during training.
I'll give that a try
So I added
self.rnn_layer.build((42, 365, 2))
right above this line https://github.com/USGS-R/river-dl/blob/ec2d9b97f0e333cb81cb579a8318fc2d69aaad92/river_dl/rnns.py#L30
And I got a new error:
RuleException:
TypeError in line 79 of /mnt/d/onedrive/OneDrive - DOI/research/drb/river-dl/Snakefile:
in user code:
/home/jsadler/miniconda3/envs/rgcn1/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/mnt/d/onedrive/OneDrive - DOI/research/drb/river-dl/river_dl/rnns.py:41 call *
self.rnn_layer.reset_states(states=[h_init, c_init])
/home/jsadler/miniconda3/envs/rgcn1/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py:961 reset_states **
K.batch_set_value(set_value_tuples)
/home/jsadler/miniconda3/envs/rgcn1/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/home/jsadler/miniconda3/envs/rgcn1/lib/python3.6/site-packages/tensorflow/python/keras/backend.py:3706 batch_set_value
x.assign(np.asarray(value, dtype=dtype(x)))
/home/jsadler/miniconda3/envs/rgcn1/lib/python3.6/site-packages/numpy/core/_asarray.py:83 asarray
return array(a, dtype, copy=False, order=order)
TypeError: __array__() takes 1 positional argument but 2 were given
Hey all, I'm working on integrating river-dl more closely with the reservoir forecasting repos, and I was curious whether any more progress was made on this. I'm experiencing the initial error myself.
Lots of references that I find regarding this solve the problem by simply passing a batch_input_shape argument to your first layer as suggested by the comment, but I haven't found a way to implement that within this approach/syntax. I'm fairly new to tensorflow (but very familiar with ML/DL and implementation via PyTorch), but it appears that there are 3 general ways to write this model code and our way (model subclassing) isn't as well documented (e.g., the error code suggestion deals with the other 2).
When I find a place to specify batch_input_shape or batch_shape (e.g., at layers.LSTM(), model.build() or model.fit()), the arguments seem to go unrecognized or explicitly raise an unknown-keyword error 😧
Thanks for looking into this, @jdiaz4302. I don't think we ever did find a solution to this. I've found a similar difficulty in finding solutions for other issues because we have used the "model subclassing" approach. From my understanding, I don't think we can use the other two approaches (sequential api and functional api) because they allow for much less customization.
For this issue in particular, I think it has something to do with specifying the batch size in a build statement, like you mentioned. I think I got a little farther at doing this here. Are you able to get this error?
That's awesome that you have a lot of experience with pytorch. We have casually wondered if using pytorch instead of TF would be better given that it seems to be more popular in the research community. Because we have invested in this TF code pretty heavily and since we didn't have any pytorch experience amongst us, we haven't thought very seriously about it. But now that you've joined, I wonder if it'd be worth it to think about it again.
Yeah, that's one of the (temporary?) dead-ends that I've found so far - I don't think I've gotten to the bottom of what it means though
Have you tried adding model(x_trn_pre) right above this line and model(x_trn_obs) right above this line? You will also need to move these lines above the model.compile()
To clarify, I've been working at the function-level in an ipynb that simplifies the usage (based off the one you stored in run-pgdl-da but reduced and modified with the latest river-dl code) - that is, I'm just defining the model and trying to use it on some randomly generated data in a bare-bones setting.
But, when I add a model(inputs) prior to the model.compile(), it raises the original error message at the model(inputs) line
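One way to picture why the order of operations matters here: a stateful layer has to hold state buffers of shape (batch_size, hidden_size), so it cannot reset them before it has seen a concrete batch size. The sketch below is a pure-Python mimic of that ordering constraint, not Keras code. `ToyStatefulLayer` and its methods are hypothetical names, and the assumption that the real layer only learns its batch size once it is built is not confirmed in this thread.

```python
class ToyStatefulLayer:
    """Toy stand-in for a stateful recurrent layer (hypothetical, not Keras)."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.batch_size = None  # unknown until build() is called
        self.state = None

    def build(self, input_shape):
        # A stateful layer needs a concrete batch size to size its state buffer.
        self.batch_size = input_shape[0]
        self.reset_states()

    def reset_states(self, states=None):
        if self.batch_size is None:
            raise ValueError("stateful layer needs to know its batch size")
        if states is None:
            states = [[0.0] * self.hidden_size for _ in range(self.batch_size)]
        if len(states) != self.batch_size:
            raise ValueError("state batch dimension does not match")
        self.state = [list(row) for row in states]


layer = ToyStatefulLayer(hidden_size=5)
try:
    layer.reset_states()  # reset before build: batch size is still unknown
    reset_before_build_failed = False
except ValueError:
    reset_before_build_failed = True

layer.build((2, 10, 4))  # (batch, time steps, features)
layer.reset_states()     # succeeds now that the batch size is known
```

In this toy version, calling the reset inside the forward pass before the layer has ever been built hits the same kind of failure as resetting before `build`.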
can you share the stripped down code that you are working with?
This is my modification of Jacob's experimenting notebook that worked for some earlier version or modification of the code (linked earlier via "run-pgdl-da")
Block 1:
from __future__ import print_function, division
import tensorflow as tf
from tensorflow.keras import layers


class LSTMModel(tf.keras.Model):
    def __init__(
        self, hidden_size, num_tasks=1, recurrent_dropout=0, dropout=0,
    ):
        """
        :param hidden_size: [int] the number of hidden units
        :param num_tasks: [int] number of tasks (variables_to_log to be predicted)
        :param recurrent_dropout: [float] value between 0 and 1 for the
        probability of a recurrent element to be zero
        :param dropout: [float] value between 0 and 1 for the probability of an
        input element to be zero
        """
        super().__init__()
        self.hidden_size = hidden_size
        self.num_tasks = num_tasks
        self.rnn_layer = layers.LSTM(
            hidden_size,
            return_sequences=True,
            stateful=True,
            return_state=True,
            recurrent_dropout=recurrent_dropout,
            dropout=dropout,
        )
        self.dense_main = layers.Dense(1, name="dense_main")
        if self.num_tasks == 2:
            self.dense_aux = layers.Dense(1, name="dense_aux")
        self.states = None

    @tf.function
    def call(self, inputs, **kwargs):
        batch_size = tf.shape(inputs)[0]
        h_init = kwargs.get("h_init", tf.zeros([batch_size, self.hidden_size]))
        c_init = kwargs.get("c_init", tf.zeros([batch_size, self.hidden_size]))
        self.rnn_layer.reset_states(states=[h_init, c_init])
        x, h, c = self.rnn_layer(inputs)
        self.states = h, c
        if self.num_tasks == 1:
            main_prediction = self.dense_main(x)
            return main_prediction
        elif self.num_tasks == 2:
            main_prediction = self.dense_main(x)
            aux_prediction = self.dense_aux(x)
            return tf.concat([main_prediction, aux_prediction], axis=2)
        else:
            raise ValueError(
                f"This model only supports 1 or 2 tasks (not {self.num_tasks})"
            )
Block 2
import numpy as np
Block 3
tasks = 1
epochs = 20
batch_size = 2 # is equivalent to number of segments
time_steps = 10
n_features = 4
hidden_size = 5
return_state = True
lamb = .5
# create some fake data based on dimensions specified above
inputs = np.random.randn(batch_size, time_steps, n_features)
y_obs = np.random.randn(batch_size, time_steps, tasks)
weights = np.random.randn(batch_size, time_steps, tasks)
adj_matrix = np.random.randn(batch_size, batch_size)
# commented out from previous run-pgdl-da ex
model_lstm = LSTMModel(hidden_size=hidden_size,
                       #gradient_correction=False,
                       #tasks=tasks,
                       #lamb=1,
                       dropout=0
                       #grad_log_file=None,
                       #return_state=return_state
                       )
Block 4 (raises the If a RNN is stateful... error, but you can continue to Block 5 afterwards)
model_lstm(inputs)
Block 5
model_lstm.compile(optimizer=tf.optimizers.Adam(learning_rate=0.3))
Block 6
model_lstm.rnn_layer.build(inputs.shape)
Block 7 (raises the __array__() takes 1 positional argument but 2 were given error)
model_lstm.fit(x = inputs,
               y = np.concatenate([y_obs, weights], axis=2),
               epochs = epochs,
               batch_size = batch_size)
This might help? https://github.com/tensorflow/tensorflow/issues/46840#issuecomment-872777398 Our PIL version for the container is 8.4 and looks like downgrading to 8.2 might help
import PIL
print(PIL.__version__)
8.4.0
Their error actually doesn't replicate in my notebook (I'm using the singularity container 2.0 for run-pgdl-da that has PIL version 8.4.0) - the code runs.
BUT, I do think looking at more generic issues in other libraries (e.g., numpy) may be promising for this __array__() takes 1 positional argument but 2 were given error.
Hmm. Yeah I can't reproduce that error either, but I can reproduce the error at the top of the issue thread
I think I've made some progress/findings that could lead to a fix:
reset_states within def call
If I simply comment out the reset_states line from the model code, no errors are raised. That does, however, mean that the model is using previously calculated states (i.e., i-1) rather than starting with zeros (which seems to be your default preference).
Noticing this made me want to compare how Jake's working stateful lstm is using that method. Rather than calling reset_states in the def call part of the model code, the reservoir project calls reset_states repeatedly in-workflow as needed (for data assimilation).
If I move reset_states into the def call part of the model code for Jake's stateful lstm, I now get the same If a RNN is stateful, it needs to know its batch size... error. Using model.rnn_layer.build(input_shape = x.shape) resolves the issue for Jake's stateful lstm, but (as we know) river-dl's stateful lstm after using model.rnn_layer.build(input_shape = x.shape) fails with the __array__() takes 1 positional argument but 2 were given error. I haven't fixed or fully understood this yet after tinkering around with the code and various forums/Google searches, but what does clearly work is removing the following code...
batch_size = tf.shape(inputs)[0]
h_init = kwargs.get("h_init", tf.zeros([batch_size, self.hidden_size]))
c_init = kwargs.get("c_init", tf.zeros([batch_size, self.hidden_size]))
self.rnn_layer.reset_states(states=[h_init, c_init])
...from the model class and instead performing state resets before using the model to generate predictions. Example:
from __future__ import print_function, division
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


class LSTMModel(tf.keras.Model):
    def __init__(
        self, hidden_size, num_tasks=1, recurrent_dropout=0, dropout=0,
    ):
        """
        :param hidden_size: [int] the number of hidden units
        :param num_tasks: [int] number of tasks (variables_to_log to be predicted)
        :param recurrent_dropout: [float] value between 0 and 1 for the
        probability of a recurrent element to be zero
        :param dropout: [float] value between 0 and 1 for the probability of an
        input element to be zero
        """
        super().__init__()
        self.hidden_size = hidden_size
        self.num_tasks = num_tasks
        self.rnn_layer = layers.LSTM(
            hidden_size,
            return_sequences=True,
            stateful=True,
            return_state=True,
            recurrent_dropout=recurrent_dropout,
            dropout=dropout,
        )
        self.dense_main = layers.Dense(1, name="dense_main")
        if self.num_tasks == 2:
            self.dense_aux = layers.Dense(1, name="dense_aux")
        self.states = None

    @tf.function
    def call(self, inputs, **kwargs):
        # State resets are no longer done here; they happen in the workflow.
        #batch_size = tf.shape(inputs)[0]
        #h_init = kwargs.get("h_init", tf.zeros([batch_size, self.hidden_size]))
        #c_init = kwargs.get("c_init", tf.zeros([batch_size, self.hidden_size]))
        #self.rnn_layer.reset_states(states=[h_init, c_init])
        x, h, c = self.rnn_layer(inputs)
        self.states = h, c
        if self.num_tasks == 1:
            main_prediction = self.dense_main(x)
            return main_prediction
        elif self.num_tasks == 2:
            main_prediction = self.dense_main(x)
            aux_prediction = self.dense_aux(x)
            return tf.concat([main_prediction, aux_prediction], axis=2)
        else:
            raise ValueError(
                f"This model only supports 1 or 2 tasks (not {self.num_tasks})"
            )


x = np.random.normal(size = (20, 10, 5))
LSTM = LSTMModel(5)
# Build the layer first so it knows its batch size, then reset the states.
LSTM.rnn_layer.build(input_shape = x.shape)
LSTM.rnn_layer.reset_states(states = [tf.zeros([20, 5]), tf.zeros([20, 5])])
LSTM(x)
To approximate some DA situation, the above reset_states also works with tf.random.normal in place of tf.zeros.
Making this kind of change would make the codebase slightly more verbose, requiring reset_states() of varying conditions (use zeros, use i-1, or use DA), but would allow this project (and other projects using it as a library) to use stateful LSTMs.
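The three reset conditions mentioned above could be gathered behind one workflow-level helper. This is a minimal sketch, assuming plain nested lists in place of tensors; `choose_initial_states`, `zeros`, and the `mode` values are hypothetical names, not part of river-dl or Keras.

```python
def zeros(batch_size, hidden_size):
    """Return a batch_size x hidden_size nested list of zeros."""
    return [[0.0] * hidden_size for _ in range(batch_size)]


def choose_initial_states(mode, batch_size, hidden_size,
                          previous=None, assimilated=None):
    """Pick the [h, c] pair to hand to a state reset (hypothetical helper).

    mode: "zeros"    -> start from zeros
          "previous" -> reuse the states left by the last call (i-1)
          "da"       -> use externally adjusted (data assimilation) states
    """
    if mode == "zeros":
        return [zeros(batch_size, hidden_size), zeros(batch_size, hidden_size)]
    if mode == "previous":
        if previous is None:
            raise ValueError("mode 'previous' requires the previous [h, c]")
        return previous
    if mode == "da":
        if assimilated is None:
            raise ValueError("mode 'da' requires assimilated [h, c]")
        return assimilated
    raise ValueError(f"unknown mode: {mode}")


h0, c0 = choose_initial_states("zeros", batch_size=20, hidden_size=5)
```

In the workflow, the returned pair would then be converted to tensors of the right shape and passed to the reset call before each prediction.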
Cool! Confirmed that it works for me for resetting the states using tf.random.normal or tf.zeros but not when resetting using previous states, which I can't figure out.
x = np.random.normal(size = (20, 10, 5))
LSTM = LSTMModel(5)
LSTM.rnn_layer.build(input_shape = x.shape)
LSTM.rnn_layer.reset_states(states = [tf.zeros([20, 5]), tf.zeros([20, 5])])
LSTM(x)
h, c = LSTM.rnn_layer.states
h.shape
TensorShape([20, 5])
c.shape
TensorShape([20, 5])
So far so good with the h and c states as Tensors with [20,5] shape. But I get an error when resetting states to these h and c states
LSTM.rnn_layer.reset_states(states = [h, c])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_14/3444381920.py in <module>
----> 1 LSTM.rnn_layer.reset_states(states = [h, c])
/opt/venv/reticulate/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py in reset_states(self, states)
969 (batch_size, state)) + ', found shape=' + str(value.shape))
970 set_value_tuples.append((state, value))
--> 971 backend.batch_set_value(set_value_tuples)
972
973 def get_config(self):
/opt/venv/reticulate/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
204 """Call target, and fall back on dispatchers if there is a TypeError."""
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
208 # Note: convert_to_eager_tensor currently raises a ValueError, not a
/opt/venv/reticulate/lib/python3.8/site-packages/tensorflow/python/keras/backend.py in batch_set_value(tuples)
3802 if ops.executing_eagerly_outside_functions():
3803 for x, value in tuples:
-> 3804 x.assign(np.asarray(value, dtype=dtype_numpy(x)))
3805 else:
3806 with get_graph().as_default():
/opt/venv/reticulate/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
TypeError: __array__() takes 1 positional argument but 2 were given
@jzwart Good idea to test that. If I print all of those that worked and didn't work (i.e., tf.zeros([20, 5]), tf.random.normal([20, 5]), h, and c), I notice that the ones that didn't work (h, c) are tf.Variable rather than tf.Tensor. Using .value() on the tf.Variables seems to fix this:
x = np.random.normal(size = (20, 10, 5))
LSTM = LSTMModel(5)
LSTM.rnn_layer.build(input_shape = x.shape)
LSTM.rnn_layer.reset_states(states = [tf.zeros([20, 5]), tf.zeros([20, 5])])
LSTM(x)
h, c = LSTM.rnn_layer.states
LSTM.rnn_layer.reset_states(states = [h.value(), c.value()])
Unfortunately this fix doesn't apply to our original issue with .reset_states() in the def call part of the model code because the h_init and c_init are already being made via tf.zeros and already tf.Tensor (confirmed)
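One possible explanation for the shared error message, offered as an assumption rather than a confirmed diagnosis: the tracebacks end in np.asarray(value, dtype=...), and that conversion may forward the dtype as a positional argument to the object's `__array__` method. If the object defines `__array__(self)` with no dtype parameter, the call fails with this kind of arity message. The pure-Python mimic below uses hypothetical classes and a hypothetical `to_array` function standing in for the conversion; it does not call TensorFlow or NumPy.

```python
class NoDtypeArray:
    """Hypothetical object whose __array__ accepts no dtype argument."""

    def __init__(self, data):
        self.data = data

    def __array__(self):
        return list(self.data)


class DtypeAwareArray(NoDtypeArray):
    """Hypothetical object whose __array__ accepts an optional dtype."""

    def __array__(self, dtype=None):
        return [dtype(v) for v in self.data] if dtype else list(self.data)


def to_array(value, dtype):
    """Stand-in for a converter that forwards dtype positionally."""
    return value.__array__(dtype)


# Works: the method accepts the forwarded dtype.
converted = to_array(DtypeAwareArray([1, 2]), float)

# Fails: the method only accepts self, so the extra argument is rejected.
try:
    to_array(NoDtypeArray([1, 2]), float)
    error_message = ""
except TypeError as err:
    error_message = str(err)
```

Under that assumption, h.value() would help because it returns an object of a different type than the tf.Variable. Whether the tensors created inside the @tf.function-decorated call behave like the first class is untested here.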
I'm getting this error when trying to use LSTMModel: