Closed · saddamhijazi closed this issue 1 year ago
Does it work using a standard Keras optimiser (e.g. Adam)? Can you provide the math equations of the loss that you have to implement? I find the code confusing: the loss that you implemented does not depend on y_true or y_pred.
Yes, the code works with the Adam optimizer or any other Keras optimizer. The math equations I am trying to solve are:
$$ u_t - 4\pi \cos(2\pi t) = 0, \quad u(0) = 1 $$
The loss term does not depend on y_true or y_pred since we are solving the equation in a data-free regime. I generate random points for the input variable $t$, construct the operator of the math equation, and finally define the loss as a combination of two terms: one coming from the ODE and the other from the initial condition. I hope this makes it clearer. Thank you very much.
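For reference, one way to write the combined objective described above for $N$ sampled points (the $\lambda$ weights are generic placeholders, not values from the code):

$$ \mathcal{L}(\theta) = \lambda_{\mathrm{IC}}\,\big(u_\theta(0) - 1\big)^2 + \frac{\lambda_{\mathrm{ODE}}}{N}\sum_{i=1}^{N}\big(\partial_t u_\theta(t_i) - 4\pi\cos(2\pi t_i)\big)^2 $$

The exact solution, used later in the thread as ground truth, is $u(t) = 2\sin(2\pi t) + 1$.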
Hi @saddamhijazi, I found a way to make it work. It is a bit tricky: in order to reduce memory usage, my LM optimiser can call the model with a different batch size than the one set in the fit method, so I had to fix the batch size to a proper value by setting jacobian_max_num_rows. I could not figure out a way to compute the gradient with respect to t for varying sizes of t.
I organised the code so that you do not need to define a custom loss; you can use the MeanSquaredError that is already implemented.
Usually LM works better in the overdetermined case (num_residuals >= num_parameters). But it is not mandatory; you can also try it in the underdetermined case.
For the following code: num_residuals = num_steps * 2 (num_steps = 1000)
So as long as your model has fewer than 2000 parameters, you are in the overdetermined case. If you want to increase the model size, you can increase num_steps; of course, at a certain point you will run into memory issues.
In this experiment I trained a model with 321 parameters, and it already reaches very small training losses.
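As a quick check of which case you are in, one could count the parameters once the network from the script below has been built (e.g. right after the pinn(x) call); count_params is a standard Keras method, the other names come from the script:

num_params = pinn.model.count_params()   # (1*16+16) + (16*16+16) + (16*1+1) = 321
num_residuals = num_steps * 2            # 1000 collocation points, 2 residual columns
print('overdetermined:', num_residuals >= num_params)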
I hope it helps.
Here is the code:
import tensorflow as tf
import math as m
import levenberg_marquardt as lm
num_steps = 1000
t = tf.Variable(tf.zeros(shape=(num_steps, 1)))
class PINN(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.model = tf.keras.Sequential([
            tf.keras.layers.Dense(units=16, activation='tanh'),
            tf.keras.layers.Dense(units=16, activation='tanh'),
            tf.keras.layers.Dense(units=1, activation='linear')
        ])

    def call(self, inputs, training=None, mask=None):
        pi = tf.constant(m.pi)
        t0 = tf.constant([[0.0]])
        u_ic = self.model(t0)
        t_random = tf.random.uniform(shape=(num_steps, 1), maxval=1.5)
        t.assign(t_random)
        with tf.GradientTape() as tape1:
            u = self.model(t)
        u_t = tape1.gradient(u, t)
        out0 = tf.broadcast_to(u_ic - 1.0, shape=(num_steps, 1))
        out1 = u_t - 4.0 * pi * tf.math.cos(2.0 * pi * t)
        k0 = 1.0 / num_steps
        k1 = 10.0
        out = tf.concat([k0 * out0, k1 * out1], axis=-1)
        return out


def main():
    x = tf.zeros((num_steps, 1))  # dummy input with one sample
    y = tf.zeros(shape=(num_steps, 2))

    pinn = PINN()
    yt = pinn(x)
    pinn.summary()

    # pinn.compile(
    #     optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    #     loss=tf.keras.losses.MeanSquaredError(), run_eagerly=False)
    # pinn.fit(x, y, batch_size=num_steps, epochs=100000)

    model_wrapper = lm.ModelWrapper(pinn)

    model_wrapper.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),
        loss=lm.MeanSquaredError(),
        jacobian_max_num_rows=num_steps * 2)

    model_wrapper.fit(x, y, batch_size=num_steps, epochs=100000)


if __name__ == '__main__':
    main()
Hi @fabiodimarco , thank you very much for your time and for the code. I tried to add the following lines at the end of the main method in order to visualize the results:
tTest = tf.Variable(tf.linspace([0.0], [1.5], 1000))
pi = tf.constant(m.pi)
y_PINN = pinn(tTest)
y_GT = 2*tf.math.sin(2*pi*tTest) + 1
print('tTest shape is ' + str(tTest.shape))
print('y_GT shape is ' + str(y_GT.shape))
print('y_PINN shape is ' + str(y_PINN.shape))
I notice that the PINN output has 2 columns, and I don't understand the reason for that. How should one do a simple forward test using the trained PINN: is it like what I wrote above, or should one use the model_wrapper?
I also notice that, re-running the code several times, the loss sometimes converges to low values and sometimes does not. Thank you very much.
I had to use 2 columns because the LM optimiser that I implemented supports only one output tensor. For that reason, you cannot have a tensor with shape (1000, 1) and another with shape (1, 1). To overcome the problem, I repeated the second tensor 1000 times and concatenated the two to obtain a single tensor with shape (1000, 2).
out0 = tf.broadcast_to(u_ic - 1.0, shape=(num_steps, 1))
out1 = u_t - 4.0 * pi * tf.math.cos(2.0 * pi * t)
k0 = 1.0 / num_steps
k1 = 10.0
out = tf.concat([k0 * out0, k1 * out1], axis=-1)
This is how the first and second columns are computed. out1 is, as in your case, (1000, 1). out0 is also (1000, 1), but it is just the same value repeated 1000 times; to get a (1, 1) tensor you can take out0[0:1, :].
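For example, using the pinn and x from the script above, one might split the concatenated output back into its two parts (note that both columns still carry the k0 / k1 scaling):

res = pinn(x)                  # shape (num_steps, 2)
ic_residual = res[0:1, 0:1]    # scaled IC residual, shape (1, 1); every row holds the same value
ode_residual = res[:, 1:2]     # scaled ODE residual at each sampled t, shape (num_steps, 1)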
Now that I look again at the code I noticed that the correct values for k0 and k1 should be:
k0 = tf.math.sqrt(1.0 / num_steps)
k1 = tf.math.sqrt(10.0)
in order to obtain the correct scaling values when they are squared by the MeanSquaredError.
Using pinn or the model_wrapper is the same; they will return the same values. The model_wrapper internally calls pinn.
The fact that the loss sometimes converges better than at other times might depend on the random initialisation of the network. In some cases it might get stuck in a local minimum.
Thank you very much for the explanation @fabiodimarco, now I understand the formulation better: when I was calling the pinn method on the input values of $t$, I was getting an output with two columns, which corresponds to the residuals of the two losses. As for the PINN solution $y$, it should be obtained by calling 'model', which is a member of the PINN class, as is done below:
tTest = tf.Variable(tf.linspace([0.0], [1.5], 1000))  # generate uniform grid points for testing the model
pi = tf.constant(m.pi)
residuals_PINN = pinn(tTest)   # residuals of the two losses, shape [1000 x 2]
y_PINN = pinn.model(tTest)     # PINN approximation of the ODE solution at the generated points
y_GT = 2 * tf.math.sin(2 * pi * tTest) + 1
Are you sure that a division by num_steps is needed for k1? Isn't the mean operation already done by the loss function, so that we don't need to divide by num_steps?
Finally, as you may imagine, I would like to solve more complicated equations using PINNs. I understand that if we have more equations and more boundary conditions to be met, then I must add more variables in the call function; for example, if we have 3 equations and 10 boundary conditions, then the call function must return a vector of dimension 11. Please correct me if I am wrong.
Thank you very much.
As for the PINN solution, it should be obtained by calling 'model', which is a member of the PINN class, as is done below
Correct.
Are you sure that a division by num_steps is needed for k1? Isn't the mean operation already done by the loss function, so that we don't need to divide by num_steps?
You are right: since the loss is the mean, the division by num_steps is not needed.
Original loss:
loss = out0^2 + 10 * sum(out1^2) / 1000
New loss (MeanSquaredError over the 2000 entries of the concatenated output):
loss = k0^2 * out0^2 * 1000 / 2000 + k1^2 * sum(out1^2) / 2000
So, to make the two identical:
k0 = sqrt(2)
k1 = sqrt(2 * 10)
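Equating the coefficients of the two expressions term by term gives the same values:

$$ \frac{1000}{2000}\,k_0^2 = 1 \;\Rightarrow\; k_0 = \sqrt{2}, \qquad \frac{k_1^2}{2000} = \frac{10}{1000} \;\Rightarrow\; k_1 = \sqrt{2 \cdot 10} $$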
Finally, as you may imagine, I would like to solve more complicated equations using PINNs. I understand that if we have more equations and more boundary conditions to be met, then I must add more variables in the call function; for example, if we have 3 equations and 10 boundary conditions, then the call function must return a vector of dimension 11. Please correct me if I am wrong.
In the example above you have 1 equation and 1 boundary condition, and the return dimension is 2. So if you have 3 equations and 10 boundary conditions, I think the output dimension should be 13. Why do you think it is 11?
Thank you very much for your answer. For the last point you are right, it was a typo; the number of outputs is 13 in that case. I will try to see what the advantages and limitations of the LM optimizer are for more complex problems. Thanks again for being responsive and for the code.
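As an illustration of that layout, here is a standalone sketch of how 13 residual columns could be concatenated, following the same pattern as the 2-column case above; the zero tensors and unit weights are placeholders, not from the thread:

import tensorflow as tf

num_steps = 1000
eq_residuals = [tf.zeros((num_steps, 1)) for _ in range(3)]    # placeholder equation residuals
bc_residuals = [tf.zeros((num_steps, 1)) for _ in range(10)]   # placeholder BC residuals, broadcast like out0
weights = [1.0] * 13                                           # per-residual scaling factors (like k0, k1)

columns = [w * r for w, r in zip(weights, eq_residuals + bc_residuals)]
out = tf.concat(columns, axis=-1)
print(out.shape)   # (1000, 13)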
Hi @fabiodimarco, sorry for reopening the issue. I would like to ask you one thing about training the model with the LM optimizer after a pre-training procedure with the Adam optimizer. I do that with the aim of getting the model weights to a point that facilitates training with the LM optimizer and consequently avoids getting stuck in a local minimum. I have modified the code as shown below. I have noticed that the LM optimizer does not benefit from the Adam pre-training and often diverges from the correct solution after the first epoch. Do you think there is something wrong? Here is the code:
import tensorflow as tf
import math as m
import levenberg_marquardt as lm
import matplotlib.pyplot as plt
num_steps = 1000
t = tf.Variable(tf.zeros(shape=(num_steps, 1)))
class PINN(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.model = tf.keras.Sequential([
            tf.keras.layers.Dense(units=16, activation='tanh'),
            tf.keras.layers.Dense(units=16, activation='tanh'),
            tf.keras.layers.Dense(units=1, activation='linear')
        ])

    def call(self, inputs, training=None, mask=None):
        pi = tf.constant(m.pi)
        t0 = tf.constant([[0.0]])
        u_ic = self.model(t0)
        t_random = tf.random.uniform(shape=(num_steps, 1), maxval=1.5)
        t.assign(t_random)
        with tf.GradientTape() as tape1:
            u = self.model(t)
        u_t = tape1.gradient(u, t)
        out0 = tf.broadcast_to(u_ic - 1.0, shape=(num_steps, 1))
        out1 = u_t - 4.0 * pi * tf.math.cos(2.0 * pi * t)
        k0 = tf.math.sqrt(2.0)
        k1 = tf.math.sqrt(2.0)
        out = tf.concat([k0 * out0, k1 * out1], axis=-1)
        return out


def main():
    x = tf.zeros((num_steps, 1))  # dummy input with one sample
    y = tf.zeros(shape=(num_steps, 2))

    pinn = PINN()
    yt = pinn(x)
    pinn.summary()

    pinn.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.MeanSquaredError(), run_eagerly=False)

    # Pre-train the model using the Adam optimizer
    pinn.fit(x, y, batch_size=num_steps, epochs=2000)

    tTest = tf.Variable(tf.linspace([0.0], [1.5], 1000))
    pi = tf.constant(m.pi)
    y_PINN = pinn.model(tTest)
    y_GT = 2 * tf.math.sin(2 * pi * tTest) + 1

    # Visualize the pre-training results
    fig = plt.figure(figsize=(100, 50))
    ax = fig.add_subplot(111)
    ax.plot(tTest.numpy(), y_GT, 'r', tTest.numpy(), y_PINN, '--b', markersize=10, linewidth=10)
    plt.xlabel(r'$t$', fontsize=50)
    plt.ylabel(r'$u$', fontsize=50)
    plt.grid(linewidth=2)
    plt.tick_params(axis='x', labelsize=30)
    plt.tick_params(axis='y', labelsize=30)
    plt.legend([r'$GT$', r'$PINN$'], fontsize=30)
    plt.show()

    model_wrapper = lm.ModelWrapper(pinn)

    model_wrapper.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
        loss=lm.MeanSquaredError(),
        jacobian_max_num_rows=num_steps * 2)

    # Train using the LM optimizer
    model_wrapper.fit(x, y, batch_size=num_steps, epochs=100000)

    residuals_PINN = pinn(tTest)
    y_PINN = pinn.model(tTest)
    print('y Error is ' + str(tf.reduce_mean(tf.square(y_GT - y_PINN))))

    # Visualize the training results
    fig = plt.figure(figsize=(100, 50))
    ax = fig.add_subplot(111)
    ax.plot(tTest.numpy(), y_GT, 'r', tTest.numpy(), y_PINN, '--b', markersize=10, linewidth=10)
    plt.xlabel(r'$t$', fontsize=50)
    plt.ylabel(r'$u$', fontsize=50)
    plt.grid(linewidth=2)
    plt.tick_params(axis='x', labelsize=30)
    plt.tick_params(axis='y', labelsize=30)
    plt.legend([r'$GT$', r'$PINN$'], fontsize=30)
    plt.show()


if __name__ == '__main__':
    main()
I do not think pre-training with Adam will help you avoid getting stuck in a local minimum. Something I see here that is not ideal is the LM learning rate, which is too low (0.001).
You should try setting it to 1; you can use a bit less than 1 if you find that too unstable.
LM is already an adaptive optimiser that will decrease the learning rate when it is not able to reduce the loss.
Performing small steps gives you more chance of getting stuck in a local minimum. If you perform larger steps and, in general, a more "noisy" training, it is easier to escape from local minima.
You can also try to reduce the number of attempts of the LM optimiser; by default attempts_per_step is set to 10. This means that LM compares the loss before performing the training step with the loss after performing it: if the loss is lower, it keeps the step and goes to the next batch; if the loss is higher, it "reduces the learning rate" (not exactly like that) and tries again, up to a maximum of attempts_per_step attempts. After that number of attempts it just performs the step even if it does not decrease the loss.
Optimisers like Adam always perform the step even if it increases the loss. So you could try to reduce attempts_per_step to make the training more "noisy" and explore the solution space more.
Let me know if it helps.
Thank you @fabiodimarco for your answer and for the tips. In fact, changing attempts_per_step to a value of 2 has improved the success rate of the algorithm; it converged in 6 runs out of 10.
However, I am still surprised that you say pre-training with the Adam optimizer will not necessarily help LM converge. I have even tried a pre-training that is basically enough to converge to the correct solution, yet the LM optimizer would diverge on the very first step, which I find counter-intuitive if the goal of the algorithm is to minimize the loss.
I am now considering a more complex case with 14 different losses, and when I run the code I get memory issues and the code is aborted before even completing one epoch. I don't know if there are ways to alleviate this problem. Thank you very much.
Hi @saddamhijazi ,
However, I am still surprised that you say pre-training with the Adam optimizer will not necessarily help LM converge. I have even tried a pre-training that is basically enough to converge to the correct solution, yet the LM optimizer would diverge on the very first step, which I find counter-intuitive if the goal of the algorithm is to minimize the loss.
One possible reason LM diverges is that it performs too large a step in the direction of improving just one batch of data, and that batch is too small and not representative of the majority of the data in the dataset. Improving too much in that direction can then make the loss on the other batches worse. First-order optimisers like Adam or SGD usually work better in the mini-batch configuration, as they take very small steps. So, to summarise, I think the reason LM diverges even though you pre-trained with Adam is that the number of examples shown at each training step might be too small.
I am now considering a more complex case with 14 different losses, and when I run the code I get memory issues and the code is aborted before even completing one epoch. I don't know if there are ways to alleviate this problem. Thank you very much.
I will refer here to the message I wrote you above:
Usually LM works better in the overdetermined case (num_residuals >= num_parameters). But it is not mandatory; you can also try it in the underdetermined case.
For the following code: num_residuals = num_steps * 2 (num_steps = 1000)
So as long as your model has fewer than 2000 parameters, you are in the overdetermined case. If you want to increase the model size, you can increase num_steps; of course, at a certain point you will run into memory issues.
Since you now have 14 losses and num_steps is still equal to 1000, then: num_residuals = num_steps * 14 = 1000 * 14 = 14000
You can consider reducing num_steps or, alternatively, use this other loss function that I implemented:
class ReducedOutputsMeanSquaredError(tf.keras.losses.Loss):
    """Provides mean squared error metrics: loss / residuals.

    Consider using this reduced outputs mean squared error loss for regression
    problems with a large number of outputs or at least more than one output.
    This loss function reduces the number of outputs from N to 1, reducing both
    the size of the jacobian matrix and backpropagation complexity.
    Tensorflow, in fact, uses backward differentiation whose computational
    complexity is proportional to the number of outputs.
    """
However, for memory considerations and how to improve on them, you should read carefully the section "Memory, Speed and Convergence considerations" in the README.md of the repo.
Thank you @fabiodimarco, I will read the section you referred to and hopefully I will be able to improve the efficiency of my specific test case. Thank you very much for being helpful.
Hi, I would like to apply the LM optimizer to the training of physics-informed neural networks (PINNs). I have to write a custom loss function, so I followed an example mentioned in another issue and then customized it to my problem. Here I am trying to solve a very simple ODE. As far as I understood, I have to create a custom loss class and then call the custom loss function in the call method. Here is my code:
As you can see, the loss function contains two terms. I get the following error:
uIC = model_wrapper.predict(t0)
RuntimeError: Method requires being in cross-replica context, use get_replica_context().merge_call()