Subodh213 opened this issue 9 months ago
Hello Subodh213, I'm currently working on a similar task, and there seems to be an interaction between the number of epochs and the learning rate that affects convergence, particularly for equations whose terms differ by orders of magnitude. For instance, in the harmonic oscillator example the parameters k and μ differ by a factor of 100, and μ already differs from x(t) by roughly a factor of 10. Although I'm not entirely certain, I suspect that increasing the number of epochs, adjusting the learning rate, and possibly modifying the neural network's architecture could help achieve convergence.
I tried two different approaches while keeping the lambda regularization as in the original code, i.e. a 10**4 : 1 weighting in favor of the data points (see the sketch below the results).
Both methods converged to an acceptable solution, but they did not yield accurate values for k and μ (perhaps they would with more iterations). Furthermore, I find it concerning that for such a simple model I needed up to 100,001 iterations! I am also interested in optimizing these parameters in other PDEs, including those with forced solutions such as the harmonic oscillator driven by a time-dependent force. Does anyone have insights on how to achieve this?
20/5 NN, 1,000,001 iterations
32/3 NN, 500,001 iterations
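For reference, the weighting I kept is roughly this (a sketch of the idea, not the exact notebook code):

```python
def total_loss(loss_data, loss_physics, lambda_data=1e4):
    # data term weighted 1e4 : 1 over the physics term, as in the original code
    return lambda_data * loss_data + loss_physics
```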
The solutions above look like they are trying to become discontinuous, which makes me think there are not enough collocation points for the physics loss. Have you tried increasing the number of collocation points?
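For example, something like this (a rough sketch, assuming a PyTorch setup like the notebook's; the domain and the number of points are just illustrative):

```python
import torch

# denser grid of collocation points for the physics loss
# (e.g. 300 instead of 30; the exact number is worth experimenting with)
t_physics = torch.linspace(0, 1, 300).view(-1, 1).requires_grad_(True)
```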
Also note that the Adam optimiser tries to take step sizes of roughly the same magnitude as the learning rate. So if your learning rate is 1e-3, optimising k from 0 to 400 would take a minimum of 400,000 training steps. I therefore recommend normalising mu and k to the range [-1, 1], or using different learning rates depending on their expected magnitudes.
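For example, either of these (rough sketches, assuming a PyTorch training loop; the network, scales, and learning rates are illustrative):

```python
import torch

# a small network standing in for the PINN used in the notebook
model = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

# Option 1: learn normalised values of roughly unit magnitude and rescale them
# inside the physics loss (scales chosen to match the expected magnitudes,
# e.g. ~400 for k and ~4 for mu in this example)
k_hat  = torch.nn.Parameter(torch.zeros(1))
mu_hat = torch.nn.Parameter(torch.zeros(1))

def residual(d2x_dt2, dx_dt, x):
    k, mu = 400 * k_hat, 4 * mu_hat
    return d2x_dt2 + mu * dx_dt + k * x   # residual of  x'' + mu*x' + k*x = 0

# Option 2: keep k and mu unscaled, but give each its own learning rate
k  = torch.nn.Parameter(torch.tensor(1.0))
mu = torch.nn.Parameter(torch.tensor(1.0))
optimiser = torch.optim.Adam([
    {"params": model.parameters(), "lr": 1e-3},
    {"params": [mu],               "lr": 1e-1},   # mu expected to be O(1-10)
    {"params": [k],                "lr": 1e0},    # k expected to be O(100)
])
```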
Hello Ben,
Thank you very much for your reply; it really helped me think this through.
So I tried using more collocation points in the physics loss and lowering the learning rate to get a smoother descent towards the minimum. I also increased the number of epochs; my results are below my remarks.
I just didn't understand how I can normalize the constants. I mean, if I have an equation such as the damped harmonic oscillator, m·x'' + μ·x' + k·x = 0,
I know I can divide the whole equation by one of the constants to normalize it, but the ratio between the constants stays the same no matter what I do to the equation, and if I rescaled them independently I would change the dynamics of the system. So what do you mean by normalizing? Could you recommend any articles on the subject, please?
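To make my point concrete with the oscillator example: dividing through by m gives

$$
m\,\ddot{x} + \mu\,\dot{x} + k\,x = 0
\quad\Longrightarrow\quad
\ddot{x} + \frac{\mu}{m}\,\dot{x} + \frac{k}{m}\,x = 0,
$$

so one constant is absorbed, but the ratio of the remaining constants, (k/m)/(μ/m) = k/μ, is unchanged.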
Moreover, since the objective is to identify the system by finding its constants, wouldn't it be more interesting to add another term to the physics loss that accounts for the input (forcing) signal, so that different input signals can be used? What are your thoughts on this?
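Something like this is what I have in mind (just a sketch; `f` here stands for a hypothetical known input/forcing signal):

```python
def physics_residual(d2x_dt2, dx_dt, x, t, m, mu, k, f):
    # residual of the forced oscillator  m*x'' + mu*x' + k*x = f(t)
    return m * d2x_dt2 + mu * dx_dt + k * x - f(t)
```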
From my attempts, I gather that a smoother loss (i.e. a smaller step / learning rate) and a higher weight on the data loss (lambda) strongly influence the learning. Is that right?
(Note: the data-loss and physics-loss plots here have their titles swapped.)
Closer to the end of training:
In my actual problem I only have data; can this approach be used to discover all the parameters? I tried using k and mu as unknown parameters, but the solution doesn't converge.
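For reference, this is roughly how I set them up (a sketch, assuming a PyTorch PINN similar to the notebook's; the architecture and initial guesses are arbitrary):

```python
import torch
from torch import nn

class OscillatorPINN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                                 nn.Linear(32, 32), nn.Tanh(),
                                 nn.Linear(32, 1))
        # treat the physical constants as trainable parameters alongside the weights
        self.mu = nn.Parameter(torch.tensor(1.0))
        self.k  = nn.Parameter(torch.tensor(1.0))

    def forward(self, t):
        return self.net(t)
```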