frankhan91 / DeepBSDE

Deep BSDE solver in TensorFlow
MIT License

Issue with Local Optima in HJBLQ problem #8

Open nymath opened 3 weeks ago

nymath commented 3 weeks ago

Dear Prof. Han,

Thank you very much for the library.

I have been exploring the implementation of the algorithm described in the paper. I found that the constant map $u: (t,x) \mapsto E(g(X_T^{t,x}))$ is indeed a solution of the corresponding PDE (HJBLQ) and minimizes the $L^2$ distance to $g(X_T)$ among all constant candidates, resulting in $Y$ being constant and $Z$ being identically zero.
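For concreteness, the reason the mean is the best constant fit in $L^2$ is just the standard decomposition (nothing specific to this repo):

$$
E\big[(g(X_T) - c)^2\big] = \mathrm{Var}\big[g(X_T)\big] + \big(E[g(X_T)] - c\big)^2,
$$

so among constant candidates the minimizer is $c^{*} = E[g(X_T)]$, and the corresponding $Z$ is zero.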

During my experiments under different values of $\lambda$, I noticed that $u(0, x)$ often converges to 4.6, which matches the mean of $g(X_1)$, and the norm of $Z$ converges to zero as expected. To address this issue, I set $x_0$ as a non-learnable parameter and observed that $u(0, x)$ then converges quickly to the value estimated by the Monte Carlo algorithm, but the loss in this setting is unstable.
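To make the comparison concrete, here is a minimal Monte Carlo sketch of the two values I am comparing (my own check, not code from this repo; it assumes the HJBLQ setup of the paper with $d=100$, $T=1$, $x_0=0$, $\sigma=\sqrt{2}$, $g(x)=\ln\frac{1+|x|^2}{2}$, and, if I read the paper correctly, the explicit representation $u(0,x_0) = -\frac{1}{\lambda}\ln E[\exp(-\lambda g(X_T))]$):

```python
import numpy as np

# Monte Carlo sketch (my own check, not code from this repo), assuming the
# HJBLQ setup of the paper: d = 100, T = 1, x_0 = 0, X_T = x_0 + sqrt(2) * W_T,
# terminal condition g(x) = ln((1 + |x|^2) / 2), lambda = 1.
d, T, lam = 100, 1.0, 1.0
num_samples = 100_000  # increase for a tighter estimate
rng = np.random.default_rng(0)

# Sample X_T = x_0 + sqrt(2) * W_T with x_0 = 0.
x_T = np.sqrt(2.0 * T) * rng.standard_normal((num_samples, d))
g = np.log(0.5 * (1.0 + np.sum(x_T ** 2, axis=1)))

# Best constant fit in L^2: the plain mean of g(X_T) (the ~4.6 value I keep seeing).
constant_fit = g.mean()

# Reference value of u(0, x_0) via the explicit formula
# u(0, x) = -(1/lambda) * ln E[exp(-lambda * g(X_T))]  (my reading of the paper).
y0_reference = -np.log(np.mean(np.exp(-lam * g))) / lam

print(f"E[g(X_T)]      ~ {constant_fit:.4f}")
print(f"u(0, x_0) (MC) ~ {y0_reference:.4f}")
```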

I suspect this behavior indicates that the algorithm is prone to falling into a local optimum (namely, the constant map). Could you suggest any modifications that might help circumvent this issue?

frankhan91 commented 3 weeks ago

Thanks for your interest in the repo. However, I did not understand your question. In your notation, $u: (t,x) \mapsto E(g(X_T^{t,x}))$ is not a constant map, since the value of the right-hand side varies as $t$ or $x$ changes. Furthermore, according to the paper, that map is the true solution to the PDE. Could you please elaborate on your question a bit more?