flexcompute / tidy3d

Fast electromagnetic solver (FDTD) at scale.

https://docs.flexcompute.com/projects/tidy3d/en/latest/

GNU Lesser General Public License v2.1


Server side autograd caching bug #1906

Open tylerflex opened 3 months ago

tylerflex commented 3 months ago

What it looks like is happening:

When autograd is used with simulation caching on, the cache does not take the auxiliary files into account. So if the forward or adjoint sim is identical between runs but sim_fields_keys or adjoint_source_info differs (which can happen when the objective function or parameterization changes), the cache returns the old gradients. I suspect this is really only an issue for the adjoint case, where the server is returning the old VJP values.

Possible fixes:

- Have the caching function use all of the web API uploads.
- Remove auxiliary file uploads from the web API entirely?
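
To make the first option concrete, here is a minimal sketch of a cache key computed over every uploaded file rather than the simulation alone. It is not the server's actual caching code: the `cache_key` helper, the `include_aux` flag, and the file names are all hypothetical and only illustrate the suspected failure mode.

```python
import hashlib
import json


def cache_key(uploads: dict[str, bytes], include_aux: bool = True) -> str:
    """Hash uploaded files into a cache key (hypothetical helper).

    `uploads` maps an assumed file name to its raw bytes. With
    include_aux=False only "simulation.json" is hashed, which mimics
    the suspected bug where auxiliary files are ignored.
    """
    names = sorted(uploads) if include_aux else ["simulation.json"]
    digest = hashlib.sha256()
    for name in names:
        data = uploads[name]
        # Prefix each file with its name and length so boundaries are unambiguous.
        digest.update(f"{name}:{len(data)}:".encode())
        digest.update(data)
    return digest.hexdigest()


# Two runs with the same simulation but different adjoint source info.
sim = json.dumps({"structures": ["box"], "run_time": 1e-12}).encode()
run_a = {"simulation.json": sim, "adjoint_source_info.json": b'{"objective": "power"}'}
run_b = {"simulation.json": sim, "adjoint_source_info.json": b'{"objective": "phase"}'}

# Hashing only the simulation makes the runs collide; hashing all uploads does not.
same_sim_only = cache_key(run_a, include_aux=False) == cache_key(run_b, include_aux=False)
same_all_files = cache_key(run_a) == cache_key(run_b)
```

With a key like this, a change to the objective function alone would be enough to miss the cache, so stale VJP values would not be returned.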

momchil-flex commented 3 months ago

Yeah, the most straightforward fix, which would also simplify the process (no need to upload/download various files), would be to add these things to Simulation. We've discussed that it's not very pretty, but it certainly looks attractive now. The other fixes I can think of seem to require significant refactoring of some of our web parts, which would also have to go through MC.

tylerflex commented 3 months ago

Need to think about adding them to Simulation. It definitely seems like a simple fix at first glance, but might introduce some complications?

I suppose another short term option / fix could be to turn off caching if "autograd" in simulation_type?

momchil-flex commented 3 months ago

> I suppose another short term option / fix could be to turn off caching if "autograd" in simulation_type?

True, that should work too and can maybe be done very quickly. I'll check with MC.

momchil-flex commented 2 months ago

By the way, turning off caching if "autograd" in simulation_type was already implemented last week, so the issue shouldn't be there right now, but we probably still want to handle this better in the future.

tylerflex commented 2 months ago

the future is here:

flexcompute/tidy3d#1934