flexcompute / tidy3d

Fast electromagnetic solver (FDTD) at scale.
https://docs.flexcompute.com/projects/tidy3d/en/latest/
GNU Lesser General Public License v2.1
194 stars 44 forks source link

Make `scipy` an optional dependency #2062

Open tylerflex opened 2 weeks ago

tylerflex commented 2 weeks ago

Motivation: allow a more lightweight base package for validation

Not sure if it's possible:

  1. There are some methods that use scipy (like SimulationData.to_mat) so lazy import should be fine there.
  2. xarray interpolation uses scipy, so we'd need to install different xarray versions depending on the installation?
  3. not sure if scipy comes with autograd already

Either way, we probably want regular pip install tidy3d users to have scipy and interp() support. So not sure if to accomplish this goal of having a "lightweight" tidy3d package if we just need to create a more custom installation script to MC rather than using pyproject.toml?

Thoughts @daquintero @yaugenst @momchil-flex ?

momchil-flex commented 2 weeks ago

Motivation: allow a more lightweight base package for validation

Not sure if it's possible:

  1. There are some methods that use scipy (like SimulationData.to_mat) so lazy import should be fine there.
  2. xarray interpolation uses scipy, so we'd need to install different xarray versions depending on the installation?
  3. not sure if scipy comes with autograd already

One other thing that comes to mind is we may want to add it to the trimesh option then. MC basically installs our core packages + trimesh for the validation API to be able to validate all possible simulations, and I remember that some STL validator requires some scipy functionality and it fails if not installed.

Either way, we probably want regular pip install tidy3d users to have scipy and interp() support. So not sure if to accomplish this goal of having a "lightweight" tidy3d package if we just need to create a more custom installation script to MC rather than using pyproject.toml?

That is effectively what this block achieves, albeit in a bit of a dirty way. https://github.com/flexcompute/tidy3d/blob/3f087b5a20b999bc1fa79a06d741c87f1f386dae/pyproject.toml#L41-L48

But yeah MC basically comments that out for the validation API. So the question to me is if scipy can be moved inside this block. It was moved up I think because of that trimesh issue actually, because the other 3 things that you point out should not be needed for validation. Well actually I think we have a validator for custom data that checks that the data the user has provided can be interpolated (e.g. has no repeated coords), so it may be needed for that too.

yaugenst-flex commented 2 weeks ago

Not really sure what the minimum requirements for MC are but autograd depends on scipy (as does xarray as @tylerflex pointed out). That seems pretty tricky to get around? I think autograd only imports scipy when you import autograd.scipy though, so in principle you could pip install autograd --no-deps.

tylerflex commented 2 weeks ago

Not really sure what the minimum requirements for MC are but autograd depends on scipy (as does xarray as @tylerflex pointed out). That seems pretty tricky to get around?

yea it does, i wonder if we just need to say that we only validate the .to_static() version of the component? and just not install autograd (or use --no-deps)?

Frank-DingYC commented 2 weeks ago

I would like to share some background about the problem. We use Web Assembly (More specifically, Pyodide) to import Tidy3D so that we can implement some features, like plot source_time, script objects, show grid, plot permittivity and so on. If we can't limit the core dependencies of Tidy3D, the front-end webpage will suffer from crashing because of out-of-memory issues. For now, @majinge7846 accomplished lots of work to remove Scipy for front-end Tidy3D utilization. However, the problem becomes critical when upgrading Tidy3d to v2.7.6.

Recently, we want to improve the user experience for frequently used functions (Validate/Estimate). We plan to use the same technical solution to implement it. However, we are facing the same problem. Refer to this. Currently, we use lambda to do validation/estimation and pipeline as backup, which will spend lots of time wasted during the simulation.json file transmission. Instead, if we use web assembly to implement validation/estimation, users can get the results immediately.

majinge7846 commented 2 weeks ago

https://github.com/pydata/xarray/blob/9b632fd36edba7c6fcd1eea778ba3d8710c68349/pyproject.toml#L36 image

scipy is an optional deependency of xarray. And I found that it just used for interpolate(scipy.interpolate.interp1d). If the tidy3d wasm without scipy does't call related functions while do validating. the validation of tidy3d wasm is OK.

webgui can even do a check, if tidy3d wasm error/log validation error, "lack scipy please install first" , this is acceptable, we can use lambda/pipeline as fallback. (Of course, it would be better if numpy or other methods can be used to avoid calling Scipy)

I think webgui validation not need designed to handle all simulation cases, just most simulation cases are OK too. lightweight base package may not be needed.

yaugenst-flex commented 2 weeks ago

yea it does, i wonder if we just need to say that we only validate the .to_static() version of the component? and just not install autograd (or use --no-deps)?

oh nevermind, scipy is optional in autograd too. i guess that's good news then.

what would be the best way to go here? sounds to me like it would make sense to add something like a minimal install option under [tool.poetry.extras]?

daquinteroflex commented 2 weeks ago

Just catching up on this. Ultimately, this is one of the package refactor aspects we want to implement in 3.0. In the sense that we want to have an essential core tidy3d API and extend the package functionality from there. So this is helpful to appreciate the core functionality and extensible functionality. Like the core functionality probably should just be an static API data-structure validation and then have a clear boundary for further functionality. In the future, we do want to have minimal tidy3d lightweight pacakge that is extended through tidy3d[option] or simply other independent packages that depend on the minimal tidy3d. This should help for separating lightweight memory operations in the web built upon a minimal installation.

This kind of leads to the issue of how to approach this right now (edit to be clear 3.0 is a bit far away still). Unfortunately poetry does not enable us to "minimize" the base package [tool.poetry.dependencies] : it will always install those when pip install tidy3d. --no-deps is also an option since autograd or xarray don't directly depend on it. Based on previously looking into this, tool.poetry.extras will install extra packages on top of the base installation. So we shouldn't really make scipy an optional dependency and install it in an extra because then users will not have it in their base tidy3d installation by default. Unfortunately, I think the kind of hacky solution we've been using of the CORE NOTCORE commented section in the pyproject.toml would enable it not to be directly installed by MC in their current workflow possibly. Possibly we can add trimesh as Momchil suggests, also since this hack is shared in the interim before 3.0.

If the pipeline can be a good fallback for all the other non-web-assembly local validation cases then I think this could be a good way to get it to work before 3.0! We'd just have to check the scipy to be sure they are on-the-fly per-function like Tyler mentioned.

Happy to do a PR for this if this is the approach we want to go for.

momchil-flex commented 2 weeks ago

So my thoughts:

daquinteroflex commented 2 weeks ago

Nice, yeah I agree!

How urgent is this out of curiosity? If it is very urgent for the web UI (sounds a bit?), probably someone else should do this given my current limited timing constraints. Otherwise, think I've got a few ideas how to implement and properly test this with the right environments as you suggest and happy to focus on cracking on this. Or happy to propose a few implementation approaches too as need be.

majinge7846 commented 2 weeks ago

Nice, yeah I agree!

How urgent is this out of curiosity? If it is very urgent for the web UI (sounds a bit?), probably someone else should do this given my current limited timing constraints. Otherwise, think I've got a few ideas how to implement and properly test this with the right environments as you suggest and happy to focus on cracking on this. Or happy to propose a few implementation approaches too as need be.

test for web to usage: do not install scipy, load/create most simulation (validation) successfully.

tylerflex commented 4 days ago

I spoke with MC and they are saying this is becoming a very urgent issue. Can we come up with a plan for it and assign someone?

daquinteroflex commented 4 days ago

So I began a PR that is linked, unfortunately I'm in crunch time on the first thesis submission by the end of the month so I can't really devote any more extra time on top of the backend PRs I was helping with recently. I become more free early December to devote extra time but maybe someone else can finish implementing this? I was also a bit concerned on the timelines that weren't super clear to me.

In the PR I've been focused on the testing implementation using nox for multiple environments including one without scipy. We could do as agreed on moving scipy on the pyproject.toml, but the problem is making sure the testing suite still works accordingly and hence why I was focused on that within the PR. Sorry it might be very tricky for me to implement this in the timeframe needed until next month.

Sorry to nominate @yaugenst-flex if you'd be ok with that?

tylerflex commented 4 days ago

Thanks @daquinteroflex , let's also consider whether there's a really simple approach that removes scipy but still allows us to create and validate components (unless your branch already plans to implement that?)

momchil-flex commented 4 days ago

So if I understand things correctly,

So is our refactor really blocking for them?

tylerflex commented 4 days ago

I think there's at least one validator that explicitly uses interp (scipy).

https://github.com/flexcompute/tidy3d/pull/1684

but yea other than that maybe we can just make it work without scipy? we could make that validator skip on an import error for example

momchil-flex commented 4 days ago

Well it depends on what they are trying to do, if they are trying to validate it should not skip but fall back on something that can validate properly.

I think there's a trimesh validator that also requires scipy internally.

majinge7846 commented 4 days ago

Well it depends on what they are trying to do, if they are trying to validate it should not skip but fall back on something that can validate properly.

I think there's a trimesh validator that also requires scipy internally.

yes, and webGUI do not try to support trimesh validation, because although trimesh itself is a pure python package, but it depeends on networkx / rtree (hard to get wasm version) and scipy (big), It is acceptable to fallback to pipeline when meet these case.

majinge7846 commented 4 days ago

I think there's at least one validator that explicitly uses interp (scipy).

1684

but yea other than that maybe we can just make it work without scipy? we could make that validator skip on an import error for example

yes, In the short term remove scipy here is a good and big start for wasm validation. My suggestions are (just for reference)

  1. replace scipy by numpy/xarray(functions do not depeends on scipy)/pandas completely. (maybe need more code work)
  2. do a import check in interp, if env do not has scipy, enter an extra logic, and do a simple can_interp check

And gui do not want to lose the ability to check simulation includes stl and custom source/medium with custom dataset . So could we do not skip.

yaugenst-flex commented 3 days ago

I tinkered around a bit with this, see #2087. This right now just fails if there is a medium on which to interpolate. However, if we need to validate interp, there is not really a way around not using scipy other than reimplementing scipy's RegularGridInterpolator ourselves..

tylerflex commented 3 days ago

can we just change the validator in question to not interp directly but instead analyze the coords to determine (to good approximation) whether this DataArray could be interpolated?

tylerflex commented 3 days ago

I think it's intended to just catch an edge cases that pops up from time to time on the backend, but maybe we can target that edge case without doing the interp directly.

tylerflex commented 3 days ago

@momchil-flex @marc-flex

momchil-flex commented 3 days ago

Yeah we could loosen the validator. The original issue was that there were arrays provided which had repeated coords along some dimensions. We could revert to testing that only (we made it more general to potentially capture other errors).

However like I mentioned I'm pretty sure there was at least one other validator that required scipy internally. @yaugenst-flex did you try loading e.g. the tests SIM_FULL in your trimmed test? Well even if that works it doesn't necessarily mean all is good since e.g. some validators only apply to structures that cross a given plane.

yaugenst-flex commented 3 days ago

@momchil-flex ok yeah i tested with SIM_FULL, it's failing because of trimesh and the interp validator. Interp validator is easy to skip, trimesh is failing a bit more badly...

I'm also not super sure what verify_packages_import does. Basically it fails if the module is not found, which will happen anyways if the module import fails?

pydantic.v1.error_wrappers.ValidationError: 4 validation errors for Simulation
structures -> 8 -> geometry -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
structures -> 31 -> geometry -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
structures -> 32 -> geometry -> GeometryGroup -> geometries -> 0 -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
structures -> 32 -> geometry -> GeometryGroup -> geometries -> 1 -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
momchil-flex commented 19 hours ago

There was a user error which made me realize that currently the interpolation validator is applied to custom medium and custom field source, but not to custom current source. So when refactoring this validator (maybe we'll split this in a separate issue?) we need to add it to CustomCurrentSource, too.

Otherwise, the user error was still that one of the coords was repeated. So this would again be caught by validating the coords directly without actually calling interp. I was also looking through our code and found this which, by its name at least, sounds like it's doing exactly what we want. However I could not trace this function to where it's actually implemented. Is it an xarray thing? I couldn't find it in their codebase though. @yaugenst-flex ?

yaugenst-flex commented 18 hours ago

I was also looking through our code and found this which, by its name at least, sounds like it's doing exactly what we want. However I could not trace this function to where it's actually implemented. Is it an xarray thing? I couldn't find it in their codebase though.

Yeah this is an xarray Dataset method implemented here. If this is what we are trying to check then yeah definitely we can just throw this into the validator, it's pretty cheap to evaluate I think.

momchil-flex commented 18 hours ago

It doesn't seem to validate the case of repeated indexers out, I think it's more of a type checking thing.