facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Question: Best way to include non-scalar types in experiment results? #880

Closed nickelsey closed 1 year ago

nickelsey commented 2 years ago

Hi all, first, thanks for open-sourcing Ax, it's a great tool, and apologies if I have missed an obvious answer!

We're currently using the Bayesian optimization functionality, but we're also trying out an approach that generates extra data when running an experiment (basically estimates of gradients of the experiment inputs) to be used during later optimization steps to explore local neighborhoods in the search space. I was hoping there was a well-supported way to return large, arbitrarily sized arrays from an experiment. Currently, the best method I can see is writing the array to a file and returning the filename as a string that can then be read in by the model when it is time to generate the next trials. Are there other methods that I have missed?
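For reference, a minimal sketch of that workaround, independent of any Ax-specific API (`run_pipeline`, the path scheme, and the key names here are hypothetical stand-ins):

```python
import numpy as np

def run_pipeline(parameters):
    # Hypothetical stand-in for the real experiment pipeline: returns a
    # scalar objective plus a large auxiliary array (e.g. gradients).
    rng = np.random.default_rng(0)
    return float(sum(parameters.values())), rng.normal(size=1_000_000)

def evaluate(parameters, trial_index):
    objective, extra_array = run_pipeline(parameters)
    path = f"/tmp/trial_{trial_index}_extras.npy"
    np.save(path, extra_array)
    # Only the scalar objective and the lightweight path travel onward;
    # the model can np.load(path) later, when generating the next trials.
    return {"objective": objective, "extras_path": path}
```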

eytan commented 2 years ago

Hi Nick,

How were you planning on using this gradient information in your model and acquisition function? @lena-kashtelyan might have the best idea of whether there are convenient ways to dump the data. We are currently working on refactoring some of our Ax/BoTorch data model to allow for more complex data, such as comparison data and tensor-valued data (e.g., https://arxiv.org/pdf/2106.12997.pdf). BayesOpt with gradient information could be another example where you might collect data about (partial) derivatives (also cc @dme65, @j-wilson). Could you share a bit more about what sort of data you want to store and how you'd use it?

nickelsey commented 2 years ago

Hi Eytan,

To be honest, we aren't completely sure how this will fit into our workflow yet (if at all); we're still exploring options. To be more specific, the application is a design optimization problem: a design parameter space is used to generate a 3D mesh of a physical object, that mesh is run through simulations, and we're optimizing a specific quantity from the simulation. The gradients come into the picture because we're attempting to replace the traditional computer-aided engineering simulator with a DNN, which lets us calculate the gradients of the output variable of choice with respect to the input mesh.
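For concreteness, a minimal sketch of how those gradients could fall out of a differentiable surrogate (PyTorch; the network shape and NUM_VERTICES are hypothetical placeholders, not our actual model):

```python
import torch

NUM_VERTICES = 1000  # hypothetical mesh size

# Hypothetical stand-in for the trained DNN surrogate that replaces the
# CAE simulator: flattened vertex coordinates in, scalar quantity out.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(3 * NUM_VERTICES, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
)

mesh = torch.randn(3 * NUM_VERTICES, requires_grad=True)
quantity = surrogate(mesh).squeeze()
quantity.backward()
# Partial derivatives of the output quantity w.r.t. every vertex
# coordinate -- the large per-trial array this thread is about storing.
mesh_gradients = mesh.grad
```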

Once we have those gradients, we have to map them back to the design parameter space, which is what we're working on now. It's unclear whether that mapping will end up being a learned model, heuristic-based, or something else. Once that's done, we think we might not need to model the parameter space anymore and can use these inferred gradients for direct optimization. But we'd still like to fit this into the framework of Ax if possible, because we want to continue using it for Bayesian optimization as well, and because it's a good tool in general for managing iterative experiments.

Sorry for the long answer, hopefully that's a little clearer! I would also be happy to help with any work that would be needed; I would just need to clear that with my company first, as I'm not sure what their official policy is on OSS contributions.

lena-kashtelyan commented 2 years ago

Hi @nickelsey, sorry that it took a bit to get back to you on this!

a well-supported way to return large, arbitrarily sized arrays from an experiment

When you say this, what do you mean by "return"? Is the goal to basically store some extra data each time you use the Ax/BoTorch model to produce new candidates?

Also, which Ax API are you using? Knowing this will help me figure out the best way to do what you're looking to do : )

nickelsey commented 2 years ago

Hi Lena, thanks for getting back to me!

Sorry, that wasn't the clearest terminology: I meant that we need each trial to return an array. Right now I'm using the service API for our proof of concept, but we're currently switching to the developer API, because we need a custom scheduler and runner to interface with our infrastructure. Our hope was to be able to return these arrays as just another metric, but since metrics are returned as rows in a dataframe (from my understanding of the docs and the scheduler.ipynb tutorial), that doesn't seem like a workable solution. We'd also be concerned about memory usage in that case, because our arrays could be in the 100s of MB, up to ~1 GB per trial.

If thats not clear, please let me know!

lena-kashtelyan commented 2 years ago

we're currently switching to the developer API, because we need a custom scheduler and runner to interface with our infrastructure

That sounds like a great idea –– did you see the tutorial we have for the Ax scheduler? It might be useful in this endeavor : ) https://ax.dev/tutorials/scheduler.html


I meant that we need each trial to return an array

Our hope was to be able to return these arrays as just another metric, but since metrics are returned as rows in a dataframe (from my understanding of the docs and the scheduler.ipynb tutorial), that doesn't seem like a workable solution. We'd also be concerned about memory usage in that case, because our arrays could be in the 100s of MB, up to ~1 GB per trial

I think I'm starting to get it, but just a few clarification questions:

  1. What would be the source of the array –– would it be coming from the Ax/BoTorch model or from somewhere else?
  2. In the latter case, the goal would basically be to associate an arbitrarily large (up to 1 GB) array of data with each trial, correct?
  3. Are you using/looking to use Ax storage (SQL or JSON)?
nickelsey commented 2 years ago

  1. The array would be the output of an ML model we have deployed as part of each trial. It's not the Ax/BoTorch model being used for optimization; it's just part of the trial pipeline.
  2. Correct, in a way that can be easily accessed/read by the Ax model/modelbridge when it is time to generate new trial points.
  3. It would be convenient/preferred if we could use Ax storage, I believe; I wasn't sure if that was an option. External storage is fine if not, however.

Balandat commented 2 years ago

I don't think it makes a lot of sense for the Ax metrics themselves to hold the actual data. A better solution is probably to store the results in some blob storage or filesystem and associate the handle or path pointing to them with the trial's run_metadata. Then one could implement a Metric that accesses that blob and processes it; we have some internal applications that do exactly that (the outputs of the pipeline are images, and the metrics compute some summary statistics on the images that are then used in the optimization). Would this be a reasonable solution for your use case?
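A rough sketch of that pattern, assuming a single-arm Trial and an Ax version where Metric.fetch_trial_data returns a Data object directly (launch_pipeline is a hypothetical stand-in for the trial pipeline):

```python
import numpy as np
import pandas as pd
from ax import Data, Metric, Runner

class ArrayRunner(Runner):
    """Launches the trial pipeline and records where its array output landed."""

    def run(self, trial):
        path = launch_pipeline(trial.arm.parameters)  # hypothetical deployment call
        # The returned dict becomes trial.run_metadata.
        return {"array_path": path}

class ArraySummaryMetric(Metric):
    """Loads the stored array and reduces it to the scalar row Ax expects."""

    def fetch_trial_data(self, trial, **kwargs):
        arr = np.load(trial.run_metadata["array_path"])
        return Data(df=pd.DataFrame.from_records([{
            "arm_name": trial.arm.name,
            "metric_name": self.name,
            "trial_index": trial.index,
            "mean": float(arr.mean()),  # summary statistic of choice
            "sem": float(arr.std(ddof=1) / np.sqrt(arr.size)),
        }]))
```

The nice property of this split is that the heavy array never enters the experiment object; only the path and the scalar summaries do.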

nickelsey commented 2 years ago

Yes, that is a reasonable and workable workflow for us. I suspected we might end up with something like that, but wanted to ask in case there was other functionality in Ax that I had missed when going over the developer API documentation.

Thanks everyone for the help, and if there are other suggestions please let me know. But we can start developing with that workflow in mind, and go from there.

lena-kashtelyan commented 2 years ago

The array would be the output of an ML model we have deployed as part of each trial. It's not the Ax/BoTorch model being used for optimization; it's just part of the trial pipeline

I agree with Max; I think trial.run_metadata (which usually gets written in Runner.run_trial) would be the right place for something like this. I also think pointing to some blob storage/filesystem and adding the identifier/handle to run_metadata is the right approach.


It would be convenient/preferred if we could use Ax storage, I believe; I wasn't sure if that was an option. External storage is fine if not, however

It definitely should be an option (the Scheduler tutorial I linked above actually shows how to enable storage). If you do use Ax storage, that would be a good reason to keep the full arrays in external storage and only put handles to them into run_metadata, since writing those full arrays to the Ax DB might be challenging : )
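For example, something along these lines (the exact import path and the way settings are wired into the Scheduler may vary across Ax versions, so treat this as a sketch):

```python
from ax.storage.sqa_store.structs import DBSettings

# Hypothetical local SQLite file; any SQLAlchemy URL should work.
db_settings = DBSettings(url="sqlite:///ax_experiments.db")
# Passing db_settings to the Scheduler persists experiment state to SQL,
# while the big arrays stay in external storage and only their
# handles/paths live in each trial's run_metadata.
```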


[associating arrays with trials] in a way that can be easily accessed/read by the Ax model/modelbridge when it is time to generate new trial points.

This should be feasible through a custom Metric class, as @Balandat suggests above, but it might be somewhat inefficient to put these arrays into dataframes, which is what Metric.fetch_trial_data currently returns (within a Data object). If you did want to put the arrays into dataframes, you might be able to leverage the MapData object we have for that.
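If you do try MapData, a rough illustration of how an array could be flattened into it (the MapKeyInfo-based constructor reflects the current Ax API and may differ in older versions; the file and metric names are hypothetical):

```python
import numpy as np
import pandas as pd
from ax.core.map_data import MapData, MapKeyInfo

arr = np.load("trial_0_extras.npy")  # hypothetical stored array

# One row per array element; "index" is the map key that distinguishes
# rows belonging to the same arm/metric.
df = pd.DataFrame({
    "arm_name": "0_0",
    "metric_name": "extras",
    "trial_index": 0,
    "index": np.arange(arr.size, dtype=float),
    "mean": arr.ravel(),
    "sem": float("nan"),
})
map_data = MapData(df=df, map_key_infos=[MapKeyInfo(key="index", default_value=0.0)])
```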

For directly passing those arrays to the model (without putting them into a dataframe), we might need a custom Data type –– something we've been wanting to support for a while but don't yet fully support. Let me think about this more and see if I can suggest something helpful; in the meantime, check whether MapData might work for you in the interim?

nickelsey commented 2 years ago

This should be feasible through a custom Metric class, as @Balandat suggests above, but it might be somewhat inefficient to put these arrays into dataframes, which is what Metric.fetch_trial_data currently returns (within a Data object). If you did want to put the arrays into dataframes, you might be able to leverage the MapData object we have for that

I think we can probably make this work by doing heavy data reduction in the Metric and returning summary statistics, as @Balandat suggested.

For directly passing those arrays to the model (without putting them into a dataframe), we might need a custom Data type –– something we've been wanting to support for a while but don't yet fully support. Let me think about this more and see if I can suggest something helpful; in the meantime, check whether MapData might work for you in the interim?

This custom Data type would probably be the ideal solution for us, but it's not necessary for now. As mentioned in my original reply to Eytan above, I can see if my company will sponsor some time for me to help with that, if manpower is the primary issue. And I will look into the MapData suggestion as well, thank you!

lena-kashtelyan commented 2 years ago

Wonderful, keep us posted @nickelsey! I'll keep the issue open for now so you can let us know how this goes : )

lena-kashtelyan commented 1 year ago

@nickelsey, is this still something you are looking to do? We have not yet been able to expand our Data object definition (to pass data to models other than through dataframes). At this point this is realistically a wishlist item without a concrete time estimate, so I will put it on our Wishlist.