Pressio / rom-tools-and-workflows


Support for heterogeneous parameter spaces #103

Closed jtencer closed 9 months ago

jtencer commented 9 months ago

Current parameter space classes only support homogeneous spaces. It would be good to support heterogeneous cases, such as a mix of normal and uniform distributions, non-variable parameters (constants), and non-numeric parameters. The ParameterSpace base class may need to be reworked to accommodate this.

eparish1 commented 9 months ago

@jtencer I think that, since our API is very generic (generate_samples is really the only outward-facing thing), this should be no big deal, but did you run into an issue?

W.r.t. heterogeneous parameter spaces, one thing I'm not sure of is whether we should natively support this, given the high number of permutations that are possible (e.g., so many different types of distributions, so many different ways of combining them). What do you think? The user can always just create the parameter space that meets their needs, so long as our API allows it.

jtencer commented 9 months ago

I think there's a way that we can do it that's sufficiently generic and doesn't push too much extra effort onto the user. I propose that we define a Parameter construct and then a ParameterSpace is really just a list of Parameters. That should allow users to mix/match more easily without adding a ton of extra concrete classes to our repo.
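A minimal sketch of that idea, assuming hypothetical names (`Parameter`, `ParameterSpace`, and the concrete classes below are illustrative, not the actual rom-tools API):

```python
import abc
import numpy as np

class Parameter(abc.ABC):
    """Hypothetical single-parameter interface."""

    @abc.abstractmethod
    def generate_samples(self, number_of_samples: int) -> np.ndarray:
        """Return an array of shape (number_of_samples, 1)."""

class UniformParameter(Parameter):
    def __init__(self, low: float, high: float):
        self._low, self._high = low, high

    def generate_samples(self, number_of_samples: int) -> np.ndarray:
        return np.random.uniform(self._low, self._high,
                                 size=(number_of_samples, 1))

class ConstantParameter(Parameter):
    """Non-variable parameter: every sample is the same value."""
    def __init__(self, value: float):
        self._value = value

    def generate_samples(self, number_of_samples: int) -> np.ndarray:
        return np.full((number_of_samples, 1), self._value)

class ParameterSpace:
    """A heterogeneous space is just an ordered list of Parameters."""
    def __init__(self, parameters):
        self._parameters = list(parameters)

    def get_dimensionality(self) -> int:
        return len(self._parameters)

    def generate_samples(self, number_of_samples: int) -> np.ndarray:
        # Sample each parameter independently and stack the columns.
        return np.concatenate(
            [p.generate_samples(number_of_samples) for p in self._parameters],
            axis=1)
```

A mixed space is then e.g. `ParameterSpace([UniformParameter(0, 1), ConstantParameter(5.0)])`, with no extra concrete space classes needed.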

jtencer commented 9 months ago

I made an example implementation in jtencer/issue103

eparish1 commented 9 months ago

@jtencer Yes, I think this makes sense, but I think we need to be careful about how this gets glued together.

In your implementation, if I understand it right, you have this as the base method for generating samples:

    def generate_samples(self, number_of_samples) -> np.ndarray:
        '''
        Generates and returns parameter samples.
        Returns an np.ndarray of shape
        (number_of_samples, self.get_dimensionality()).
        '''
        # Sample each parameter independently, then stack the columns.
        samples = [p.generate_samples(number_of_samples)
                   for p in self.get_parameter_list()]
        return np.concatenate(samples, axis=1)

The issue I see with this is that we can't support, e.g., LHS sampling, or any type of distribution with correlations, in our base implementation. Someone could override generate_samples, but that might defeat the purpose. What are your thoughts? I think if we pull the concrete implementation out of generate_samples and make it an @abc.abstractmethod, things would then be sufficiently generic.

jtencer commented 9 months ago

Yeah, I'm not sure what the best way is. I put that stuff in the base class because I noticed that it was exactly the same in all the derived classes. If we want to include an example for which that's not the case, I'd be OK with going back to having those implementation details in the concrete classes instead. There are only three concrete implementations currently, and those functions are super short, so it's not a ton of duplication.

jtencer commented 9 months ago

We might also want to consider separating the parameter space and the sampling method into different concepts. And then you could have something like Sampler(my_parameter_space).generate_samples(num_samples)
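A rough sketch of that separation, under stated assumptions: `Sampler`, `transform_samples`, and the toy `UniformSpace` stand-in are all hypothetical names, not the actual API.

```python
import numpy as np

class UniformSpace:
    """Toy stand-in for a parameter space: independent Uniform(low, high)
    in each dimension. Names here are illustrative."""
    def __init__(self, lows, highs):
        self._lows = np.asarray(lows, dtype=float)
        self._highs = np.asarray(highs, dtype=float)

    def get_dimensionality(self):
        return self._lows.size

    def transform_samples(self, uniform_samples):
        # Map uniform (0, 1) values onto each dimension's range.
        return self._lows + (self._highs - self._lows) * uniform_samples

class Sampler:
    """Plain Monte Carlo strategy; an LHS sampler could be a sibling
    class with the same interface."""
    def __init__(self, parameter_space, rng=None):
        self._space = parameter_space
        self._rng = rng if rng is not None else np.random.default_rng()

    def generate_samples(self, number_of_samples):
        # Draw the randomness here; let the space map it onto its
        # distributions.
        u = self._rng.uniform(
            size=(number_of_samples, self._space.get_dimensionality()))
        return self._space.transform_samples(u)
```

`Sampler(my_parameter_space).generate_samples(num_samples)` then works for any space exposing those two methods.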

jtencer commented 9 months ago

I just pushed a change that does this. Not sure I 100% love it still. Maybe the sampler should just be a function instead of a class...

jtencer commented 9 months ago

OK. I think I like what I have now better. Essentially, what the parameters do now is convert a uniform random number on (0,1) into a sample from whatever distribution they have. Each parameter takes these "seed" random numbers (could use a better name here) as an argument. That separates the generation of the randomness (MC, LHS, etc.) from the definition of the distribution. It also makes the generate_samples function agnostic to how you actually want to do the sampling. (Might want to rename that function now, though, since it's more "rescaling" or "redistributing" samples than "generating" them.)
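A sketch of that seed-transform idea for one parameter, using the exponential distribution because its inverse CDF has a closed form (the class and method names are hypothetical):

```python
import numpy as np

class ExponentialParameter:
    """Hypothetical Parameter: turns uniform (0, 1) "seed" samples into
    Exp(rate) samples via the inverse CDF, F^{-1}(u) = -ln(1 - u) / rate."""
    def __init__(self, rate: float):
        self._rate = rate

    def transform_samples(self, seeds):
        # The caller decides how the seeds were produced (plain Monte
        # Carlo, LHS, a quasi-random sequence, ...); this method only
        # maps them onto the target distribution.
        seeds = np.asarray(seeds)
        return -np.log1p(-seeds) / self._rate
```

The same set of seeds pushed through different parameters yields one consistent design, which is what lets a stratified scheme like LHS work across a heterogeneous space.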

eparish1 commented 9 months ago

@jtencer you're a machine! How do you find time to knock this out?

I've looked at what you have, I think that this all makes sense and allows a lot more flexibility. I like the idea of separating the sampler from the parameter space. One thing I noticed, which I think I agree with you but wasn't sure originally, is that you allow "Parameter" to be multi-dimensional. I think Parameter is still the right name, but we should see if anyone else has comments since Parameter makes me think "scalar". But I think by allowing it to be vector-valued, we can do distributions with correlations?

I'll try and go through this more, but I like what you have.

jtencer commented 9 months ago

I went back and forth on whether to support vector parameters or not. I didn't have a strong opinion either way, but most of the code to do it was already there, so I just did it. I figured it was easier to have it just in case. I don't have any tests for that case, though, so it's also possible that it's broken. The correlation situation seems like justification to keep it and add some tests for it, though.

eparish1 commented 9 months ago

@jtencer I'm still looking at this.

Regarding your approach of having the parameter take in a random array drawn uniformly from [0,1]: are you sure this is extendable to other distributions? For example, I think you can convert to Gaussian with the Box-Muller transform. Can we also do, e.g., Laplace with this? And thinking about more complicated, user-defined distributions, I think this could get pretty complicated, and I'm not sure this is the best approach. But you've thought about this more than me at this point....

eparish1 commented 9 months ago

@jtencer with some googling it looks like there is such a transform for Laplace. I'm not sure about an arbitrary distribution, but maybe this isn't as important.

jtencer commented 9 months ago

I believe you should be able to do it for any distribution by inverting the CDF. Not sure about performance implications. Probably generating samples is a pretty small part of the total work though.
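As a concrete instance of inverting the CDF, the Laplace distribution mentioned above has a closed-form inverse CDF, so inverse-transform sampling is a one-liner (the function name is illustrative):

```python
import numpy as np

def laplace_ppf(u, mu=0.0, b=1.0):
    """Inverse CDF (percent-point function) of Laplace(mu, b):
    F^{-1}(u) = mu - b * sign(u - 1/2) * ln(1 - 2|u - 1/2|)."""
    u = np.asarray(u)
    return mu - b * np.sign(u - 0.5) * np.log1p(-2.0 * np.abs(u - 0.5))

# Inverse-transform sampling: push uniform (0, 1) seeds through the ppf.
rng = np.random.default_rng(0)
samples = laplace_ppf(rng.uniform(size=100_000), mu=1.0, b=2.0)
```

When no closed form exists, the CDF has to be inverted numerically (e.g. a root find per sample), which is where the performance question would show up.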

eparish1 commented 9 months ago

@jtencer thanks. I'm not worried about performance, just usability. I like what you have, but am concerned it decreases flexibility. I would like to propose something a little more layered. We have a base parameter class closer to what we had, and then we have a derived parameter class that is what you have, and then we have concrete implementations of this. The reason I see to do this is that the abstract base class is not restricted to our definition of a parameter. What do you think? I can take a stab at this.
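One way to read that layering, sketched with hypothetical names: an abstract base that only fixes the downstream-facing contract, a derived abstract layer that bakes in the seed-transform definition, and concrete classes under it.

```python
import abc
import numpy as np

class Parameter(abc.ABC):
    """Layer 1: the most general contract downstream code relies on."""
    @abc.abstractmethod
    def get_dimensionality(self) -> int:
        ...

class SeedTransformParameter(Parameter):
    """Layer 2: parameters defined by transforming uniform (0, 1) seeds."""
    @abc.abstractmethod
    def transform_samples(self, seeds: np.ndarray) -> np.ndarray:
        ...

class UniformParameter(SeedTransformParameter):
    """Layer 3: a concrete implementation."""
    def __init__(self, low: float, high: float):
        self._low, self._high = low, high

    def get_dimensionality(self) -> int:
        return 1

    def transform_samples(self, seeds):
        # Linearly rescale (0, 1) seeds onto (low, high).
        return self._low + (self._high - self._low) * np.asarray(seeds)
```

A parameter type that doesn't fit the seed-transform mold can still subclass `Parameter` directly, which is the flexibility argument for the extra layer.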

eparish1 commented 9 months ago

@jtencer Ok, I worked on this some and have the following thoughts:

What are your thoughts on this?

jtencer commented 9 months ago

I agree with you that ParameterSpace should be more about what the API needs to be and less about the implementation details. Let me try and rework it later today.

fnrizzi commented 9 months ago

Should we talk about this param space stuff sometime?

jtencer commented 9 months ago

I think we should talk about it at some point (maybe soon) but also think we've made some good progress on it working asynchronously. I just pushed a new version that fixes the ParameterSpace class to exclusively reflect the interface that's required for downstream usage rather than including implementation details.