Parametric problems with inputs

tju999 commented 1 month ago

Hello, I'm sorry to bother you. I have some questions about OpInf. First of all, thank you for sharing the OpInf toolkit; I find the current tutorials much easier to follow for non-specialists like me. Following three of the tutorial cases, I applied OpInf to data from my own numerical model, mainly using it as a surrogate model for solving inverse problems. I have run into some problems and hope you can help me with them. Thank you very much!

Problem 1

Taking the OpInf tutorials as an example: in the External Input tutorial, different boundary conditions are used as the input u; in the Parametric Problems tutorial, different parameter values are used as the parameter μ. I am trying to combine these two cases so that my ROM applies to different boundary conditions and different parameters at the same time, but this causes a dimension mismatch.

rom = opinf.ParametricROM(
    basis=opinf.basis.PODBasis(num_vectors=10),
    ddt_estimator=opinf.ddt.UniformFiniteDifferencer(t, "ord6"),
    model=opinf.models.ParametricContinuousModel(
        operators=[
            opinf.operators.AffineLinearOperator(coeffs=1),
            opinf.operators.AffineInputOperator(coeffs=1),
            opinf.operators.AffineConstantOperator(coeffs=1),
        ],
        solver=opinf.lstsq.L2Solver(regularizer=1e-6),
    ),
)

rom.fit(parameters=K_list_train, states=H_list_train, inputs=Hr_train)

- K_list_train has shape (10,): 10 different values of a scalar parameter.
- H_list_train has shape (10, 4, N, T): 10 parameter values, 4 boundary conditions, N spatial points, and T time points.
- Hr_train has shape (10, 4, 2, T): 10 parameter values, 4 boundary conditions, a 2-dimensional input (the boundary conditions on the left and right sides, respectively), and T time points.

Here N = 1000 and T = 120. With the model set up like this, I get the following error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1000 is different from 10000)

Checking the ROM, I found that the full state dimension is 10000 instead of N = 1000. Is ParametricROM not compatible with input operators? Where did I go wrong? Could you please point it out?

Problem 2

The cases in the tutorials are all one-dimensional in space. For a two-dimensional problem, do we need to flatten the data?

Problem 3

In the tutorial's Problem Statement, the parameter is one-dimensional. How should higher-dimensional parameters be set up? And what should be done when the parameters have no direct physical meaning in the governing equations? For example, I use a GAN (a deep learning model) to generate random vectors, use them as parameters, and then train the ROM. The AffineInputOperator in the tutorial's Problem Statement seems to have a concrete meaning in the governing equations, whereas a random vector from a GAN-based parameterization may have no meaning on its own; it only becomes meaningful after passing through the GAN.

Problem 4

Does preprocessing the data change the structure of the projected governing equations? Usually the transformations are linear. If I use a nonlinear transformation for some special problems, such as a Box-Cox transformation, will the structure of the projected equations change? Should I then consider quadratic or cubic forms?

Finally, I apologize again for the trouble. I have only recently started learning OpInf and have found it very useful! I look forward to your answer and to seeing more OpInf tutorials in the future.

shanemcq18 commented 4 weeks ago

Hi @tju999, thanks for the detailed questions; I will take a close look at this soon. From a quick glance, though, I think your first issue is that the arguments to fit() are not the right shape. Each argument should be a list whose i-th entry corresponds to a single trajectory. So K_list_train[i] should be a parameter value, H_list_train[i] should be a two-dimensional array of the corresponding states, and Hr_train[i] should be an array of the corresponding inputs. See the documentation for ParametricROM.fit().

In your case, you have 10 parameter values but 4 sets of inputs, so there are 40 total training trajectories. This means K_list_train should have 40 entries, H_list_train should be a (40, N, T) array (or a list of 40 (N, T) arrays), and Hr_train should be a (40, 2, T) array (or a list of 40 (2, T) arrays). It's okay that K_list_train will have repeated entries.
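A minimal NumPy sketch of this reshaping, assuming the original arrays are ordered (parameter, boundary condition, ...) as described in the question. The parameter values and random data are placeholders, and the `rom.fit()` call is left as a comment because it needs the `rom` object from the question.

```python
import numpy as np

# Placeholder sizes from the question: 10 parameter values, 4 boundary
# conditions, N spatial points, T time steps, 2 inputs (left/right boundary).
n_params, n_bcs, N, T = 10, 4, 1000, 120
rng = np.random.default_rng(0)

K_list_train = np.linspace(0.1, 1.0, n_params)               # (10,)
H_list_train = rng.standard_normal((n_params, n_bcs, N, T))  # states
Hr_train = rng.standard_normal((n_params, n_bcs, 2, T))      # inputs

# Merge the first two axes so each entry is one trajectory. With NumPy's
# default (C) ordering, trajectory k = p * n_bcs + b, so trajectory k
# uses parameter index p = k // n_bcs.
states = H_list_train.reshape(n_params * n_bcs, N, T)  # (40, N, T)
inputs = Hr_train.reshape(n_params * n_bcs, 2, T)      # (40, 2, T)

# Repeat each parameter value once per boundary condition.
parameters = np.repeat(K_list_train, n_bcs)            # (40,)

# rom.fit(parameters=parameters, states=states, inputs=inputs)
```

Using `reshape` (rather than a manual loop) keeps the trajectory order and the repeated parameter order consistent by construction.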

I'll make a note to raise a more informative error in this case ("arguments should be a list of 2D arrays, not 3D arrays" or something). I'll also take a look at the tutorial on parametric models to try to clarify this scenario.
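For illustration only, here is a sketch of the kind of check described here, written as a standalone hypothetical function `check_training_data()` rather than anything in opinf. It assumes, as in the question, that the inputs are sampled at the same time points as the states.

```python
import numpy as np


def check_training_data(parameters, states, inputs):
    """Hypothetical helper (not part of opinf): check that the training data
    are laid out as one entry per trajectory before calling fit()."""
    if not len(parameters) == len(states) == len(inputs):
        raise ValueError(
            f"got {len(parameters)} parameter values, {len(states)} state "
            f"trajectories, and {len(inputs)} input trajectories; "
            "these counts must match"
        )
    num_states = None
    for i, (Q, U) in enumerate(zip(states, inputs)):
        Q, U = np.asarray(Q), np.asarray(U)
        # Each state trajectory should be (N, T): one column per time step.
        if Q.ndim != 2:
            raise ValueError(
                f"states[{i}] should be a 2D (N, T) array, not {Q.ndim}D"
            )
        # Each input trajectory should be (m, T), or (T,) for a single input.
        if U.ndim not in (1, 2):
            raise ValueError(
                f"inputs[{i}] should be a 1D (T,) or 2D (m, T) array, "
                f"not {U.ndim}D"
            )
        # Assumption: inputs share the states' time points.
        if U.shape[-1] != Q.shape[1]:
            raise ValueError(
                f"trajectory {i}: {Q.shape[1]} state time steps but "
                f"{U.shape[-1]} input time steps"
            )
        # Every trajectory must share the same full state dimension N.
        if num_states is None:
            num_states = Q.shape[0]
        elif Q.shape[0] != num_states:
            raise ValueError(
                f"states[{i}] has {Q.shape[0]} rows, expected {num_states}"
            )
```

With the original (10, 4, N, T) state array from the question, this check would stop at `states[0]` and report that it is 3D.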

tju999 commented 4 weeks ago

Thank you very much for your prompt reply! Following your advice, I reshaped the data and the code now runs successfully. OpInf still works well!

Thank you again for your help!

When you have time, I hope you can help me with my remaining questions. It's not urgent.

Looking forward to receiving your reply!

shanemcq18 commented 1 week ago

@tju999, here are some quick answers to your other questions.

Problem 2

Even if the physical process being simulated is defined over a two-dimensional spatial domain, each snapshot must always be a one-dimensional array. It doesn't really matter how you flatten them, as long as you use the same process for every snapshot.
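A minimal NumPy sketch of this flattening, assuming the 2D snapshots are stored in a hypothetical array `fields` of shape (nx, ny, T) with random placeholder values. NumPy's default row-major ordering is used here; any fixed ordering would work equally well.

```python
import numpy as np

nx, ny, T = 30, 20, 50
rng = np.random.default_rng(1)
fields = rng.standard_normal((nx, ny, T))  # placeholder 2D snapshots over time

# Flatten each (nx, ny) snapshot into one column of length nx * ny.
# reshape uses C (row-major) order by default, so column t equals
# fields[:, :, t].ravel(); the same order must be used for every snapshot.
Q = fields.reshape(nx * ny, T)             # snapshot matrix, (600, 50)

# Undo the flattening (same shape, same order) to recover the 2D fields,
# e.g., to plot ROM predictions on the original grid.
fields_back = Q.reshape(nx, ny, T)
```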

Problem 3

Yes, the parameters (K_list_train) can be higher dimensional, not just scalars. If there are 2 parameters and you have 10 training instances, then K_list_train should have shape (10, 2). What the parameters mean depends on your model, preprocessing, and so on; I can't really comment on that. The structure of OpInf models is typically determined from the structure of the full-order model, but if you're running the parameters through a GAN first, the parametric structure may not be obvious, which is probably why you started trying to use interpolation in #71.
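A sketch of a two-dimensional parameter combined with the 40-trajectory layout from earlier in the thread. The parameter values are random placeholders and `mus` is a hypothetical name.

```python
import numpy as np

n_params, n_bcs = 10, 4
rng = np.random.default_rng(2)

# Placeholder: 10 training instances of a 2-dimensional parameter.
mus = rng.uniform(0.5, 2.0, size=(n_params, 2))  # (10, 2)

# If each parameter value is paired with 4 boundary-condition inputs,
# repeat each row 4 times so that row k matches trajectory k.
parameters = np.repeat(mus, n_bcs, axis=0)       # (40, 2)
```

Note the `axis=0`: without it, `np.repeat` would flatten the (10, 2) array and repeat individual entries instead of whole parameter vectors.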

Problem 4

If you use some kind of nonlinear preprocessing, then yes, you should expect the structure to change. Take a look at opinf.lift; that section of the documentation might be helpful.
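To see why, here is a toy scalar example (chosen for illustration, not taken from the thread). Suppose the full-order dynamics are affine, $\dot{q} = a q + c$ with $q > 0$, and the data are preprocessed with a Box-Cox transformation with $\lambda \neq 0$. By the chain rule:

```latex
\begin{aligned}
w &= \frac{q^{\lambda} - 1}{\lambda}, \qquad q = (\lambda w + 1)^{1/\lambda}, \\
\dot{w} &= q^{\lambda - 1}\,\dot{q}
         = a\,q^{\lambda} + c\,q^{\lambda - 1} \\
        &= a(\lambda w + 1) + c\,(\lambda w + 1)^{(\lambda - 1)/\lambda}.
\end{aligned}
```

The first term is still affine in $w$, but the second is not a polynomial in $w$ for general $\lambda$; only in special cases such as $\lambda = 1$ (a pure shift, exponent zero) is the affine structure preserved. So after a nonlinear transformation, a linear, quadratic, or cubic operator form is not guaranteed to be exact, and the appropriate structure has to be worked out from the transformed equations, which is the kind of question the lifting material addresses.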

Willcox-Research-Group / rom-operator-inference-Python3