Meeting outcomes 10/09/03

amartinhuertas commented 11 months ago

We were discussing about which problem to solve, and the details underlying the generation of a training dataset. Discussions are still in a preliminary stage, so do not take the following as set in stone. Comments and edits of the below are welcome ...

@sghelichkhani ... please review and add as per required.

We have come up with an step-0 problem, with the following features:

Transient linear mantle convection on a square (lets keep aside non-linearities temporarily)
Fixed mesh, of, e.g., 150x150 quadrilaterals.
Inf-Sup stable Finite Elements for the spatial discretization.
Crank-Nicolson (Implicit) time discretization. Time step size? Fixed or adaptive?
Constant parameter values, i.e., no spatially-varying viscosity, etc. In a first stage, we can generate a dataset with these parameter values fixed. In a more advanced stage, with applicability in mind, they will have to become input parameters to the ML system as well.
Initial condition (temperature) is not constant, but a spatially-varying field. It becomes the main input parameter to the ML system, and the variable to sample in order to generate an initial dataset.
Fixed time domain $[0,T]$. Value of $T$?

The first ML system should ideally be able to predict the temperature field at time $t{n+1}$ given the temperature field at time $t{n}$, with $n=1,\ldots,N$, and $N$ the number of parts in which we discretize $[0,T]$. This would be like a sort of ML-based cheap Forward Stokes solver.

As we do not actually know if this can be even be achieved (perhaps too much ambitious), we said that we can define a set of predefined points in time (say, after, 10%, 25%, 50%, 75% of time elapsed) in which we are going to observe the mapping that transforms the solution in the next time step from the previous one, and then train different neural networks for each of these moments in time. Then, we can evaluate their accuracy in predicting unseen simulations at these moments in time.

In my view, CNNs (convolutional neural networks) would be the method of choice for such task. The issue that we have now is how should we feed the CNNs, either with the raw field values resulting from the numerical solver (read from HDF5 files), or a rasterization into 2D RGB images otherwise. The conclusion is that we should do some extra research to determine this, although I am relatively certain that usually CNNs work with images. See, e.g, the following paper https://www.sciencedirect.com/science/article/pii/S0009250921004516. They say:

The input images are ten sequential frames with a size of 64 x 64 x 3, which are concatenated into one tensor of size 64 x 64 x 30, and the output is one frame with the size of 64 x 64 x 3.

BTW, in this paper, they are packing the first 10x time frames of the simulation into a single tensor, and then predict the output at the next time step.

Some additional concerns:

Should we train the ML system with temperature only data, or may it make sense to also feed it with velocities and pressures?
Is the numerical solver robust regardless of the physical parameter values? (this is important when we start sampling these). To reduce noice, we could restrict ourselves to sparse direct solvers at each time step.
What else?

amartinhuertas commented 11 months ago

Another place to look at: https://github.com/pdebench/PDEBench

amartinhuertas commented 11 months ago

I asked my colleague (Ricardo Vinuesa), in regards to this, and he answered in Spanish (gathered here for the records)

Pasamos los campos 2D de la simulación. Como tenemos 3 inputs, son 3 canales cada uno con una variable (tauwx, Tauwz y pw) de la simulation. Eso sí, hacemos periodic padding.

As far as I can undertand, he mentions that for his 3D simulations (DNS, Navier-Stokes), they take y slices of the 3D simulation, and then feed the CNN with the three channels being x and z components of the velocity and the pressure. Not sure what he means by "periodic padding".

Thus, they are NOT generating images out of the numerical fields. In our case, we could use the (adimensionalized) temperature field in a 2D grid with a single channel. In any case, we should not lost the grid size info in the HDF5 files (I think it wont be the case, but just in case).

amartinhuertas commented 11 months ago

For reference, Ricardo, also shared the codes that they have used in different CNN papers for turbulence flows

https://github.com/KTH-FlowAI

I am still trying to understand what they actually do. After some initial email exchanges, I am not sure if we are understanding each other. Keep you updated.

amartinhuertas commented 11 months ago

In particular, see https://github.com/KTH-FlowAI/FCN-turbulence-predictions-from-wall-quantities

amartinhuertas commented 11 months ago

I am still trying to understand what they actually do. After some initial email exchanges, I am not sure if we are understanding each other. Keep you updated.

Update in regards to this. I am now sure that they are NOT passing images (e.g., png or jpg) or similar to the CNNs. He keeps insisting me that they pass matrices with values to the CNNs, but I dont yet quite grasp what they are passing or how do they generate this. I guess that the values of the degrees of freedom (DoFs) of the finite element functions.

BUT ... if we have a second-order polynomial field (we typically need this for the velocity in the case of inf-sup stable finite elements, such as Taylor-Hood), what do we do with the DoFs which are on the faces or on the element interiors? Should we post-process a "linearized mesh" where the vertices match the position of the DOFs?

And what if if our mesh is unstructured? (i.e. generated by an automatic mesh generator ...)

amartinhuertas commented 11 months ago

And what if if our mesh is unstructured? (i.e. generated by an automatic mesh generator ...)

Ok, now I think I understand. If the mesh is structured (e.g., Cartesian), then they use the mesh vertices no matter the order of the elements. If the mesh is unstructured, then they interpolate the fields into a Cartesian mesh for CNN training purposes. As said above, they do NOT use images.

amartinhuertas commented 11 months ago

@sghelichkhani ... can you already proceed to generate a preliminary data set or you need something else to be understood?

sghelichkhani commented 10 months ago

@amartinhuertas So I have now added a gadopt script (which a wrapper around firedrake for mantle convection simulations) to generate random datasets. It should be quite clear how I am generating the simulations and what the structure of the data looks like. Here are a few things for us to keep in mind:

For each file, I am generating an initial condition with random number of points and random amplitude and coordinates of gaussian anomalies distributed in space.
starting from each initial condition we convect as long as there is meaningful change in the simulation (that is the temperature fields change enough after one time-step)
the timstep has to be adaptive. otherwise the whole random generation of the intiial condition would be a bit tricky. So in my output files I also write out the time stamp attribute. I don't know TBH how we can use the magnitude of the time step for the purpose of training.
After giving it enough thought, I think the starting point here should be to use each pair of temperature fields as one training set. So the nn that we want to generate should replace one time-step of going from Ti to T{i+1}.

amartinhuertas commented 10 months ago

Hi @sghelichkhani ! Thanks for sharing ... I will take a detailed look when I have some time, I already saw that you are using Taylor-Hood Q2-Q1 for the velocity-pressure space pair ... That's what I was expecting, and standard. You may also use piece wise constant pressure space (if am not missing something).

(I am now flying to Japan for a conference).

One quick question: did you run the scripts and have a data set you can share with @GiteonCaulfied so that he can start doing some preliminary experiments? Thanks !

sghelichkhani commented 10 months ago

Hi @GiteonCaulfied. I have put the files here which you can download https://anu365-my.sharepoint.com/:u:/g/personal/u1093778_anu_edu_au/EXSV10A0DtpDodSZwbpm06wBK6dFLS-MUfbVCc9PIE4t6g?e=2CE8FN There is also a script that I have push in the last 10 mins. You can use that to read each file. Please note that each two consecutive temperature fields can be used as one data. Maybe load them all in and see how big your dataset becomes.

GiteonCaulfied commented 10 months ago

Hi @sghelichkhani @amartinhuertas @rhyshawkins

I've tested the data provided and implemented a basic Convolutional Auto-encoder for the temperature fields of the first two consecutive time stamps.

For now, I am thinking that since our goal is to build a unique model for each tuple of consecutive time stamps (0->1, 1->2,...,98->99), this unique model could have two parts: a Convolutional Auto-encoder (ConvAE) to cut down the dimension for both "before and after" temperature fields, and a regular convolutional neural network (Conv NN) that feeds with simplified "before" temperature field then outputs the simplified "after" temperature field. Since the provided dataset has 100 data files and each of these data files has timestamps ranging from 0 to 99, eventually we will have 99 ConvAE and 99 Conv NN for this dataset.

My plan is to implement an effective Convolutional Auto-encoder first, then try to deal with the regular convolutional neutral network later. For this basic Convolutional Auto-encoder I've implemented, I am training it with all the temperature fields labelled with either 0 or 1 in timestamps from all the data files, which make the training set having 160 data samples since the dataset is split in a ratio of 8:1:1 for training, testing and validation. I understand that 160 may seem a small number for training neural networks but surprisingly there's no overfitting problem occurs. (Actually, even if I increase the dataset to include a bit more temperature fields labelled with later timestamps (0 to 4), there's no significant change in the result. However, I still haven't try to test with the complete temperature fields (all 10000 of them) since that takes a lot of time, but I will definitely do it later to see how it comes)

The structure of my ConvAE so far is mostly built upon the auto-encoder in the ForwardSurrogate_Mars_2D repo, since they are also using an auto-encoder to cut down the dimension of a temperature field. (I also notice that the size of the latent space in their repo is actually inconsistent with the one they mentioned in their paper, but that is kind of off the topic) In short, I use three Conv2D layer as the encoder, all of them has Tanh() as the activation function and a BatchNormalisation layer after them (However, BatchNormalisation layer may not be useful in this case since I set the batch size to be 1 due to the small number of training samples)

The performance of the ConvAE looks good in loss values (calculated using MSELoss), but seems not so well when I try to visualise it. Unfortunately, I've tried to increase the number of epoch or using ReLU as the activation function but none of these methods works. The next step will be training the ConvAE using all 10000 temperature fields and I will do it later (If that works, we can cut the number of total ConvAE generated from 99 to 1)

The basic ConvAE I've implemented along with some test result is in the file 2D_ConvAE.ipynb. I haven’t pushed the data folder in this commit since it’s about 2 GB.

GiteonCaulfied commented 10 months ago

UPDATE:

I just removed the BatchNormalisation layer for the ConvAE and the activation function after the last Conv2D layer in the decoder (The number of the training samples is still 160). The result is now way better than the last commit so I pushed it to the repo. (Also, I switched the testing timestamps from 0-1 to 50-51)

GiteonCaulfied / COMP4560_stokes_ml_project

Meeting outcomes 10/09/03 #3