GiteonCaulfied / COMP4560_stokes_ml_project

A repository that we are going to use to keep track of project evolution, notes, ideas, etc.

Meeting Outcomes 28/08/23 #5

Open rhyshawkins opened 10 months ago

rhyshawkins commented 10 months ago

Continue with autoencoder framework and add the ability to predict the latent space at the next timestep with a simple fully connected NN as a first attempt.

I've added a reference to work by some students I saw last year. They also have a GitHub repository for the project that may be useful to look at: https://github.com/jackmiller2003/kae-cyclones/tree/main

GiteonCaulfied commented 10 months ago

Hi,

I've implemented a basic fully connected NN to predict the latent space at the next timestep. As with the convolutional auto-encoder, the fully connected NN is trained using the Python file LatentSpace_Prediction_training.py and the job file LatentSpace_Prediction_training_job.sh. The training results are then downloaded from the remote server and visualised in the notebook LatentSpace_Prediction_visualisation.ipynb.

I have tried two different sets of hidden layers for the fully connected NN: a simpler one with 1000 and 200 neurons in its hidden layers, and a more complex one with 3105 and 1035 neurons. The auto-encoders used with these two fully connected NNs are also different: for the simpler one, the ConvAE is the same as in the last commit, while for the more complex one, the ConvAE is trained for 100 more epochs than in the last commit. The simple ConvAE and the simple fully connected NN are stored in the subfolder commit-5ade157 of the data folders 2D_ConvAE_results and 2D_LatentSpace_Prediction_results, respectively. The complex ones sit directly in the data folders 2D_ConvAE_results and 2D_LatentSpace_Prediction_results.

The model files for the fully connected NNs are large: the simpler one is about 51.3MB and the complex one is 180MB. As for the test results shown in LatentSpace_Prediction_visualisation.ipynb, both NNs are able to predict the majority of the features at the next timestep when I convert the predicted latent space back to the original 1x201x401 size using the decoder. (I also tested the performance of the auto-encoder, so there are 6 images in total for a testing set: best case actual output, best case ConvAE output, best case latent space prediction output, and 3 more for the worst case.) Also, the complex one is better than the simpler one when it comes to some details in the output images.
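The file sizes are in line with what fully connected layers cost, since the parameter count grows with the product of adjacent layer widths. A rough sketch of that arithmetic follows. It assumes float32 weights, and `latent_dim` is a hypothetical value picked for illustration (the actual flattened latent size is not stated in this thread):

```python
def mlp_param_count(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def mlp_size_mb(layer_sizes, bytes_per_param=4):
    """Approximate size in MB, assuming float32 parameters (4 bytes each)."""
    return mlp_param_count(layer_sizes) * bytes_per_param / 1e6


# latent_dim is a hypothetical value chosen for illustration only.
latent_dim = 5000
simple_net = [latent_dim, 1000, 200, latent_dim]
complex_net = [latent_dim, 3105, 1035, latent_dim]

print(f"simple:  {mlp_size_mb(simple_net):.1f} MB")
print(f"complex: {mlp_size_mb(complex_net):.1f} MB")
```

Because the first and last layers scale with the latent size, shrinking the latent space is likely the most direct way to shrink these files.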

I haven't tested a complete cycle of the timestamps yet (from the start timestamp to the end timestamp), but I will definitely do it later to see how it comes.

The updated files will be pushed shortly after this comment.

UPDATE: The complex one (180MB) exceeds GitHub's maximum file size (100MB), so I can't push it. I'll save it on my laptop for now and try to think of a way to add it later

amartinhuertas commented 10 months ago

Nice work @GiteonCaulfied !

> I haven't tested a complete cycle of the timestamps yet (from the start timestamp to the end timestamp), but I will definitely do it later to see how it comes.

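Regarding this I am wondering the following regarding your current approach:

  • Is the time-stamp also an input to the NeuralNet? (along with the latent space representation of the temperature field corresponding to such time-stamp).
  • Is the time-stamp of the predicted temperature field an output of the NeuralNet as well?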

As far as I understood from yesterday's e-mail, this was the idea to account for the time step discretization adaptivity scheme.

Am I right? @rhyshawkins and @sghelichkhani?

amartinhuertas commented 10 months ago

> I haven't tested a complete cycle of the timestamps yet (from the start timestamp to the end timestamp), but I will definitely do it later to see how it comes.

Let us see how this goes, but as far as I know, machine learning systems have typically much poorer performance when they are fed with synthetically-generated data (instead of true data from the data set). As far as I understand, you want to feed the output of the NN to its input in a loop, correct?

amartinhuertas commented 10 months ago

> and try to think of a way to add it later

The link below is one possible option. I have not used it before, though ...

https://docs.github.com/en/repositories/working-with-files/managing-large-files

GiteonCaulfied commented 10 months ago

Hi @amartinhuertas

> Regarding this I am wondering the following regarding your current approach:
>
>   • Is the time-stamp also an input to the NeuralNet? (along with the latent space representation of the temperature field corresponding to such time-stamp).
>   • Is the time-stamp of the predicted temperature field an output of the NeuralNet as well?

For the current NN I've implemented, timestamp is not serving as an input or an output. Only the temperature fields are involved here and timestamps are only used to determine which temperature field happens after another in creating the customised dataset fed to the NN.
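The pairing described above can be sketched as follows. This is a minimal illustration, not the code in LatentSpace_Prediction_training.py: the function name `make_next_step_pairs` and the `(T, latent_dim)` array layout are assumptions.

```python
import numpy as np


def make_next_step_pairs(latents, timestamps):
    """Build (input, target) pairs from one time series of latent vectors.

    latents:    array of shape (T, latent_dim), one row per snapshot
    timestamps: array of shape (T,), used only to order the snapshots
    """
    order = np.argsort(timestamps)      # timestamps only decide the ordering
    ordered = latents[order]
    inputs = ordered[:-1]               # latent state at step t
    targets = ordered[1:]               # latent state at step t + 1
    return inputs, targets


# Toy data: 5 snapshots of a 3-dimensional latent space, stored out of order.
timestamps = np.array([0.3, 0.0, 0.1, 0.4, 0.2])
latents = np.arange(15, dtype=float).reshape(5, 3)
x, y = make_next_step_pairs(latents, timestamps)
print(x.shape, y.shape)  # (4, 3) (4, 3)
```

Note that the size of the gap between consecutive timestamps never reaches the network in this setup, which matters if the time steps are adaptive.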

> Let us see how this goes, but as far as I know, machine learning systems have typically much poorer performance when they are fed with synthetically-generated data (instead of true data from the data set). As far as I understand, you want to feed the output of the NN to its input in a loop, correct?

Correct! I plan to feed the output of the NN back into its input in a loop to get the predicted temperature fields at later timestamps.

> The link below is one possible option. I have not used it before, though ...
>
> https://docs.github.com/en/repositories/working-with-files/managing-large-files

Thanks for the link! I'll definitely check this later.

UPDATE: I have uploaded the complex NN model using Git LFS, as described in the link.

GiteonCaulfied commented 10 months ago

Hi,

I've implemented some basic test code for a complete cycle of the timestamps (from the start timestamp to the end timestamp) at the end of the notebook LatentSpace_Prediction_visualisation.ipynb. Given the index of the file I want to test with, the code runs a complete cycle of predictions from t0 to t99 and plots four of these predictions (also chosen manually, for testing convenience) at the original size 1x201x401, alongside the real data for comparison.

The result is bad: I've tested several files and none of them "survives" after t20. The small prediction errors of the ConvAE and the fully connected NN add up and grow larger and larger when the output of the NN is fed back into its input in a loop, creating a kind of "butterfly effect" in the end.
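The compounding of small errors can be reproduced on a toy problem. The sketch below is not the notebook's code: the "true" dynamics is a plain 2D rotation and the "model" is the same rotation with a slightly wrong angle, both assumptions made purely for illustration.

```python
import numpy as np


def rollout(step_fn, z0, n_steps):
    """Feed the model's output back in as its next input, n_steps times."""
    states = [z0]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)


def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


# Toy stand-ins (assumptions): the model's angle is off by 0.01 rad per step.
def true_step(z):
    return rotation(0.10) @ z


def model_step(z):
    return rotation(0.11) @ z


z0 = np.array([1.0, 0.0])
truth = rollout(true_step, z0, 99)     # t0 ... t99
pred = rollout(model_step, z0, 99)

# Distance between the looped prediction and the truth at each step.
errors = np.linalg.norm(pred - truth, axis=1)
print(errors[[1, 20, 99]])
```

Even though each individual step is almost right, the distance to the true trajectory keeps growing over the 99 steps, because every prediction starts from an already slightly wrong state.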

For now, I will try a ConvAE and an NN trained for more epochs to see whether the performance improves.

The updated code will be pushed shortly after this comment.

amartinhuertas commented 10 months ago

For the record, @rhyshawkins wrote the following in a separate email:

> On the issue of training to predict the timestep size, I think that may be something for the future although perhaps not too difficult. I suspect that the data that Sia has generated doesn’t have the time step included so he may have to regenerate or reprocess to obtain the necessary training data for Xuzeng.

amartinhuertas commented 10 months ago

> I've implemented some basic test code for a complete cycle of the timestamps (from the start timestamp to the end timestamp)

FYI ... in the attached notebook you can find some code to generate animations using Matplotlib (see the evolve_life function for more details). Although the notebook has nothing to do with PDE simulations, it can still be helpful for generating animations of the transient simulations produced by the NN versus the data set.

Python_S10_Game_of_life_Inclass_Solution.zip

GiteonCaulfied commented 10 months ago

Hi @amartinhuertas

I've generated animations using the notebook you provided. I've tested a ConvAE and an NN that were both trained for 1000 epochs. The loss is still dropping near the 1000th epoch, with no sign of overfitting, but the performance for the complete cycle of predictions still hasn't improved.

Should I go further and test with more epochs (e.g. 1500, 2000...) until overfitting occurs or should I try to modify the structure of my ConvAE and NN instead? (Currently training ConvAE and NN with 1000 epochs takes about 450 SU in total)

The updated code will be pushed shortly after this comment.

UPDATE: The animation is not available when browsing LatentSpace_Prediction_visualisation.ipynb on the GitHub website. You need to first add the dataset folder solution to the same directory as LatentSpace_Prediction_visualisation.ipynb (I didn't do that in any commit because the dataset is too large), then run the visualisation notebook manually to watch it.

UPDATE of UPDATE: No worries, I have saved the animation as a GIF file called actual_and_Predictions.gif so that you can watch it directly.

amartinhuertas commented 10 months ago

> UPDATE: The animation is not available when browsing LatentSpace_Prediction_visualisation.ipynb on the GitHub website. You need to first add the dataset folder solution to the same directory as LatentSpace_Prediction_visualisation.ipynb (I didn't do that in any commit because the dataset is too large), then run the visualisation notebook manually to watch it.

Note that you can also save the animation in, e.g., an mp4 file. https://www.geeksforgeeks.org/how-to-save-matplotlib-animation/
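For reference, saving a Matplotlib animation to a file can be sketched as below. The function name `save_field_animation` and the random demo data are assumptions for illustration; the sketch writes a GIF through `PillowWriter`, which needs the Pillow package.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs on a server
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter


def save_field_animation(frames, path, fps=5):
    """Save a sequence of 2D arrays, shape (T, H, W), as an animated GIF."""
    fig, ax = plt.subplots()
    image = ax.imshow(frames[0], vmin=frames.min(), vmax=frames.max())

    def update(i):
        # Swap in the field for frame i and label it with its step index.
        image.set_data(frames[i])
        ax.set_title(f"t{i}")
        return (image,)

    anim = FuncAnimation(fig, update, frames=len(frames))
    anim.save(path, writer=PillowWriter(fps=fps))
    plt.close(fig)


# Random data standing in for temperature fields (illustration only).
demo = np.random.default_rng(0).random((5, 20, 40))
demo_path = os.path.join(tempfile.gettempdir(), "demo_animation.gif")
save_field_animation(demo, demo_path)
```

Writing an mp4 instead should only require swapping the writer for `matplotlib.animation.FFMpegWriter`, provided ffmpeg is installed on the machine.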

amartinhuertas commented 10 months ago

> UPDATE of UPDATE: No worries, I have saved the animation as a GIF file called actual_and_Predictions.gif so that you can watch it directly.

Ok, I just saw this comment of yours ...

GiteonCaulfied commented 10 months ago

Hi,

I've added a simple PCA analysis for a selected time series file to LatentSpace_Prediction_visualisation.ipynb. The PCA analysis includes the original temperature field and the predicted temperature field (the output-feed-as-input looping one). The y-axis represents the eigenvalues and the x-axis represents the timestamps. (I haven't tried to get the best and the worst result from all the time series files yet, but I will do this later.)
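One plausible reading of this analysis is an eigenvalue spectrum of the snapshot matrix, with one snapshot per timestep, so that there are as many eigenvalues as timesteps. The sketch below follows that reading; it is an assumption about the method rather than the notebook's actual code, and the function name `pca_eigenvalues` is hypothetical.

```python
import numpy as np


def pca_eigenvalues(fields):
    """Eigenvalues of the snapshot covariance for a series of fields.

    fields: array of shape (T, H, W), one 2D field per timestep.
    Returns T eigenvalues in descending order.
    """
    snapshots = fields.reshape(len(fields), -1)         # (T, H*W)
    centred = snapshots - snapshots.mean(axis=0)        # remove the time mean
    singular_values = np.linalg.svd(centred, compute_uv=False)
    return singular_values**2 / (len(fields) - 1)


# Toy data standing in for actual and predicted temperature fields.
rng = np.random.default_rng(0)
actual = rng.random((10, 8, 16))
predicted = actual + 0.05 * rng.standard_normal(actual.shape)

print(pca_eigenvalues(actual)[:3])
print(pca_eigenvalues(predicted)[:3])
```

Plotting the two spectra on the same axes gives a compact way to see whether the predicted series keeps the same dominant modes of variation as the actual one.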

Also, as discussed in the meeting yesterday, I've added an animation of another way of predicting to the GIF, alongside the output-feed-as-input looping one. This one applies the prediction to each of the original temperature fields, so no output-feed-as-input is involved (each actual field is fed in once to get the next one). As expected, this method works much better than the previous one, which is consistent with the discussion in the meeting: perhaps ML models are not designed to handle an output-feed-as-input loop in the first place.
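The difference between the two prediction modes can be sketched on a toy problem. Everything here is an assumption for illustration: the "truth" is scalar exponential growth and the "model" uses a slightly wrong growth rate.

```python
import numpy as np


def one_step_predictions(step_fn, truth):
    """Predict step t+1 from the *actual* state at step t, for every t."""
    return np.stack([step_fn(z) for z in truth[:-1]])


def looped_predictions(step_fn, z0, n_steps):
    """Predict step t+1 from the model's *own* prediction at step t."""
    preds, z = [], z0
    for _ in range(n_steps):
        z = step_fn(z)
        preds.append(z)
    return np.stack(preds)


# Toy stand-in (assumption): the true growth rate is 1.05, the model uses 1.06.
def model_step(z):
    return 1.06 * z


truth = np.array([1.05**t for t in range(100)])    # t0 ... t99
single = one_step_predictions(model_step, truth)
looped = looped_predictions(model_step, truth[0], 99)

# Relative error of each mode against the true values at t1 ... t99.
single_err = np.abs(single - truth[1:]) / truth[1:]
looped_err = np.abs(looped - truth[1:]) / truth[1:]
print(single_err.max(), looped_err.max())
```

In single-step mode the relative error stays at the model's one-step error, because every prediction restarts from a correct state. In looped mode the same one-step error is multiplied in at every step.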

amartinhuertas commented 10 months ago

@GiteonCaulfied thanks for your work!

For the consideration of @rhyshawkins and @sghelichkhani, the new animated GIF that @GiteonCaulfied mentions is available here:

https://github.com/GiteonCaulfied/COMP4560_stokes_ml_project/blob/main/actual_and_Predictions.gif

amartinhuertas commented 10 months ago

> I've added a simple PCA analysis for a selected time series file to LatentSpace_Prediction_visualisation.ipynb. The PCA analysis includes the original temperature field and the predicted temperature field (the output-feed-as-input looping one). The y-axis represents the eigenvalues and the x-axis represents the timestamps. (I haven't tried to get the best and the worst result from all the time series files yet, but I will do this later.)

Would it be possible to also add the PCA analysis of the compressed/decompressed fields? Just to see where it sits relative to the other two.

GiteonCaulfied commented 10 months ago

> Would it be possible to also add the PCA analysis of the compressed/decompressed fields? Just to see where it sits relative to the other two.

@amartinhuertas Sure thing! I just added the compressed-then-decompressed field and the single prediction field to the PCA analysis.

amartinhuertas commented 10 months ago

> @amartinhuertas Sure thing! I just added the compressed-then-decompressed field and the single prediction field to the PCA analysis.

Great, thanks! Copying and pasting it here for the convenience of @rhyshawkins and @sghelichkhani ...

[Image: Screenshot from 2023-09-08 13-38-37]

amartinhuertas commented 10 months ago

> (I haven't tried to get the best and the worst result from all the time series files yet, but I will do this later)

Ok. Good call.

GiteonCaulfied commented 10 months ago

Hi @amartinhuertas @rhyshawkins @sghelichkhani

I've now added PCA analysis and GIF animations for the best and worst results. The loss used to pick the best and worst results is the sum, over a complete time cycle, of the L1 difference between each single prediction and the actual temperature field.
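The selection criterion can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the function names are hypothetical, and the per-field L1 difference is taken as a mean over grid points (a sum over grid points would give the same ranking).

```python
import numpy as np


def series_l1_loss(predicted, actual):
    """Sum over timesteps of the mean absolute (L1) difference per field.

    predicted, actual: arrays of shape (T, H, W).
    """
    per_step = np.abs(predicted - actual).mean(axis=(1, 2))   # shape (T,)
    return per_step.sum()


def best_and_worst(predictions, actuals):
    """Indices of the series with the lowest and highest total L1 loss.

    predictions, actuals: arrays of shape (n_series, T, H, W).
    """
    losses = np.array([series_l1_loss(p, a)
                       for p, a in zip(predictions, actuals)])
    return int(losses.argmin()), int(losses.argmax()), losses


# Toy data: three series whose predictions have different noise levels.
rng = np.random.default_rng(0)
actuals = rng.random((3, 4, 5, 6))
noise = rng.standard_normal(actuals.shape)
scales = np.array([0.01, 0.5, 0.1]).reshape(3, 1, 1, 1)
predictions = actuals + scales * noise

best, worst, losses = best_and_worst(predictions, actuals)
print(best, worst)
```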

The PCA analyses are still in the file LatentSpace_Prediction_visualisation.ipynb, and the best/worst animations are available here:

https://github.com/GiteonCaulfied/COMP4560_stokes_ml_project/blob/main/actual_and_Predictions_Best.gif
https://github.com/GiteonCaulfied/COMP4560_stokes_ml_project/blob/main/actual_and_Predictions_Worst.gif

amartinhuertas commented 10 months ago

> I've now added PCA analysis and GIF animations for the best and worst results.

Ok, thanks for that. Clearly, the worst case result is far from acceptable. It got the transient dynamics completely wrong.

As we discussed during the meeting:

1) Let us try to exploit RNNs+LSTM to be able to predict the whole sequence in one shot.
2) Let us try to reduce the complexity of the simulation by sticking to fixed time steps and/or time-stamped data sets, so that we can use the time as an input to the ML systems during training/prediction.

You can start with 1) straight away while @sghelichkhani explores the generation of the data set.

amartinhuertas commented 10 months ago

In regards to 1., I think the following architecture is the one that can help here: https://dl.acm.org/doi/10.5555/2969239.2969329

amartinhuertas commented 10 months ago

https://arxiv.org/pdf/1506.04214.pdf

GiteonCaulfied commented 10 months ago

Hi @amartinhuertas

> In regards to 1., I think the following architecture is the one that can help here: https://dl.acm.org/doi/10.5555/2969239.2969329

For the ConvLSTM structure described in this paper, PyTorch doesn't provide an implementation yet, so I have to either implement it myself or use someone else's implementation, which can be found on GitHub. I can't really decide which one to choose, since implementing it from scratch can take some time and I am not sure whether I am allowed to use others' implementations for this project.

So I decided to go for another option (at least for now): using the regular LSTM. I've implemented a basic model using LSTM. The input of the model is a sequence of the first 50 temperature fields in a time series file (compressed by the ConvAE I implemented before), while the predicted output is the remaining 50 temperature fields in the same time series file. In short, this model uses the first half of the temperature fields to predict the second half.

I haven't tried using less input to predict more output (e.g. using 20 temperature fields to predict the remaining 80). If I were to do that, I would have to make the input somehow repetitive, since the sequence length of the input and output in an LSTM should be the same.
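The equal-length behaviour can be seen directly from the shapes that `torch.nn.LSTM` returns. The sketch below uses hypothetical sizes (`latent_dim`, `hidden_dim`) and an untrained network, so it only illustrates shapes, not the model in this repository:

```python
import torch
from torch import nn

latent_dim, hidden_dim = 16, 32      # hypothetical sizes, for illustration

lstm = nn.LSTM(input_size=latent_dim, hidden_size=hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, latent_dim)   # map hidden states back to latent size

first_half = torch.randn(4, 50, latent_dim)   # (batch, sequence, features)
hidden_states, _ = lstm(first_half)           # one hidden state per input step
second_half_pred = head(hidden_states)

print(hidden_states.shape)      # torch.Size([4, 50, 32])
print(second_half_pred.shape)   # torch.Size([4, 50, 16])
```

A single `nn.LSTM` call produces one output per input step, which is why 50 inputs map naturally to 50 outputs. Different input and output lengths are usually handled with an encoder-decoder arrangement or by feeding predictions back in step by step; neither is tried here.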

The result is visualised in the notebook LSTM_visualisation.ipynb. I've also made another two GIFs (actual_and_Predictions_Best_LSTM.gif and actual_and_Predictions_Worst_LSTM.gif) to visualise the difference between the prediction and the actual fields. (Only the last 50 temperature fields in a time series are included in the GIFs, since I use the first half as the input.) You can see that even in the worst case the predicted result is able to capture a general "trend" of the flow. However, the loss value is higher than the fully connected one and more details are lost when using LSTM.

I will upload the files shortly after this comment. Also, the model file for the LSTM is now larger than 1GB and Git LFS only provides 2GB of file storage for my account, so I will keep it on my laptop for now.

amartinhuertas commented 10 months ago

Hi @GiteonCaulfied,

thanks for your work!

> For the ConvLSTM structure described in this paper, PyTorch doesn't provide an implementation yet, so I have to either implement it myself or use someone else's implementation, which can be found on GitHub. I can't really decide which one to choose, since implementing it from scratch can take some time and I am not sure whether I am allowed to use others' implementations for this project.

Fair enough. Let us talk about this in our next meeting (I will try to send some emails to set the date later today).

> since the sequence length of the input and output in an LSTM should be the same.

For my understanding, is this a limitation of PyTorch or of RNN-LSTMs as a ML architecture by construction?

> You can see that even in the worst case the predicted result is able to capture a general "trend" of the flow. However, the loss value is higher than the fully connected one and more details are lost when using LSTM.

Did you also compute the PCA-SVD? It would also be good to see how LSTMs compare in terms of this metric.

On another note, I want to raise a concern that I have w.r.t. the current data set. How do we know that a data set with 100 time series of 100 adaptive time steps each is sufficiently rich for training? If I am not wrong, @sghelichkhani was also concerned in our meeting about the potential scarcity of the data set. I am not sure at the moment whether this is actually a problem; I am just raising the concern that it might be good to also consider larger amounts of data, to see how the results we currently have are affected by this.

GiteonCaulfied commented 10 months ago

Hi @amartinhuertas ,

I just added the PCA analysis in the notebook LSTM_visualisation.ipynb

> For my understanding, is this a limitation of PyTorch or of RNN-LSTMs as a ML architecture by construction?

I think you are right about the limitation. Here's a link to a diagram I found on StackOverflow illustrating how LSTM works in PyTorch:

https://stackoverflow.com/a/48305882