Open davidvenuto opened 1 year ago
[x] Sift through Bernard's code
Particularly look at his code on the MEOFs, and identify the dominant modes. These will give context about the predictor variables in the ML model
Find the daily version of Bernard's data, and begin figuring out how to draw these important variables from this data.
Look through Bernard's remaining code of relevance
[x] Start semi-development of time series model (too early; not enough understanding yet)
[x] Possibly perform independent MEOF analysis on new data
[x] Decide on the proper way to implement the model (RNN, CNN, or LSTM; most likely LSTM according to Dhruv)
[x] Strengthen understanding of LSTM model
[ ] Lay out framework of what is necessary to implement this specific model in a separate git-issue
[x] Go back to Chen-Yuan (2004) and summarize understandings, and how these ideas will transfer over to your project
Essentially, Chen-Yuan describes how to identify the dominant modes (patterns of spatial co-variability) among a number of variables. From the leading modes, we can identify which variables are the most "important," and from these we can begin creating our own ML model.
Jianna's job will be to come up with a better way of utilizing dimensionality reduction, which will possibly change the modes and thus the variables. The ideal situation is one in which Jianna and I are able to combine our projects into one.
[ ] Potentially integrate over to orca computer, maybe not if LEAP continues serving purposes for now (Not necessary yet)
[x] Create Powerpoint for LDEO Research Focus Session
[x] Come back at the end of the week and review how each of the goals was met, and any complications
[x] Write the goals for next week
This week went relatively well. I think the main gap that may come back to bite me if I don't fortify it is my understanding of the previous work. I have a basic understanding of MEOFs, the Markov model, etc., but not enough to explain them, as was highlighted in my research focus session. I think the next research focus session and our meeting on Monday may be a bit of a wake-up call, as they will force me to (1) really know what I need to be working on and get working on it, (2) actually understand the basis for my project, and (3) start implementing my beginner model.
The fact that we are almost halfway through the summer is a bit mind-boggling to me, as I feel like I haven't made much progress. I think I need to be more direct in going to Dhruv and Xiaojun with questions, and to really make what I want to happen actually happen.
A lot of the soft goals for this week were met, but development has yet to start. Once again, I think this is due to a relatively shallow understanding of the previous material, but I know I can turn that around next week.
First, we discussed the specifics of the Markov model process and what it means for my project.
Essentially, a number of variables are input into the MEOF analysis. From here, the dominant "modes" are found. In this context, modes refer to the spatial distribution or co-variability amongst the variables. Knowing the dominant modes tells me which variables are the most important, and in turn I can use these variables to create my own ML model.
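To make the MEOF idea above concrete for myself, here is a minimal sketch, not Bernard's or Chen-Yuan's actual code. It assumes each variable is stored as a `(time, space)` array, that each variable is normalized by a single overall standard deviation before being combined, and that the modes are found with an SVD. The function name `meof` and the synthetic "ice" and "sat" fields are hypothetical and exist only for illustration.

```python
import numpy as np

def meof(fields, n_modes=3):
    """Multivariate EOF sketch.

    fields: list of arrays, each shaped (n_time, n_space_i).
    Returns (patterns, pcs, var_frac) for the leading n_modes.
    """
    blocks = []
    for f in fields:
        anom = f - f.mean(axis=0)      # remove the time mean at each grid point
        blocks.append(anom / anom.std())  # one scale per variable (an assumption)
    X = np.concatenate(blocks, axis=1)  # stack variables along the space axis

    # SVD of the combined anomaly matrix: rows of Vt are spatial patterns,
    # columns of U (scaled by s) are the principal component time series.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)      # fraction of variance per mode
    pcs = U[:, :n_modes] * s[:n_modes]
    patterns = Vt[:n_modes]
    return patterns, pcs, var_frac[:n_modes]

# Synthetic example: two variables sharing one annual-cycle-like signal.
rng = np.random.default_rng(0)
n_time = 120
signal = np.sin(2 * np.pi * np.arange(n_time) / 12)
ice = np.outer(signal, rng.normal(size=50)) + 0.1 * rng.normal(size=(n_time, 50))
sat = np.outer(signal, rng.normal(size=30)) + 0.1 * rng.normal(size=(n_time, 30))

patterns, pcs, var_frac = meof([ice, sat], n_modes=3)
print("variance fraction of leading modes:", np.round(var_frac, 3))
```

Each row of `patterns` spans both variables (the first 50 columns belong to the first field, the last 30 to the second), which is what lets a single mode describe co-variability between variables. The `pcs` columns are the time series that could later be fed to an ML model.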
From here, we began discussing where I should start for my research focus session for this Friday (6/23).
Bernard used monthly sea ice concentration data for his project, but this data set is too short for my ML purposes. I am going to need to find the corresponding daily data, which will give me enough data to at least make a beginner ML model.
Prior to any model making, Xiaojun suggested that I perform my own MEOF analysis on the daily data that I obtain to find the dominant modes and thus the variables I want to use for my model. This may or may not be worth it. After all, if the purpose is to combine with Jianna's project, and I already have Bernard's data as a reference, perhaps he has already found the same things that I would. But I am using daily as opposed to monthly data, so I will likely perform the MEOF analysis to confirm.
Xiaojun also suggested that I get more informed on the suggested LSTM model, but also research any other potential models and discuss them with her and Dhruv. Off the bat, the LSTM makes sense, but I have not done enough research to conclude whether or not it is definitely the right one. However, it may be the best one to at least start out with.
[x] Truly understand the work from Chen-Yuan
[x] What is an EOF?
[x] What is hindcasting?
[x] What is cross validation?
[x] What is the basic process behind the Markov model?
[x] How are all of these concepts found in Bernard's code?
[x] Understand the cross over from Chen-Yuan to my project
[x] How will MEOFs be applied to the data I'm using?
[x] What is the best ML model to be using?
[x] Write Research Session 2 Powerpoint
[x] Lay out complete, detailed roadmap for entire beginning ML implementation process, and START IT
I have started to develop a simple feedforward NN to use on the SI data.
The data is being taken from the PCs of the MEOF analysis applied by Bernard.
The model has been created, but it seems to be having some problems.
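As a sanity check alongside the NN, here is a rough sketch of how PC time series could be turned into supervised (input, target) pairs with a chronological train/test split. It is not the model from this issue: the predictor is a plain linear least-squares baseline swapped in for the feedforward NN, and `make_windows`, `n_lags`, `lead`, and the toy `pcs` array are all hypothetical names and values chosen for illustration.

```python
import numpy as np

def make_windows(pcs, n_lags=3, lead=1):
    """Build lagged inputs and lead-time targets from PC time series.

    pcs: array shaped (n_time, n_pcs).
    Returns X shaped (n_samples, n_lags * n_pcs) and y shaped (n_samples, n_pcs).
    """
    n_time = pcs.shape[0]
    X, y = [], []
    for t in range(n_lags, n_time - lead + 1):
        X.append(pcs[t - n_lags:t].ravel())  # the previous n_lags time steps
        y.append(pcs[t + lead - 1])          # the state `lead` steps after the last input
    return np.array(X), np.array(y)

# Toy stand-in for real PCs: 10 time steps, 2 PCs.
pcs = np.arange(20.0).reshape(10, 2)
X, y = make_windows(pcs, n_lags=3, lead=1)

# Chronological split: train on the early part, test on the later part,
# so the model never sees the future during training.
n_train = int(0.8 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Linear baseline (with a bias column) to compare any NN against.
A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
A_test = np.hstack([X_test, np.ones((len(X_test), 1))])
W, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
rmse = np.sqrt(np.mean((A_test @ W - y_test) ** 2))
print("baseline test RMSE:", rmse)
```

If the NN cannot beat a baseline like this on held-out data, that would point to a problem in the training setup rather than in the data itself.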
[x] Finish development of first NN on Sea Ice data. Model is already created but needs to be optimized
[x] Once finished with first NN, move on to an RNN, compare results
[ ] Read more about LSTM, start thinking about architecture for specific implementation
[x] Understand what needs to be done AFTER the model is created. What do we now do with the time series we've created?
[x] Enjoy July 4th
This issue will serve as an overview of my goals for this summer, broken down by each week