Data for each year takes around 200 MB, so analyzing multiple years at once would massively increase the computing power and memory required. It is therefore important to prepare a smaller dataset covering just one of the 37 levels, which would be 37 times smaller than the original.
In the existing repository that the climate scientists prepared for us, there already exists a dataset with all the years and averaged levels, at just 18 MB. Its description, together with the original data source, can be found in the Readme.md file there. The author of the Readme.md uses the data manipulation tool "cdo", so it might be worth exploring that approach as well.
Steps:
[x] Explore (inspect and plot) the level-averaged data that is available in the existing repository.
[x] Write code to select a given level from the data and combine the resulting files into one smaller dataset. The original data source may also help: you can select the level of interest directly and download the files for each year separately. Whichever approach you choose, please briefly describe your steps in the README.md file in this GitHub repository for reproducibility.
[x] Once the full data and the code to compute similarities are available, run the similarity analysis on the whole dataset.
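The level-selection step above could be sketched with xarray, as an alternative to the cdo approach. This is a minimal sketch, not the actual implementation: the variable name "t", the dimension name "level", and the synthetic in-memory data are illustrative assumptions; with real data each dataset would come from `xr.open_dataset("path/to/year.nc")`.

```python
import numpy as np
import xarray as xr

def select_level(datasets, level, dim="level"):
    """Pick one level from each yearly dataset and concatenate along time."""
    return xr.concat([ds.sel({dim: level}) for ds in datasets], dim="time")

# Tiny synthetic stand-in for two yearly files (12 time steps, 37 levels);
# with the real data, each element would be opened from a NetCDF file instead.
datasets = [
    xr.Dataset(
        {"t": (("time", "level"), np.random.rand(12, 37))},
        coords={"time": np.arange(12), "level": np.arange(1, 38)},
    )
    for _ in range(2)
]

combined = select_level(datasets, level=10)
print(combined["t"].shape)  # 24 time steps; the level dimension is dropped
# combined.to_netcdf("level10_all_years.nc")  # write out the smaller dataset
```

Selecting a single level value drops the level dimension entirely, which is what makes the combined dataset roughly 37 times smaller than the original.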