Data sets used in animation paper

PaulWessel commented 10 months ago

Hi @Esteban82 and @gd-a-

I want to discuss the data sets used in these animations. I have finished my Emperor density movie; it uses a 28 MB zip file containing a bunch of ASCII prisms (for plot3d) and 2 tiny grids. This is not much.

https://github.com/GenericMappingTools/gmt-2024-animation/assets/26473567/50b08a58-8716-4b33-b0ec-fe162437c4ba

I know @federico has a few relatively small files for Indiana Jones and Messi, and maybe something larger for the science animation.

As for @gd-a's animation, I think the data access and volume are a bit more complicated, but I need to get a summary.

My thoughts on how to distribute things are these:

Obviously, the paper is published through Wiley or Elsevier, so that is taken care of.
All the figure and animation scripts are released via the gmt-2024-animation repo, which we will make public once the paper is accepted.
Reasonably small data sets that are in the public domain (e.g., my own prisms and grids, or the Indiana Jones paths) can easily fit in a zip ball that we could publish on Zenodo. Then our git scripts can download the Zenodo URL, unzip it, and make the figures.
For data sets that are huge and not suitable for that Zenodo archive, I am open to suggestions. We want to avoid the situation where in 5 years some URL we used to get data no longer works. We want this git/paper to work "forever". The paper will cite the Zenodo reference for these data as well as the GMT 6.5 Zenodo tarball.
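The download-and-unzip step for the Zenodo zip ball could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names, the cached file name `data.zip`, and the optional checksum check are hypothetical and not part of the repo, and the URL is whatever the Zenodo record ends up being.

```python
import hashlib
import shutil
import urllib.request
import zipfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_unpack(url, dest_dir, expected_sha256=None):
    """Download a zip archive once, optionally verify it, and extract it.

    Returns the sorted list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "data.zip"  # hypothetical local name for the cached archive
    if not archive.exists():  # skip the download if a cached copy is present
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
    # Guard against a silently changed or truncated file.
    if expected_sha256 is not None and sha256_of(archive) != expected_sha256:
        raise ValueError(f"checksum mismatch for {archive}")
    with zipfile.ZipFile(archive) as bundle:
        bundle.extractall(dest)
        return sorted(bundle.namelist())
```

Recording the checksum next to the URL in the scripts is one way to notice if the file behind the URL ever changes; each figure script would call `fetch_and_unpack(url, "data")` before running GMT.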

OK, please respond as best you can and give me a sense of the total number of MB or GB required.

gd-a commented 10 months ago

For precipitation : 1.94 GB (ecmwf)

For the seismicity : 885 KB (ipgp)

For the polar movie : 7.19 GB (ecmwf)

cat << 'EOF' > download.py  
import cdsapi
import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("number", type=int)
    args = parser.parse_args()

    c = cdsapi.Client()

    c.retrieve(
    'satellite-sea-ice-concentration',
    {
        'origin': 'esa_cci',
        'region': 'northern_hemisphere','southern_hemisphere',
        'cdr_type': 'cdr',
        'year': str(args.number),
        'month': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
        ],
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'version': 'v2',
        'variable': 'all',
        'format': 'zip',
    },
    "download_" + str(args.number) + ".zip")
EOF

mkdir ./DOWNLOAD
cd DOWNLOAD
for yr in {2013..2016}; 
do
    python download.py $yr
done

# [Extract] the archives and [sort & rename] the files
mkdir ../north
mkdir ../south

# # [extract]
for ff in $(ls -v *zip)
do
    unzip $ff 
done

# # [sort & rename] : north
for nh in $(ls -v *NH*);
do
    mv $nh ../north/$(echo $nh | awk -F[-] '{printf "north_%s.nc",$7}')
done

# # [sort & rename] : south
for sh in $(ls -v *SH*);
do
    mv $sh ../south/$(echo $sh | awk -F[-] '{printf "south_%s.nc",$7}')
done

# GMT TIME !
cd ..  

gmt movie main.sh -Iinclude.sh -Sbpre.sh -Sfpost.sh -Npoles_mov -C1080p -Tlist.txt -D24 -Fmp4 -Pb+w1c+jMC+o0c/3.25c
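For anyone who prefers to avoid parsing `ls` output, the extract and sort & rename loops above can be sketched in Python as follows. This is an assumption-laden equivalent, not the script used for the movie: the function names are hypothetical, and it relies, like the awk call, on the date being the 7th dash-separated field of each file name.

```python
import zipfile
from pathlib import Path


def target_name(filename, prefix):
    """Build '<prefix>_<field7>.nc' from the 7th dash-separated field."""
    fields = filename.split("-")
    if len(fields) < 7:
        raise ValueError(f"unexpected file name: {filename}")
    return f"{prefix}_{fields[6]}.nc"


def extract_all(download_dir):
    """Unzip every *.zip archive in download_dir in place."""
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as bundle:
            bundle.extractall(download_dir)


def sort_and_rename(download_dir, dest_dir, tag, prefix):
    """Move files whose name contains `tag` into dest_dir under the new name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(download_dir).glob(f"*{tag}*")):
        new_path = dest / target_name(path.name, prefix)
        path.rename(new_path)
        moved.append(new_path.name)
    return moved
```

Run from the directory that contains DOWNLOAD, the three steps would be `extract_all("DOWNLOAD")`, `sort_and_rename("DOWNLOAD", "north", "NH", "north")` and `sort_and_rename("DOWNLOAD", "south", "SH", "south")`.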

PaulWessel commented 10 months ago

I tried to just create and run your download.py from the DOWNLOAD directory, but your script does not place it there... Anyway, as a dumb Python user I get lots of errors and get nowhere. Some advice?

 bash -xv setupjob.sh
cat << 'EOF' > download.py  
import cdsapi
import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("number", type=int)
    args = parser.parse_args()

    c = cdsapi.Client()

    c.retrieve(
    'satellite-sea-ice-concentration',
    {
        'origin': 'esa_cci',
        'region': 'northern_hemisphere','southern_hemisphere',
        'cdr_type': 'cdr',
        'year': str(args.number),
        'month': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
        ],
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'version': 'v2',
        'variable': 'all',
        'format': 'zip',
    },
    "download_" + str(args.number) + ".zip")
EOF
+ cat

mkdir -p ./DOWNLOAD
+ mkdir -p ./DOWNLOAD
mv download.py DOWNLOAD
+ mv download.py DOWNLOAD
cd DOWNLOAD
+ cd DOWNLOAD
for yr in {2013..2016}; 
do
    python download.py $yr
done
+ for yr in {2013..2016}
+ python download.py 2013
  File "/Users/pwessel/DOWNLOAD/download.py", line 4
    parser = argparse.ArgumentParser()
IndentationError: unexpected indent
+ for yr in {2013..2016}
+ python download.py 2014
  File "/Users/pwessel/DOWNLOAD/download.py", line 4
    parser = argparse.ArgumentParser()
IndentationError: unexpected indent
+ for yr in {2013..2016}
+ python download.py 2015
  File "/Users/pwessel/DOWNLOAD/download.py", line 4
    parser = argparse.ArgumentParser()
IndentationError: unexpected indent
+ for yr in {2013..2016}
+ python download.py 2016
  File "/Users/pwessel/DOWNLOAD/download.py", line 4
    parser = argparse.ArgumentParser()
IndentationError: unexpected indent

# [Extract] the archives and [sort & rename] the files
mkdir -p ../north
+ mkdir -p ../north
mkdir -p ../south
+ mkdir -p ../south

# # [extract]
for ff in $(ls -v *zip)
do
    unzip $ff 
done
++ ls -v '*zip'
ls: *zip: No such file or directory

# # [sort & rename] : north
for nh in $(ls -v *NH*);
do
    mv $nh ../north/$(echo $nh | awk -F[-] '{printf "north_%s.nc",$7}')
done
++ ls -v '*NH*'
ls: *NH*: No such file or directory

# # [sort & rename] : south
for sh in $(ls -v *SH*);
do
    mv $sh ../south/$(echo $sh | awk -F[-] '{printf "south_%s.nc",$7}')
done
++ ls -v '*SH*'
ls: *SH*: No such file or directory

# GMT TIME !
cd ..  
+ cd ..

#gmt movie main.sh -Iinclude.sh -Sbpre.sh -Sfpost.sh -Npoles_mov -C1080p -Tlist.txt -D24 -Fmp4 -Pb+w1c+jMC+o0c/3.25c
(base) pwessel@MacAttack-2->

gd-a commented 10 months ago

I hate Python, so I'm of very little help with debugging. But just in case, do you have the CDS token? It's a key in ~/.cdsapirc (or wherever you point it) to verify your identity on Copernicus.

> but your script does not place it there...

The script creates the dir (mkdir), moves the cwd there (cd), then downloads in place (for loop)... no?
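For what it's worth, the IndentationError in the trace comes from the lines after the two imports being indented as if they were inside a block. Once that is fixed, Python would next reject the 'region' entry, where two bare strings follow a single key; a list is assumed to be what was intended. Below is a minimal sketch of download.py with both points addressed. The request keys and values are copied from the post (whether the CDS service still accepts them is not checked here); the function names and the option to pass several years at once are additions for illustration.

```python
import argparse

MONTHS = [f"{m:02d}" for m in range(1, 13)]  # '01' .. '12'
DAYS = [f"{d:02d}" for d in range(1, 32)]    # '01' .. '31'


def build_request(year):
    """Return the request dictionary for one year, using the keys from the post."""
    return {
        "origin": "esa_cci",
        # Both hemispheres are given as a list (assumed intent of the original).
        "region": ["northern_hemisphere", "southern_hemisphere"],
        "cdr_type": "cdr",
        "year": str(year),
        "month": MONTHS,
        "day": DAYS,
        "version": "v2",
        "variable": "all",
        "format": "zip",
    }


def download_year(year):
    """Fetch one year into download_<year>.zip in the current directory."""
    import cdsapi  # third-party package; needs the key in ~/.cdsapirc

    client = cdsapi.Client()
    target = f"download_{year}.zip"
    client.retrieve("satellite-sea-ice-concentration", build_request(year), target)
    return target


def main(argv=None):
    parser = argparse.ArgumentParser(description="Download sea-ice data by year.")
    parser.add_argument("years", type=int, nargs="*", help="one or more years")
    args = parser.parse_args(argv)
    for year in args.years:
        print("saved", download_year(year))


if __name__ == "__main__":
    main()
```

Note that the original shell script writes download.py before the `cd DOWNLOAD`, so from inside DOWNLOAD it would have to be called as `python ../download.py 2013`, or moved there first as in the trace. With this sketch, `python ../download.py 2013 2014 2015 2016` would replace the shell loop.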

Esteban82 commented 10 months ago

> I know @federico has a few relatively small files for Indiana Jones and Messi, and maybe something larger for the science animation.

Yes, my data files are very small.

The idea of my science animation is to use the remote data sets directly (@earth_faa). So this could change with time (the URL could break in the future, or the file could be replaced with a newer version). Is it OK to use the remote data sets?

Esteban82 commented 10 months ago

I think that is similar to the @moon_relief. So we could use the remote data sets directly.

gd-a commented 8 months ago

Just a thought: It would be best to concatenate all the grids into a single file then iterate on it. It would lighten the script preparation by a line or two too ...

Esteban82 commented 8 months ago

> Just a thought: It would be best to concatenate all the grids into a single file then iterate on it.

How would that work?

I think we could do it.

gd-a commented 8 months ago

Just preprocess the grids (they already are, after all). Then, instead of iterating through the names (yyyymmdd.grd), you’d iterate through the 3rd dimension ($file?$MOVIE_COL0)?

Esteban82 commented 8 months ago

OK, so now we have only one grid? Great. Could you send it to me? Or add it to the repo if it is less than 100 MB.

gd-a commented 8 months ago

The total file is about 600 MB... I think my script to append grids together is not very optimised for compression. Is it worth it?

Esteban82 commented 8 months ago

Weird. All the grids together are about 350 MB. So, no.

GenericMappingTools / gmt-2024-animation

Data sets used in animation paper #21