JuliaClimate / CDSAPI.jl

Julia API to the Climate Data Store (a.k.a. CDS)
https://cds.climate.copernicus.eu
MIT License

Add tests for py2ju() and retrieve() #22

Closed LakshyaKhatri closed 4 years ago

LakshyaKhatri commented 4 years ago

Fixes #12. I might need a bit of help improving the tests.

juliohm commented 4 years ago

Thank you @LakshyaKhatri, I will review it now so that you can continue with the improvements.

LakshyaKhatri commented 4 years ago

Hello @juliohm, I'm noticing that many data files are delivered as ZIP and TAR files instead of GRIB (or something directly readable). Should I add test dependencies for handling ZIP files as well?

PS: please don't merge this PR right now, I'm working on more test cases.

juliohm commented 4 years ago

I think the idea would be to test the data inside these ZIP/TAR files @LakshyaKhatri. For TAR files, we have the Tar.jl package as a pure Julia implementation. Similar packages may exist for ZIP files.

LakshyaKhatri commented 4 years ago

Hello @juliohm, I tried reading the data inside the ZIP/TAR files. They contain .nc files and files with other extensions. Should I read those files too? (This would add test dependencies such as NetCDF.jl and other packages.)

juliohm commented 4 years ago

Yes, apparently some datasets in CDS are stored in GRIB and some are stored in NetCDF. We could load GRIB.jl in the test dependencies and NCDatasets.jl to load these and write the tests. What do you think?

LakshyaKhatri commented 4 years ago

Okay, I will add these packages to the test dependencies and write test cases for the content inside the ZIP/TAR files (we can remove them later if better ideas come up).
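
Since the downloads can arrive as GRIB, NetCDF, ZIP, or gzip-compressed files, a test could first check which format it received before choosing a reader. The following is only a minimal sketch of that idea: `sniff_format` is a hypothetical helper, not part of CDSAPI.jl, and it only inspects the leading signature bytes of a file. It does not validate the rest of the file, and it does not recognise plain TAR archives, whose marker sits later in the file.

```julia
# Hypothetical helper (not part of CDSAPI.jl): guess a file's format
# from its leading "magic" bytes.
function sniff_format(path::AbstractString)
    # Read at most 8 bytes; shorter files simply yield fewer bytes.
    head = open(io -> read(io, 8), path)
    has_prefix(sig) = length(head) >= length(sig) && head[1:length(sig)] == sig

    if has_prefix(b"GRIB")
        return :grib      # GRIB messages start with the ASCII text "GRIB"
    elseif has_prefix(b"CDF")
        return :netcdf    # classic NetCDF
    elseif has_prefix(UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a])
        return :hdf5      # HDF5 container, which NetCDF-4 files use
    elseif has_prefix(UInt8[0x50, 0x4b, 0x03, 0x04])
        return :zip       # "PK\x03\x04" local file header
    elseif has_prefix(UInt8[0x1f, 0x8b])
        return :gzip      # e.g. a .tar.gz download
    else
        return :unknown
    end
end
```

A test could then branch on the returned symbol and load GRIB.jl or NCDatasets.jl only for the files that need them.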

LakshyaKhatri commented 4 years ago

The only problem now is that we have to test that the compressed files are not corrupted; otherwise we get an error outside the test cases, something like this:

European energy sector cimate: Error During Test at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:77
  Got exception outside of a @test
  IOError: mkdir: no such file or directory (ENOENT)
  Stacktrace:
   [1] uv_error at ./libuv.jl:97 [inlined]
   [2] mkdir(::String; mode::UInt16) at ./file.jl:177
   [3] mkdir at ./file.jl:170 [inlined]
   [4] arg_mkdir(::Tar.var"#77#80"{GZipStream,Tar.var"#1#2"}, ::String) at /home/aries/.julia/packages/ArgTools/4vlk9/src/ArgTools.jl:136
   [5] #76 at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:204 [inlined]
   [6] arg_read(::Tar.var"#76#79"{Tar.var"#1#2",String}, ::GZipStream) at /home/aries/.julia/packages/ArgTools/4vlk9/src/ArgTools.jl:43
   [7] extract(::Function, ::GZipStream, ::String; skeleton::Nothing, copy_symlinks::Nothing) at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:203
   [8] #extract#82 at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:225 [inlined]
   [9] extract(::GZipStream, ::String) at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:225
   [10] top-level scope at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:96
   [11] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
   [12] top-level scope at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:78
   [13] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
   [14] top-level scope at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:8
   [15] include(::String) at ./client.jl:439
   [16] macro expansion at /home/aries/.julia/dev/CDSAPI/test/runtests.jl:12 [inlined]
   [17] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
   [18] top-level scope at /home/aries/.julia/dev/CDSAPI/test/runtests.jl:11
   [19] include(::String) at ./client.jl:439
   [20] top-level scope at none:6
   [21] eval(::Module, ::Any) at ./boot.jl:331
   [22] exec_options(::Base.JLOptions) at ./client.jl:264
   [23] _start() at ./client.jl:484

LakshyaKhatri commented 4 years ago

cc: @juliohm

juliohm commented 4 years ago

I didn't have a chance to look at the ZipFile.jl and GZip.jl packages carefully, but don't they implement the same functionality? Could we depend on just one of them? Sorry if the question doesn't make sense, I've never worked with ZIP files that much.

juliohm commented 4 years ago

I would try GZip.jl for pure ZIP files and Tar.jl for TAR and TAR.GZ files. I may be wrong that these packages can read these formats, though.

LakshyaKhatri commented 4 years ago

Yes, I understand. I asked myself the same question while doing this and tried using each package on its own, but it didn't work. The problem is that we receive .tar.gz files from CDS. Tar.jl handles only .tar files and GZip.jl handles only .gz files. A download.tar.gz is packed in two stages: tar first bundles the files into a single archive, and gzip then compresses that archive. So we first have to decompress download.tar.gz with GZip.jl to obtain download.tar, and then use Tar.jl to extract the original contents.

Also, ZipFile.jl handles only .zip files :(
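
The two-stage unpacking described above can also be done in one streaming pass, without writing an intermediate download.tar to disk. This is only a sketch under stated assumptions: it uses CodecZlib.jl for the gzip layer rather than the GZip.jl package discussed in this thread, and `extract_targz` is a hypothetical helper name, not something the PR defines.

```julia
using Tar, CodecZlib

# Hypothetical helper: unpack a gzip-compressed tarball and return the
# directory that holds the extracted files.
function extract_targz(path::AbstractString)
    open(path) do io
        # Stage 1: GzipDecompressorStream undoes the gzip compression on the fly.
        # Stage 2: Tar.extract reads the resulting tar stream. With no target
        # directory given, it creates a temporary one and returns its path.
        Tar.extract(GzipDecompressorStream(io))
    end
end
```

In a test this might be called as `dir = extract_targz("download.tar.gz")`, followed by checks on the files found under `dir`.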

juliohm commented 4 years ago

Got it, this variety of archive formats in the CDS is a little annoying, but I think your PR handles it well. So the corruption issue is due to gaps in internet connection?

LakshyaKhatri commented 4 years ago

So the corruption issue is due to gaps in internet connection?

Yes! (okay I got it, this won't be an issue :laughing: )

LakshyaKhatri commented 4 years ago

@juliohm let me know if we should make more changes to this PR. :smile_cat:

juliohm commented 4 years ago

I think it is great @LakshyaKhatri 🍾 I will merge it and then try to run the tests locally during the day. 👍