Closed LakshyaKhatri closed 4 years ago
Thank you @LakshyaKhatri, I will review it now so that you can continue with the improvements.
Hello @juliohm, I'm observing that many data files are available as zip and tar files instead of grib (or something readable). Should I add test dependencies for handling zip files as well?
PS: please don't merge this PR right now, I'm working on more test cases.
I think the idea would be to test the data inside these ZIP/TAR files @LakshyaKhatri. For TAR files, we have the Tar.jl package as a pure Julia implementation. Similar packages may exist for ZIP files.
Hello @juliohm, I tried reading the data inside the ZIP/TAR files. They contain files with .nc and other extensions. Should I read those files too? (It will add test dependencies such as NetCDF.jl and other packages.)
Yes, apparently some datasets in CDS are stored in GRIB and some are stored in NetCDF. We could load GRIB.jl in the test dependencies and NCDatasets.jl to load these and write the tests. What do you think?
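A test on NetCDF contents could look roughly like the sketch below. It assumes NCDatasets.jl is installed as a test dependency; the file name `sample.nc` and the variable name `t2m` are made up for illustration, and the file is written locally so that the sketch does not depend on a CDS download. GRIB files would need a similar test built on GRIB.jl, which is not shown here.

```julia
using Test
using NCDatasets  # external package, assumed to be in the test dependencies

# Write a tiny NetCDF file so the sketch does not need a CDS download.
path = joinpath(mktempdir(), "sample.nc")
NCDataset(path, "c") do ds
    defDim(ds, "time", 3)
    v = defVar(ds, "t2m", Float64, ("time",))  # "t2m" is a hypothetical variable name
    v[:] = [280.0, 281.5, 279.9]
end

# Reopen the file read-only and check what a retrieval test might check:
# that the expected variable exists and has the expected number of values.
@testset "NetCDF contents" begin
    NCDataset(path) do ds
        @test haskey(ds, "t2m")
        @test length(ds["t2m"][:]) == 3
    end
end
```

In the real tests, `path` would point at a file extracted from the downloaded archive instead of a locally written one.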
Okay, I will add these packages to the test dependencies and apply test cases to the content inside the Zip/Tar files (we can remove them later if new ideas come up).
The only problem now is that we have to check that the compressed files are not corrupted; otherwise we get an error outside the test cases. Something like this:
European energy sector cimate: Error During Test at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:77
Got exception outside of a @test
IOError: mkdir: no such file or directory (ENOENT)
Stacktrace:
[1] uv_error at ./libuv.jl:97 [inlined]
[2] mkdir(::String; mode::UInt16) at ./file.jl:177
[3] mkdir at ./file.jl:170 [inlined]
[4] arg_mkdir(::Tar.var"#77#80"{GZipStream,Tar.var"#1#2"}, ::String) at /home/aries/.julia/packages/ArgTools/4vlk9/src/ArgTools.jl:136
[5] #76 at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:204 [inlined]
[6] arg_read(::Tar.var"#76#79"{Tar.var"#1#2",String}, ::GZipStream) at /home/aries/.julia/packages/ArgTools/4vlk9/src/ArgTools.jl:43
[7] extract(::Function, ::GZipStream, ::String; skeleton::Nothing, copy_symlinks::Nothing) at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:203
[8] #extract#82 at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:225 [inlined]
[9] extract(::GZipStream, ::String) at /home/aries/.julia/packages/Tar/6EM4e/src/Tar.jl:225
[10] top-level scope at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:96
[11] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[12] top-level scope at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:78
[13] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[14] top-level scope at /home/aries/.julia/dev/CDSAPI/test/retrieve.jl:8
[15] include(::String) at ./client.jl:439
[16] macro expansion at /home/aries/.julia/dev/CDSAPI/test/runtests.jl:12 [inlined]
[17] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
[18] top-level scope at /home/aries/.julia/dev/CDSAPI/test/runtests.jl:11
[19] include(::String) at ./client.jl:439
[20] top-level scope at none:6
[21] eval(::Module, ::Any) at ./boot.jl:331
[22] exec_options(::Base.JLOptions) at ./client.jl:264
[23] _start() at ./client.jl:484
cc: @juliohm
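One way to keep a failed extraction from surfacing as "Got exception outside of a @test" is to wrap the extraction in a helper that returns a Bool and test that value. The sketch below uses only the Test standard library; `extracts_cleanly` and the two stand-in extractors are hypothetical names, not part of CDSAPI or Tar.jl.

```julia
using Test

# Hypothetical helper: run `extract_fn(path)` and report success as a Bool,
# so that a failure shows up as a failed @test instead of an uncaught error.
function extracts_cleanly(extract_fn, path)
    try
        extract_fn(path)
        return true
    catch err
        @warn "extraction failed" path exception = err
        return false
    end
end

# Stand-ins for a real extractor such as a call to Tar.extract.
good_extract(path) = nothing
bad_extract(path) = error("corrupted archive")

@testset "archive integrity" begin
    @test extracts_cleanly(good_extract, "download.tar.gz")
    @test !extracts_cleanly(bad_extract, "download.tar.gz")
end
```

In the actual test file, the first `@test` would receive the real extraction function, and the tests on the extracted contents would run only if it passes.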
I didn't have a chance to read the packages ZipFile.jl and GZip.jl carefully, but aren't they implementing the same functionality? Could we depend on just one of them? Sorry if the question doesn't make sense, I've never played with ZIP files that much.
I would try GZip.jl for pure ZIP files and Tar.jl for TAR and TAR.GZ files. I may be wrong, though, about which formats these packages can actually read.
Yes, I can understand. I asked myself the same question while doing this and I tried using the individual packages too, but it didn't work. The problem is that we are receiving .tar.gz files from CDS.
Tar.jl handles only .tar files and GZip.jl handles only .gz files.
A download.tar.gz is compressed in two stages. (Why?) So we first have to decompress the download.tar.gz with GZip.jl to obtain a download.tar file, and then use Tar.jl to obtain the original contents inside the tar file.
Also, ZipFile.jl handles only .zip files :(
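The two stages described above can be sketched as follows. This is only an illustration: it assumes GZip.jl is installed and that `GZip.open` accepts a do-block, and that `Tar.extract` can read from the resulting stream (the stack trace earlier in the thread shows `extract(::GZipStream, ::String)` being called). The file `data.txt` and the temporary paths are made up so that the sketch runs without a CDS download.

```julia
using Tar    # standard library since Julia 1.6
using GZip   # external package, assumed installed

# Build a small .tar.gz locally so the sketch runs without a CDS download.
src = mktempdir()
write(joinpath(src, "data.txt"), "hello")
tarpath = Tar.create(src)                  # stage 1: bundle into a plain .tar (temp file)
gzpath = joinpath(mktempdir(), "download.tar.gz")
GZip.open(gzpath, "w") do io               # stage 2: gzip-compress the tar bytes
    write(io, read(tarpath))
end

# Reverse the two stages: decompress with GZip.jl, then unpack with Tar.jl.
outdir = GZip.open(gzpath, "r") do io
    Tar.extract(io)                        # extracts into a fresh temporary directory
end

println(readdir(outdir))
```

Passing an explicit target directory to `Tar.extract` also works, but it should be an empty or not yet existing directory whose parent exists.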
Got it, this variety of archive formats in the CDS is a little annoying, but I think your PR handles it well. So the corruption issue is due to gaps in the internet connection?
So the corruption issue is due to gaps in internet connection?
Yes! (okay I got it, this won't be an issue :laughing: )
@juliohm let me know if we should make more changes to this PR. :smile_cat:
I think it is great @LakshyaKhatri 🍾 I will merge it and then try to run the tests locally during the day. 👍
Fixes #12. I might need a bit of help in improving the tests.