Closed natgeo-wong closed 3 years ago
Thank you @natgeo-wong for the nice refactoring and logging. Could you please update the tests accordingly so that we have a clearer picture of the proposed API?
Appreciate if you can review the PR @LakshyaKhatri. It is also an opportunity to figure out if that issue about decryption is still present.
Hello @juliohm! Sorry for the delayed response. I think we should test these changes locally before merging them. Also, since this is an external pull request, Travis is turning the environment variables off to prevent modification of the encrypted files and that is why the tests are failing.
Also, @natgeo-wong, thank you so much for these beautiful modifications. The code looks neat.
I will first test the current changes locally and then we will be ready to merge the changes. After that, I will check what can be done on the Travis part to resolve the issue.
I revamped the set of tests, because the output is quite different. You could add your own tests to this as well?
The current version of cdskeys()
is showing timestamps like this:
IMO, having a timestamp won't be that much of a help. Also, for showing a timestamp we are adding an additional dependency of Dates.jl
(which could be overkill). @juliohm, it would be helpful to have your views on this.
Thanks for pinging @LakshyaKhatri , I think the dependency on Dates is fine as it is a stdlib shipped with Julia. My main concern is actually the verbosity of these messages. It is nice to have logging features, but we should consider adding a verbose
option to do things quietly if possible. This option could default to true
. What do you think?
The verbosity of these messages are very similar to the CDSAPI by Python, and actually I like them because I can monitor the progress of the downloads.
When I was using the initial version of CDSAPI, my datasets are extremely large and take a long time to download, so the CDSAPI script kept throwing "Request is running" every second, which I believe is even more verbose for large datasets.
Reason why I added the log for the cdskeys() is so that people are reminded that they shouldn't be continuously calling the cdskeys() function in a loop when once will do.
Also, I should add that when I have more time I'm going to try and add a progress bar option to HTTP.jl for downloads, which would be cool.
My main concern is actually the verbosity of these messages. It is nice to have logging features, but we should consider adding a
verbose
option to do things quietly if possible. This option could default totrue
. What do you think?
@juliohm could you provide some more information. I am still linking this with the timestamps that are added by @natgeo-wong in the info logs!
Also, I should add that when I have more time I'm going to try and add a progress bar option to HTTP.jl for downloads, which would be cool.
That would be very cool indeed @natgeo-wong :100:
Also, I still think we should consider adding a verbose option to the download command. Sometimes users are not interested in verbosity, for example when running multiple processes in a computer cluster. I had similar issue with the Pkg manager recently and they are considering adding a verbose option there as well. Imagine multiple processes downloading different datasets.
We could just follow a pattern verbose && @info "foo"
throughout the code. Shouldn't be that hard to add this option.
Regarding the dependency on Crayons, do we really want to add an extra dependency just for a boldface?
We could just follow a pattern
verbose && @info "foo"
throughout the code. Shouldn't be that hard to add this option.
I have done this in the most recent push
Regarding the dependency on Crayons, do we really want to add an extra dependency just for a boldface?
I could remove it haha, I just like nice formatting.
Would it be ok to remove the Crayons dependency? Given that the package is very small, and that we are not using formatting all over the place, I think it would be better to remove it. I didn't check the loading time, but the less dependencies the better in the long-term.
@natgeo-wong did you have a chance to take a look into this? It would be nice to update the PR and merge the improvements.
Hi @juliohm and @LakshyaKhatri, sorry I've just been pretty busy these past few days, I'll go through and have a look!
Done!
Awesome @natgeo-wong , do we also need an update to the README? I will try test the PR locally tomorrow :+1:
An update to the README would be good. I'll work on the README over this coming week as a way to soothe myself when everything starts getting extremely crazy on Tuesday.
Hey @natgeo-wong , do you think you could manage a final stretch on this PR this week? I know it is been a crazy week for you. :)
Looking forward to review the changes.
This week is good! Expect updates today or tomorrow! Just to confirm, the tests are passing for both you and Lakshya?
Done!
I've just cloned the repo and tried the changes locally, and unfortunately I don't think we can merge the PR as is. There are many issues including the fact that tests with the ERA5 in GRIB format were erased, NetCDF was replaced with NCDatasets without a corresponding update in the Project.toml, and other source code formatting issues. Overall, it is more difficult to understand what the code is doing now compared to the master branch.
Taking a step back, this PR is basically adding a "verbose" feature, i.e. more control over the messages printed to the terminal. If you would like to start a fresh PR to add this feature, we'd be happy to review it.
I am sorry about the tests. I think I mentioned early on that I am not too sure how Testing works in Julia and I'm still trying to figure it out, because I do not use it regularly.
This PR is more than just "verbosity". When I first looked at CDSAPI.jl, I was very uncertain as to how to use it because of py2ju. I've gotten rid of that, because the crux of CDSAPI.jl should be on Julia, and not on "conversion" measures from Python to Julia. I didn't even know what the nature of the inputs were until I had a very close look at the code and realized that what py2Ju was doing was simply returning a dictionary. So I didn't even know what inputs I was supposed to be using, and that is offputting to people who are still trying out the basics of Julia.
I would argue that when one comes from Julia, one finds the master branch of CDSAPI harder to absorb because of conflicting information and priorities aimed to "converting" a python code or dictionary or information into Julia, when one should simply be using Julia code directly. I edited the code to be able to do that, specifically by restricting the types of the input argument, which should already be giving people familiar with Julia a sense of what sort of input arguments are required.
I've also pulled out the retrieval of the cdskeys as a function on its own, so people don't need to keep rereading the cdskeys each time when we use loops to retrieve large datasets (over different months/years, for example).
I can probably restore some of the GRIB tests, etc. from the master branch. I don't use those formats, so I am predisposed to rely only on NCDatasets, but I will do my best.
My key message is this: We really need to remove the dependency of py2ju, because that is just confusing.
TLDR, what I did was:
TLDR, what I did was:
- Restrict the nature of input arguments to make it clearer for Julia users
Restricting input arguments is a Julia anti-pattern, see section "Over-constraining argument types" https://www.oxinabox.net/2020/04/19/Julia-Antipatterns.html
- Remove py2ju because we really should not be relying on python syntax in Julia code. Instead, show users how to input the arguments in Julia code.
Isn't it true that the current master branch supports Julia dicts just fine? We are only providing an additional helper function py2ju
in case someone doesn't want to build the dict by hand. The CDS website creates Python queries for you, and converting these queries by hand can be a waste of time. What we should do is clarify in the README that Julia dicts are supported.
- Pulling out minor functions (see below), so that they can be individually modified without disturbing the overall code. a. Retrieval of cdskeys b. Parsing of HTTP request
To be honest I don't see much use in a separate cdskey function, but we can review a separate PR with this improvement. Regarding the parsing of HTTP, this is the part of the code that I thought could be reformatted. What about we start two separate PRs with these changes?
Hi, sorry for the silence.
I will make the PRs when I have time, now is crunch time with class projects coming up.
I still believe that specifying the input types will be helpful, especially if people are coming over from Python to Julia, for new users to understand exactly how the Julia portions of the code work. I stand by my previous statement that specifying it is necessary.
I will make separate PRs for the parts you've specified.
Awesome. I will close this PR for now, and then we can start fresh PRs with individual changes when you are over your other tasks in your todo list.
retrieve()
to mimic the Python CDSAPI retrieval logs