malmans2 closed this issue 3 years ago
Thanks for requesting this... there have been some significant backend improvements to `astropy` tables since I last compared these packages, and I have to walk my claim back to 10x rather than 10-100x. I have revised the relevant text to reflect the current situation and avoid unfairly maligning `astropy` tables. (Also, while benchmarking, I found a couple of parser bugs my tests were not adequately covering, so, great.)
I've added some comparison benchmarks in the `benchmark` directory. They of course require bringing in additional dependencies, so I haven't added them to the main examples directory. Also, the use cases where the performance improvements are important involve pretty big chunks of geometry data; Horizons' serverside response time is so variable that you cannot do apples-to-apples comparisons with live queries; and I didn't want to add an additional 100 MB to the repository, so I've stashed the sample data needed for these benchmarks on Google Drive. (The link is in the benchmark notebook.)
Also, I would like to make sure that it is clear that, for many users, the reasons to use `lhorizon` over `jplhorizons` would be functionality and interface rather than performance. (And conversely, users who don't need `lhorizon`'s features and want tight integration with the `astropy` ecosystem would probably prefer to use `jplhorizons`.) I've tried to make these points a little clearer in the paper.
Love this! Unfortunately the notebook still fails in Binder (at least for me, `astropy` seems to be missing).
Also, I would suggest adding some `assert` checks on the calls' results, to make sure/prove that on top of a better timing, one also gets the same results.
Sorry about the delay, I was on leave. I have a couple of suggestions:
- ~`memory-profiler`~ should be `memory_profiler`: https://github.com/MillionConcepts/lhorizon/blob/eee7be4ee9e14f72de643bd690163cb0640a657c/benchmark/bench-environment.yml#L16
- I would add `jupyter` in the environment, as it is consistent with the environment file provided in the root directory

> Love this! Unfortunately the notebook still fails in Binder (at least for me, `astropy` seems to be missing)
Sorry, yes, I hadn't even intended these to work in Binder, because they use (by the standards of the Binder backend) large files and a lot of RAM, and I think it will be difficult to make them consistently not crash Binder. Do you think that adding local installation instructions as @malmans2 suggested is adequate?
> Also, I would suggest adding some `assert` checks on the calls' results, to make sure/prove that on top of a better timing, one also gets the same results.
Good idea. Added in https://github.com/MillionConcepts/lhorizon/commit/810afb36c83e4137fd4844a3b9d87d8addff2b2f
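The commit adds checks along these general lines. This is a minimal sketch only: the function name and the right-ascension values are invented for illustration, not taken from the benchmark notebook.

```python
import math

def assert_same_results(ours, theirs, rel_tol=1e-7):
    """Assert that two sequences of numeric query results agree within tolerance."""
    assert len(ours) == len(theirs), "result lengths differ"
    for x, y in zip(ours, theirs):
        assert math.isclose(x, y, rel_tol=rel_tol), f"mismatch: {x} vs {y}"

# hypothetical right-ascension values (degrees) as returned by the two packages
lhorizon_ra = [201.31, 201.29, 201.27]
jplhorizons_ra = [201.31, 201.29, 201.27]
assert_same_results(lhorizon_ra, jplhorizons_ra)
```

A tolerance-based comparison (rather than exact equality) avoids spurious failures from floating-point formatting differences between the two parsers.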
(I'm not 100% sure, but I think in this way GitHub will show the README in the benchmark directory without having to click on it)
This appears to have worked!
Use step-by-step instructions in the README, including terminal commands such as how to grab the files (e.g., wget/curl), how to install the benchmark environment, and how to install `lhorizon` in the benchmark environment
Added.
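For context, the resulting workflow is roughly the following. This is a sketch only: the data URL, archive name, and environment name are placeholders, not the actual values from the README.

```shell
# all names/URLs below are placeholders
curl -L -o benchmark_data.zip "https://example.com/lhorizon-benchmark-data.zip"
unzip benchmark_data.zip -d benchmark/
conda env create -f benchmark/bench-environment.yml
conda activate lhorizon-bench      # placeholder environment name
pip install -e .                   # install lhorizon into the active environment
```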
Consider storing the benchmark files in a different repository/cloud service (I believe they are currently on Google Drive). E.g., you could use [git-lfs](https://git-lfs.github.com/) and store the files in this GitHub repo, or you could use Zenodo.
Ok, I will think about this. Do you mean that you think that the files should have their own DOI? Or instead that it is awkward, or might be overly fragile, to distribute them via Google Drive in this way? git-lfs is basically just an automated aliasing system for external storage locations, right? So this would involve, for instance, setting up an S3 bucket and pointing the git-lfs server at credentials for that bucket (+ some addressing scheme for that bucket) in order to create versioned pointers from this repository to that bucket?
> I think there is a typo in the environment file. ~`memory-profiler`~ should be `memory_profiler` https://github.com/MillionConcepts/lhorizon/blob/eee7be4ee9e14f72de643bd690163cb0640a657c/benchmark/bench-environment.yml#L16
Yes, that's a typo.
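The corrected entry in the environment file would then read as follows (a fragment sketch; surrounding entries elided):

```yaml
# benchmark/bench-environment.yml (fragment)
dependencies:
  - memory_profiler  # underscore, as suggested; memory-profiler was the typo
```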
> I would add `jupyter` in the environment, as it is consistent with the environment file provided in the root directory
Added.
Although the variable names are self-describing, I would make it more obvious which cell is benchmarking `lhorizon` vs `jplhorizons` (e.g., add markdown, comments, or prints).
extra markdowns & comments added in 0012dfe
I'd personally prefer Zenodo because of the DOI, or git-lfs because it seems easier to maintain/more robust. But there's nothing wrong with Google Drive, so I'll leave it up to you. Regarding git-lfs, you just have to install it and then set which files to manage through Git LFS. No need to set up an external bucket. See Getting Started.
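The Git LFS setup being described amounts to something like this (a sketch; the `benchmark/*.zip` pattern is an assumption about how the data files would be named):

```shell
git lfs install                  # one-time setup on each machine
git lfs track "benchmark/*.zip"  # writes the pattern to .gitattributes
git add .gitattributes benchmark/*.zip
git commit -m "track benchmark data with Git LFS"
```

After this, `git push` uploads the tracked files to the LFS store rather than the ordinary Git object database.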
Final minor comments:
In point 1: Typo ~Them~ should be Then
In point 2: If you assume that the user is in the root directory, the yml path should be `benchmark/bench-environment.yml`. But maybe it's easier to run all commands from `/benchmark`? This way there's no need to move the unzipped data in point 1, and no need to change directory later in point 4.
I'd also mention the benchmark directory in the main readme, so users know about it.
I always have trouble believing things like git-lfs are free... In any case, I switched over to Zenodo, because using git-lfs would require users to install it to successfully clone the repository, and it seems easier to simply instruct them to download the files. I have also addressed your other comments. All changes are present as of 89b1b3c. Leaving this issue open pending a response from @steo85it on Binder.
These solutions work for me, thanks. Just a minor issue (not sure if this is somehow related to my setup): when running the timing comparison cells, I need the `%%time` magic command to be alone on a line of code; even the (currently present) comment causes the cell run to fail with a `UsageError: Can't use statement directly after '%%time'!` error. Simply moving the comment to the following line (or removing it) solves this.
> These solutions work for me, thanks.

great!
> error. Simply moving the comment to the following line (or removing it) solves this.

oh, sorry, that's a bad copy-paste or something. fixed in 4c982dd
I would add in the documentation a notebook that illustrates the advantages of using `lhorizon` rather than `astroquery` and supports the performance improvements discussed in the paper (lines 27-43).

https://github.com/openjournals/joss-reviews/issues/3495