Opportunity: reduce the download size by 48.2 MB

texadactyl commented 4 years ago

Blimpy references the same Voyager test file, Voyager1.single_coarse.fine_res.h5, using a URL link in README.md to http://blpd0.ssl.berkeley.edu/Voyager_data/

I suggest removing folder turbo_seti/voyager_test/ and following the lead of blimpy.

telegraphic commented 4 years ago

Good idea, marking as enhancement.

Notes: The blimpy repo was set up to build using Docker; the download_data.sh script is run in the Dockerfile to grab data so the tests can run.

The Docker stuff in blimpy likely needs a refresh -- I'm not sure the KERN distribution is maintained, and there seems to be a bit of a push in the pulsar community to use conda, which we might want to consider following. We'll also need to prune the commit history (we used BFG Repo-Cleaner for blimpy).

texadactyl commented 4 years ago

Okay, now you have challenged me! I'll be back (with a generic downloader in Python).

texadactyl commented 4 years ago

I agree with the logic for engaging with Anaconda. It is reasonably good at keeping scientists out of trouble. Note: It is not recommended to mix pip3 commands with conda operations - it can make a mess of an Anaconda installation (been there, done that, although I knew how to repair it - ugh).

Attached is a ZIP file (voyafetch.zip) containing 2 Python files:

voyafetch.py - Procedure download_file() is callable from other Python programs. It also contains a stand-alone main program to download Voyager1.single_coarse.fine_res.h5.
test_voyafetch.py - sample caller

I tried both with a couple of files hosted at http://blpd0.ssl.berkeley.edu/Voyager_data/ and was able to download Voyager1.single_coarse.fine_res.h5 into /tmp in ~24 seconds.
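For readers without the ZIP attachment, a downloader along these lines could be sketched as below. This is not the contents of voyafetch.py: the signature of download_file(), its parameters, the chunk size, and the full file URL (directory from this thread plus the file name) are assumptions made for illustration, using only the Python standard library.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Assumed full URL: the directory quoted in this thread plus the test file name.
VOYAGER_URL = (
    "http://blpd0.ssl.berkeley.edu/Voyager_data/"
    "Voyager1.single_coarse.fine_res.h5"
)


def download_file(url, dest_dir="/tmp", chunk_size=1024 * 1024):
    """Stream `url` into `dest_dir` and return the local path.

    Hypothetical signature; the real voyafetch.download_file() may differ.
    """
    # Use the last path component of the URL as the local file name.
    file_name = Path(urlparse(url).path).name
    if not file_name:
        raise ValueError(f"cannot derive a file name from {url!r}")

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest_path = dest_dir / file_name

    # Copy in chunks so a ~48 MB file never has to sit in memory all at once.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```

Calling download_file(VOYAGER_URL) would then place the file in /tmp, assuming the server still hosts it at that path.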

This could be a companion utility for either blimpy or turbo_seti or both.

It would be helpful to host all of the downloadable test files for all projects in a tree, rooted at a single UCBerkeleySETI URL. They could be organized by mission (e.g. Voyager, Hubble) or by any other scheme that makes sense. Then, folks on all projects could go to one test data repository to look for interesting data. This would eliminate duplication now and into the future.

I would recommend that the test data repository include a catalogue where data sets could be described and their various file extensions explained.
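As a rough illustration of what such a catalogue might look like in code, here is a minimal sketch. Everything in it is hypothetical: the root URL is a placeholder (no single root exists yet), and the names DATA_ROOT, EXTENSIONS, CatalogueEntry, and find_by_mission are invented for this example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Placeholder root; no such URL is named in this thread.
DATA_ROOT = "https://example.org/seti-test-data"

# Short explanation of each file extension found in the tree.
EXTENSIONS = {
    ".h5": "HDF5 container",
    ".fil": "SIGPROC filterbank",
}


@dataclass(frozen=True)
class CatalogueEntry:
    mission: str      # top-level folder in the tree, e.g. "Voyager"
    file_name: str
    description: str

    @property
    def url(self):
        # Files live at <root>/<mission>/<file name>.
        return f"{DATA_ROOT}/{self.mission}/{self.file_name}"

    @property
    def file_format(self):
        # Only the final suffix matters, e.g. ".h5".
        suffix = PurePosixPath(self.file_name).suffix
        return EXTENSIONS.get(suffix, "unknown")


CATALOGUE = [
    CatalogueEntry(
        "Voyager",
        "Voyager1.single_coarse.fine_res.h5",
        "Voyager test file referenced by blimpy and turbo_seti",
    ),
]


def find_by_mission(mission):
    """Return all catalogue entries for a mission, ignoring case."""
    return [e for e in CATALOGUE if e.mission.lower() == mission.lower()]
```

Keeping the extension descriptions in one table means each data set entry only has to describe what is specific to that file.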

telegraphic commented 4 years ago

Hi texadactyl -- I like this approach. @mattlebofsky, assigning you to make a note that we should put together a webpage with basic test data for blimpy/turboseti/etc.

telegraphic commented 4 years ago

Small update on this front: there's a download_file function in astropy.utils.data that can download data to a cache directory (by default under ~/.astropy).
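A minimal sketch of how that could be used, assuming astropy is installed and that the function meant here is astropy.utils.data.download_file. The wrapper name fetch_cached and the full file URL are assumptions made for this example.

```python
# Assumed full URL: the directory quoted in this thread plus the test file name.
VOYAGER_URL = (
    "http://blpd0.ssl.berkeley.edu/Voyager_data/"
    "Voyager1.single_coarse.fine_res.h5"
)


def fetch_cached(url=VOYAGER_URL):
    """Return a local path to `url`, reusing astropy's cached copy if present."""
    # Imported lazily so this module still loads where astropy is not installed.
    from astropy.utils.data import download_file

    # cache=True stores the file in astropy's download cache and reuses it
    # on later calls instead of fetching it again.
    return download_file(url, cache=True, show_progress=False)
```

With this, a test could call fetch_cached() in its setup and only pay for the download on the first run.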

texadactyl commented 3 years ago

Fixed in version 1.3.2.

UCBerkeleySETI / turbo_seti
