LM-SAL / aiapy

Python library for AIA data analysis
https://aiapy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Reorganize repository as a proper Python package #4

Closed nabobalis closed 8 months ago

nabobalis commented 5 years ago

In GitLab by @wtbarnes on May 23, 2019, 14:05

Before being released, aiapy needs to be organized into a proper Python package. This will allow it to be easily installed (e.g. via pip or conda) and more easily integrated into users' workflows. See here for an example of how to structure a Python package. Conveniently, SunPy provides a package template, so this can be done with very little effort.

This also necessitates reorganizing the code base and pruning code that does not fall in the scope of the package (see #3).

nabobalis commented 5 years ago

In GitLab by @wtbarnes on May 23, 2019, 15:23

I would also strongly suggest not committing notebooks to the core aiapy repository. The primary reason for this is that notebooks are difficult to version control and should be treated similarly to binary files. This was also a recommendation that came out of the first Python in Heliophysics meeting (see point 14 of the Python in Heliophysics Community Standards).

Since we are likely to have many example notebooks, I would recommend moving these to a separate example notebook repository. When it comes to using notebooks for documentation purposes, I would recommend something like sphinx-gallery, which lets you generate notebooks on the fly. See the Python in Heliophysics gallery as an example.

nabobalis commented 5 years ago

In GitLab by @markcheung on Jun 4, 2019, 24:01

Good idea. I suggest that once we have a "core" aiapy repo with some basic functionality, we put it out on GitHub without Jupyter notebooks. Also, every Jupyter notebook should have an accompanying .py script.

Regarding the recommendation of not including binary files, what's the standard practice for distributing binary files that are necessary for a module to work?

nabobalis commented 5 years ago

In GitLab by @wtbarnes on Jun 4, 2019, 24:18

Agreed. With Sphinx gallery, the idea is to write .py files which then get rendered as notebooks (.ipynb files). Users can then download either format. This way, we would only develop and update a single file rather than having to keep multiple files in sync.
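
As a rough illustration of that workflow (not necessarily the configuration aiapy ended up with), a Sphinx `conf.py` might enable sphinx-gallery along these lines. The two directory paths are placeholders assumed for this sketch.

```python
# docs/conf.py (excerpt): a minimal sketch; the directory names are assumptions.

# Enable the sphinx-gallery extension (listed alongside any other extensions).
extensions = [
    "sphinx_gallery.gen_gallery",
]

sphinx_gallery_conf = {
    # Folder holding the plain .py example scripts (hypothetical path).
    "examples_dirs": "../examples",
    # Folder where the rendered gallery pages and the downloadable
    # .py/.ipynb versions are written (hypothetical path).
    "gallery_dirs": "generated/gallery",
}
```

With a setup like this, only the `.py` scripts are committed; the notebooks are build products.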

Regarding your second point, I believe standard practice is not to include binary files if at all possible. If they are really necessary, it is preferable to have a script that downloads them from an external source rather than sticking them in the source tree. The main reason for this is that binary files don't play nice with version control and can really blow up the size of the repo (e.g. this repository, which is already many MB!).

As an example, the SunPy sample data (mostly FITS files) are stored in an AWS S3 bucket and only downloaded when the user requests it.
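
A download-on-request helper of this kind could look roughly like the sketch below. It uses only the standard library and is not how SunPy implements its sample data; the function name `fetch_data_file` and its arguments are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_data_file(url, filename, cache_dir):
    """Return a local path to `filename`, downloading it from `url` only if needed.

    Hypothetical helper for illustration; not part of aiapy or SunPy.
    """
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / filename

    # Reuse the cached copy so repeated calls do not download again.
    if local_path.exists():
        return local_path

    # Download under a temporary name first so an interrupted transfer
    # never leaves a truncated file at the final location.
    partial_path = cache_dir / (filename + ".part")
    with urllib.request.urlopen(url) as response, open(partial_path, "wb") as out:
        shutil.copyfileobj(response, out)
    partial_path.replace(local_path)
    return local_path
```

The repository then only needs to track the URL (and ideally a checksum) of each file rather than the file itself.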

nabobalis commented 5 years ago

In GitLab by @markcheung on Jun 4, 2019, 24:36

Sphinx gallery seems to make sense.

Re: binary files. I can understand the desire to keep modules slim. This is why we don't have AIA response files for different abundances in the AIA SSW package.

Anyway, I don't completely agree that binary files don't play nice with version control. You can have a binary file with a basic Python routine to load the array (which could be just a little-endian dump of float32 or int), so it doesn't depend on external modules. Because of the smaller file size, this would be much preferable to using ASCII files if the data size is, say, >~ 1 million data points.
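
A minimal sketch of such a routine, using only the standard library `struct` module, is shown below. The function names `save_float32` and `load_float32` are made up for illustration, and the file is assumed to contain nothing but the raw values (no header).

```python
import struct


def save_float32(path, values):
    """Write a sequence of floats as a raw little-endian float32 dump."""
    with open(path, "wb") as f:
        # "<" forces little-endian byte order; "f" is a 4-byte float.
        f.write(struct.pack(f"<{len(values)}f", *values))


def load_float32(path):
    """Read a raw little-endian float32 dump back into a list of floats."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // 4  # each float32 occupies 4 bytes
    # Ignore any trailing bytes that do not make up a full value.
    return list(struct.unpack(f"<{count}f", raw[: count * 4]))
```

Since the byte order is fixed in the format string, the file reads the same on any platform. The array shape is not stored, so it would have to be documented or kept alongside the file.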

nabobalis commented 5 years ago

In GitLab by @wtbarnes on Jun 21, 2019, 10:45

Closed by !9

nabobalis commented 5 years ago

In GitLab by @wtbarnes on Jun 21, 2019, 10:45

closed