armstonj / gedipy

Tools for processing of NASA's Global Ecosystem Dynamics Investigation (GEDI) HDF-5 data products.
BSD 3-Clause "New" or "Revised" License

[BUG] naming conflict on PyPI #7

Open slumnitz opened 4 years ago

slumnitz commented 4 years ago

As a note: the name gedipy seems to already be taken on PyPI.

This means that, if this project develops into a package, we would need to distribute it under another name, or kindly ask the other team whether they would be willing to release the name to us.

Distributing it via pip install gedipy is currently not an option.
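
For anyone who wants to check a candidate name themselves, here is a minimal sketch that asks PyPI's JSON endpoint (https://pypi.org/pypi/<name>/json) whether a project is registered under that name. The function name pypi_name_is_taken and the opener parameter are my own, introduced only for illustration. Note that a 404 only means no project is published under that exact name; PyPI may still reject a name that is reserved or too similar to an existing one.

```python
import urllib.error
import urllib.request

# PyPI's public JSON endpoint for a project; returns 404 if no such project exists.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def pypi_name_is_taken(name, opener=urllib.request.urlopen):
    """Return True if a project called `name` is published on PyPI.

    `opener` is a hypothetical hook (defaulting to urllib's urlopen) so the
    function can be exercised without network access.
    """
    url = PYPI_JSON_URL.format(name=name)
    try:
        with opener(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # No project is registered under this exact name.
            return False
        # Any other HTTP error (rate limiting, server error) is not an answer.
        raise
```

Calling pypi_name_is_taken("gedipy") performs a live request, so the result reflects the state of PyPI at the time of the call.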

armstonj commented 4 years ago

Ok thanks, I had not noticed that. Their package is named pygedi, so we could see if they are ok to release 'gedipy'. Given I've been expanding this for ICESat-2 and LVIS as well as GEDI, perhaps a rename at our end might be more inclusive. Any suggestions? nasalidarpy? I should open this up soon, or merge it into something else.

slumnitz commented 4 years ago

I agree that, because the code also includes LVIS and ICESat-2 functionality, a more generic name would be fitting. I'll have a think about names as well; I have not had a revelation yet. Maybe looking at and defining the scope of the project would help in finding a name or deciding whether to merge it with something else?

What I have seen so far is basically I/O into common Python formats (arrays, pandas DataFrames, dask, etc.) for further processing and analysis of the data. There is a package called Satpy which has a similar scope.

Where would you like to draw the line in terms of functionality? How much actual analysis functionality for GEDI or other data would you like to include? Is there a specific community you would like to serve (e.g. the biomass/MAAP community)? Just to get a rough idea.

slumnitz commented 4 years ago

I just heard the suggestion PyLid (leaning on pilot), going towards something like nasaPyLid or spacePyLid (leaning on Lidar from Space?).

armstonj commented 4 years ago

If I merge into another package it would probably be pylidar (another package I co-developed, which heavily influenced gedipy and is designed for lidar in general), but I need to think more about whether I really want to go there with the dask/xarray functionality.

The focus on NASA instruments is useful. Quite specific GEDI/ICESat-2 knowledge is being built into gedipy that is probably not easily available elsewhere, and ultimately the processing code for the GEDI Level 2B and Level 4A datasets could land here too. Presently the user communities are:

slumnitz commented 4 years ago

Thank you for sharing! I think the plan of creating a space that collects functionality to work with different NASA Lidar instruments is great!

I'd like to share what is currently happening on the ESA BIOMASS side. I think there are parallels, and maybe we can use the fact that the packages are developing simultaneously to our advantage, i.e. by sharing knowledge, resources and potentially similar structures for user friendliness. At ESA, we are currently creating an open source project for the BIOMASS prototype processor called BioPAL (BIOMASS Product Algorithm Laboratory). We would basically like to test which open source structures could work for the operational processor: Should all BIOMASS L2 algorithms live in one repo/package, or are separate packages more maintainable and useful? Should the processor's functionality be accessible in Jupyter as well as from the command line? How do we maintain both solutions, how do we distribute functionality, etc.? In short, we are testing the waters on how best to structure and distribute the operational processor. Currently, we are leaning heavily on project structures found in successful open source projects, e.g. Project Jupyter or NumPy. I am happy to provide some templates and suggestions for a code of conduct, governance and contributing guidelines that I developed within the scope of BioPAL.

So far, it has proven valuable to have an organization called BioPAL, including a repo BioPAL which currently hosts the source code for the L2 AGB prototype processor. For some inspiration on structuring a project: I like the approach of using organizations with multiple repos. At PySAL (the Python Spatial Analysis Library), for example, we have one meta package (basically an umbrella repo that is designed to combine and distribute multiple smaller subpackages together as one big metapackage if the user needs this). There is a general trend in open source towards smaller packages, as maintenance work can be distributed more easily, e.g. the separation of scikits from SciPy. Given that the gedipy project is in the alpha phase with potentially loads of add-on functionality, maybe a similar structure, i.e. one organization with one main repo, either including all source code or pulling from different smaller repos, could be an option that keeps maintaining and developing gedipy flexible? That way you can collect all functionality in one space and keep your options open on how to structure it for now.

I also think it would be worth investing some time in defining modules and the structure of the code base, and documenting your thoughts early, i.e. trying to answer the question of whether to structure the code base (structure = single .py files or module names, e.g. gedipy.io, gedipy.explore, ...) by functionality (I/O, exploratory data analysis, operational processors, download, visualization) and/or by instrument (e.g. gedipy.gedi.io, gedipy.nisar.explore). In case refactoring is necessary, I'd be happy to help out!
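
To make the two options easier to compare, here is a rough sketch that simply enumerates the module paths each layout would produce. The instrument and module names below are placeholders I picked from this thread (gedi, icesat2, lvis; io, explore, processors, download, visualization), not names that exist in gedipy today, and the helper functions are purely illustrative.

```python
from itertools import product

# Hypothetical names for illustration only; none of these modules exist yet.
INSTRUMENTS = ["gedi", "icesat2", "lvis"]
FUNCTIONS = ["io", "explore", "processors", "download", "visualization"]


def by_functionality(package="gedipy"):
    """Layout 1: one module per kind of functionality, shared by all instruments."""
    return [f"{package}.{func}" for func in FUNCTIONS]


def by_instrument(package="gedipy"):
    """Layout 2: one subpackage per instrument, each with its own functional modules."""
    return [
        f"{package}.{inst}.{func}"
        for inst, func in product(INSTRUMENTS, FUNCTIONS)
    ]


if __name__ == "__main__":
    print("By functionality:")
    for path in by_functionality():
        print("  ", path)
    print("By instrument:")
    for path in by_instrument():
        print("  ", path)
```

The first layout stays small but has to handle instrument differences inside each module; the second grows with every new instrument but keeps instrument-specific knowledge in one place.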

By the way, I really like the name SpacePyLid by now, referring to Space LiDAR and Python; maybe for your organization and first repo/package?