Thomas-Moore-Creative opened this issue 1 month ago
Hola @Thomas-Moore-Creative,
This sounds cool. I'm thinking the best place for something like this might be a repo on the ACCESS Community Hub organisation. That would make collaboration more straightforward, as we could make you an admin on the repo.
https://github.com/ACCESS-Community-Hub
Then we could point to it from this repo.
Does that sound like a good way forward?
Sounds fine to me @aidanheerdegen - thanks. Do you, @rbeucher, @dougiesquire, or any of your software engineering gurus have advice on how to structure this repo so it's portable, flexible, and available to all on NCI?
Dougie is on leave, so he's out of the picture.
To make it available on Gadi, I'd say we should add conda packaging. We could also arrange to publish it to the accessnri anaconda channel, or create another channel for the ACCESS community.
We can deal with that later.
As for repo structure, the first decision might be flat layout vs. src layout; after that, isolate functionality in sub-directories.
Is that the sort of thing you were thinking about @Thomas-Moore-Creative?
Do you have any opinions @marc-white?
I think the main thing to determine is a question of scope. What exactly are you trying to do? Is it just doing some stuff to work out the native chunking of netCDF files, or are you looking to expand this to include more tools later down the track?
Once you've worked out the answer to that, it will inform the next question: should this become part of access-nri-intake-catalog, or should it be spun off into its own utility package?
Thanks @marc-white.
What I'm trying to do is get my projects done, which requires using access-nri-intake-catalog. For me that means data discovery, building search filters, and understanding the data structure so that optimal analysis-ready-data workflows can be built for specific datasets.
In this issue I highlighted just one very simple utility that I'm building ("find the native chunking information"), but I'm wondering out loud whether there is a better place than my personal repos to develop helper utilities. Maybe the questions are:
access-nri-intake-catalog is a real and general need for the community?
I'd recommend going with src layout for consistency - it seems to be Dougie's preferred layout, and would keep things consistent with this package itself and the related intake-dataframe-catalog.
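As a rough illustration of the src layout suggested here, the sketch below scaffolds a skeleton repo. It assumes setuptools as the build backend and the package name access-intake-utils floated later in this thread; the module names chunking and display are hypothetical placeholders for "isolate functionality in sub-directories", not an agreed structure.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Minimal setuptools configuration for a src layout (assumed backend):
# packages are discovered under src/ rather than at the repository root.
PYPROJECT_TEMPLATE = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "{name}"
version = "0.0.1"

[tool.setuptools.packages.find]
where = ["src"]
"""


def scaffold_src_layout(
    root: str,
    package: str = "access_intake_utils",
    modules: Iterable[str] = ("chunking", "display"),  # hypothetical modules
) -> List[str]:
    """Create a skeleton src-layout repo and return its relative paths."""
    root_path = Path(root)
    pkg_dir = root_path / "src" / package
    pkg_dir.mkdir(parents=True, exist_ok=True)
    (pkg_dir / "__init__.py").touch()
    # One module per area of functionality (e.g. chunking discovery, display).
    for module in modules:
        (pkg_dir / f"{module}.py").touch()
    (root_path / "tests").mkdir(exist_ok=True)
    pyproject = root_path / "pyproject.toml"
    if not pyproject.exists():  # never overwrite an existing config
        pyproject.write_text(PYPROJECT_TEMPLATE.format(name=package.replace("_", "-")))
    return sorted(p.relative_to(root_path).as_posix() for p in root_path.rglob("*"))


if __name__ == "__main__":
    # Scaffold into a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_src_layout(tmp):
            print(path)
```

One commonly cited reason for the src layout is that the package is not importable from the repo root by accident, so tests exercise the installed package rather than whatever happens to be in the working tree.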
As to whether this should be included within access-nri-intake-catalog or as a standalone package, I would suggest the latter. Much of the catalog's functionality, e.g. loading datasets, is actually performed by intake-esm, and I suspect that this might cause complications. My vote would be for a separate package, something like access-intake-utils, and then we try to keep the interdependencies as minimal as possible.
From a user's point of view this makes sense to me. Thanks for the advice.
Can we start with an access-intake-utils repo in https://github.com/ACCESS-Community-Hub, as suggested by @aidanheerdegen above?
I agree with @charles-turner-1. A separate package is the way to go for now. Feel free to start in ACCESS-Community-Hub.
Is your feature request related to a problem? Please describe.
Using settings like xarray_open_kwargs effectively requires a good understanding of the underlying NetCDF data structure, and in particular discovery of the native file chunking.
Describe the feature you'd like
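A minimal sketch of how discovered native chunk sizes might feed xarray_open_kwargs, assuming the chunk sizes are already known per dimension. The helper name open_kwargs_from_native_chunks and the multiples parameter are hypothetical; the idea is simply that dask chunks which are whole multiples of the on-disk chunks generally avoid reading the same native chunk in several tasks.

```python
from typing import Dict, List, Mapping, Optional, Sequence, Union


def open_kwargs_from_native_chunks(
    dims: Sequence[str],
    native_chunks: Union[List[int], str, None],
    multiples: Optional[Mapping[str, int]] = None,
) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: build an xarray_open_kwargs-style dict whose
    dask chunks are whole multiples of the file's native chunks."""
    if native_chunks is None or native_chunks == "contiguous":
        # No native chunking to align with: leave the choice to the caller.
        return {}
    if len(dims) != len(native_chunks):
        raise ValueError("dims and native_chunks must have the same length")
    multiples = dict(multiples or {})
    for dim, factor in multiples.items():
        if factor < 1:
            raise ValueError(f"multiple for {dim!r} must be >= 1, got {factor}")
    # Scale each native chunk size by its (optional) multiple.
    chunks = {dim: size * multiples.get(dim, 1) for dim, size in zip(dims, native_chunks)}
    return {"chunks": chunks}


# Example: native chunks of (1, 300, 360), grouping 12 time steps per dask chunk.
kwargs = open_kwargs_from_native_chunks(("time", "lat", "lon"), [1, 300, 360], {"time": 12})
print(kwargs)
```

With intake-esm, a dict like this would typically be passed as to_dataset_dict(xarray_open_kwargs=kwargs); whether grouping 12 time steps is sensible depends on the dataset and the analysis.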
Describe alternatives you've considered
Writing my own pre-alpha functions, like find_chunking_info, here: https://github.com/Thomas-Moore-Creative/ACDtools/blob/main/ACDtools
However, I'd like a place to collaborate on utilities that is easier for the whole community to see and share.
Additional context
Using the tabulate function (from tabulate import tabulate) to make simple tables display in the Jupyter UI.
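For illustration only, here is a sketch of a native-chunking report rendered with tabulate. It is not the find_chunking_info function in ACDtools; the names read_native_chunking, chunking_rows and show_native_chunking are hypothetical. It assumes the netCDF4 and tabulate packages are installed, and relies on netCDF4's Variable.chunking(), which returns a list of chunk sizes, the string "contiguous", or None for classic (netCDF3) files.

```python
from typing import Dict, List, Tuple, Union

# Per-variable native chunking: (dimension names, chunk sizes), where chunk
# sizes follow netCDF4's Variable.chunking(): list of ints, "contiguous", or None.
Chunking = Union[List[int], str, None]
ChunkInfo = Dict[str, Tuple[Tuple[str, ...], Chunking]]


def read_native_chunking(path: str) -> ChunkInfo:
    """Collect native chunking for every variable in one NetCDF file."""
    import netCDF4  # third-party dependency, imported lazily

    info: ChunkInfo = {}
    with netCDF4.Dataset(path, "r") as ds:
        for name, var in ds.variables.items():
            info[name] = (tuple(var.dimensions), var.chunking())
    return info


def chunking_rows(info: ChunkInfo) -> List[List[str]]:
    """Turn chunk info into table rows: variable, dimensions, native chunks."""
    rows = []
    for name, (dims, chunks) in sorted(info.items()):
        if chunks is None:
            desc = "unknown (not netCDF-4)"
        elif chunks == "contiguous":
            desc = "contiguous"
        else:
            desc = ", ".join(f"{d}: {c}" for d, c in zip(dims, chunks))
        rows.append([name, ", ".join(dims), desc])
    return rows


def show_native_chunking(path: str, tablefmt: str = "simple") -> str:
    """Render the chunking table as text with tabulate (imported lazily)."""
    from tabulate import tabulate

    return tabulate(
        chunking_rows(read_native_chunking(path)),
        headers=["variable", "dimensions", "native chunks"],
        tablefmt=tablefmt,
    )
```

In a notebook, print(show_native_chunking("some_file.nc")) would display the table, where "some_file.nc" stands in for any NetCDF file. Keeping the row-building step separate from file reading makes it easy to test without touching disk.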