Using a Hydropandas ObsCollection as a database

First of all, thank you for sharing such a nice piece of work!

We have been struggling here to get started with using a Hydropandas ObsCollection as a database.

The instruction so far is this. This is a bit cryptical. I'll describe the steps we have taken so far:

In the file app.py of the Datalens package on lines 43-48 you can configure the datasource.

- Example with extent: works
- Example with zipfile: works. Am I right that this zipfile is a zipped pickle of an ObsCollection?
- Example directly from our own ObsCollection: Works after some struggles!

We got it working by replacing lines 45-48 of app.py with the following code:

# Read ObsCollection from pickle
import hydropandas as hpd
oc = hpd.read_pickle("my_obscollection.pkl")

# Add '_0' to the end of the index column (containing location names) if there is no '_' or '-' yet
oc.index = oc.index.map(lambda x: x + '_0' if '_' not in x and '-' not in x else x)

# Specifies datasource
db = HydropandasDataSource(oc=oc, source="bro")

I think it is worth mentioning for others that Datalens seems to be more strict on column names than Hydropandas. See lines 584-589 of source.py for the required names of the measurement ('values') and annotation ('qualifier') columns (when using source="bro").
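For anyone who wants to check their data before launching the app, here is a minimal sketch of a column check. It assumes the two names mentioned above ('values' and 'qualifier') are the ones required for source="bro"; the helper name `missing_columns` is my own and not part of Datalens or Hydropandas, and source.py remains the authoritative reference.

```python
# Assumption: these are the column names Datalens expects for source="bro",
# as described above. Check source.py for the authoritative list.
REQUIRED_COLUMNS = ("values", "qualifier")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a plain list of column names.
print(missing_columns(["values", "comment"]))  # ['qualifier']
```

With a real ObsCollection you would presumably run this against the columns of each observation series and rename any columns that are reported as missing.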

The addition of '_0' to the names (index) of the locations is a workaround since Datalens throws an error if it cannot split the name on either a '_' or a '-'.

To finish, just one question: It seems to me a bit strange that I have to edit the file app.py (in my package) every time when I open a new project and want to analyse a new ObsCollection. Or did I miss something?

Hi @MattBrst,

Thanks for the thoughtful feedback! As I think you noticed, using Hydropandas as a data source wasn't the primary focus of this project, though it was something we wanted to keep in mind while developing the tool.

> The instruction so far is this. This is a bit cryptical.

Noted, this could be better. I will add a brief note that currently users are expected to modify this in app.py. I will also note that the tool expects index and column names to meet certain criteria.

> Example with zipfile: works. Am I right that this zipfile is a zipped pickle of an ObsCollection?

Correct.

> Example directly from our own ObsCollection: Works after some struggles!

Glad to hear you got it working, but it could definitely use some improvements :).

> I think it is worth mentioning for others that Datalens seems to be more strict on column names than Hydropandas.

Yes, this is definitely something that should be documented.

> The addition of '_0' to the names (index) of the locations is a workaround since Datalens throws an error if it cannot split the name on either a '_' or a '-'.

Yeah, not sure if this is easy to avoid. I also don't think it's necessarily a bad thing to require both a location name and filter number in the index, but that should be clearly documented, and perhaps checked/autofixed on startup.
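As a rough idea of what such a startup check/autofix could look like, here is a minimal sketch. The function names are hypothetical and nothing like this exists in gwdatalens yet; it simply generalizes the '_0' workaround from the original post and reports which names were changed.

```python
def ensure_tube_suffix(name, default_suffix="_0"):
    """Append `default_suffix` when `name` contains neither '_' nor '-'."""
    if "_" in name or "-" in name:
        return name
    return name + default_suffix


def autofix_names(names):
    """Return the fixed names plus a list of (old, new) pairs that changed."""
    fixed = [ensure_tube_suffix(name) for name in names]
    changed = [(old, new) for old, new in zip(names, fixed) if old != new]
    return fixed, changed


fixed, changed = autofix_names(["B01", "B02_1", "B03-2"])
print(fixed)    # ['B01_0', 'B02_1', 'B03-2']
print(changed)  # [('B01', 'B01_0')]
```

On an ObsCollection the same function could be applied as in the original workaround, i.e. oc.index = oc.index.map(ensure_tube_suffix), with the list of changed names logged so users know what was renamed.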

> It seems to me a bit strange that I have to edit the file app.py (in my package) every time when I open a new project and want to analyse a new ObsCollection. Or did I miss something?

Unfortunately, you're not missing anything; this is how it is currently programmed. I have been working from a cloned repository for development, with an editable install (pip install -e .). That makes it slightly easier to modify the configuration or app.py file without having to dig deep into your Python environment.

I haven't yet made a decision on how users should provide information to the application outside of the repository (or installed package), and especially how to support all the different options. Some initial ideas include:

- Allow passing a filename to gwdatalens, e.g. gwdatalens --hydropandas-file my_obscollection.pkl
- Allow passing a config file to gwdatalens that includes some information about the database: gwdatalens --config-file my_config.toml. This would supersede the defaults in the config file included with gwdatalens. This config file would then contain information about the "database". For hydropandas this would include options to specify an extent or file.
- Show how gwdatalens can be launched from a Python script in which a custom ObsCollection is loaded as the database?
- ...
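To make the first two ideas a bit more concrete, here is a rough sketch using only the standard library. None of this exists in gwdatalens today: the two flags are the hypothetical ones named above, and `build_parser` and `merge_config` are placeholder names. The merge step illustrates how values from a user config could supersede the packaged defaults.

```python
import argparse


def build_parser():
    """Parser for the two hypothetical options discussed above."""
    parser = argparse.ArgumentParser(prog="gwdatalens")
    parser.add_argument(
        "--hydropandas-file",
        default=None,
        help="path to a pickled ObsCollection to use as the database",
    )
    parser.add_argument(
        "--config-file",
        default=None,
        help="path to a TOML file whose values override the defaults",
    )
    return parser


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--hydropandas-file", "my_obscollection.pkl"])
print(args.hydropandas_file)  # my_obscollection.pkl
print(args.config_file)       # None
```

The user config itself could be read with tomllib (Python 3.11+) and passed to merge_config as the overrides; the keys that such a file would contain are still undecided.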

Anyway, still a work in progress, and I appreciate the feedback. I will probably continue work on this tool in the next year, and I'll take the above into consideration. Or if you feel comfortable submitting changes through a Pull Request, let me know ;).

ArtesiaWater / gwdatalens

Using a Hydropandas ObsCollection as a database #37