NASA-IMPACT / veda-jupyterhub

VEDA JupyterHub technical planning and documentation

Allow users to "Open Dataset in QGIS" #2

Closed batpad closed 3 months ago

batpad commented 6 months ago

At a high level, we should have a button on the dataset page of the VEDA UI that lets a user "Open the dataset in QGIS". This would take the user to a QGIS instance running inside the VEDA hub and open the selected dataset in QGIS.

Existing work on the QGIS image and setting up default data sources is here: https://github.com/2i2c-org/nasa-qgis-image/issues

Related issue about pre-loading QGIS with access to Earthdata datasets: https://github.com/2i2c-org/infrastructure/issues/3479

We can break down tasks and refine our approach here.

cc @yuvipanda @geohacker

batpad commented 6 months ago

An existing issue that discusses in more detail what would be involved here: https://github.com/2i2c-org/infrastructure/issues/2985

batpad commented 4 months ago

From discussions with @yuvipanda, this is how we want to implement it:

Similar to the DesktopHandler served at /desktop/, we will create a QGIS-specific handler at /qgis/, then write code that reads the query parameters at that URL and opens QGIS with the appropriate options / configuration based on them.
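A minimal sketch of that dispatch, using only the Python standard library rather than the Tornado/jupyter_server handler classes the real extension would build on. The route table, the `dispatch` function and the shape of the returned options are hypothetical, and the sketch assumes the hub's `/user/<name>/` prefix has already been stripped from the path.

```python
from urllib.parse import parse_qs, urlsplit


def desktop_options(params):
    # The plain desktop handler ignores query parameters.
    return {"app": "desktop"}


def qgis_options(params):
    # The QGIS handler turns each query parameter into a launch option.
    # parse_qs returns a list per name; keep only the first value.
    return {"app": "qgis", **{name: values[0] for name, values in params.items()}}


# Hypothetical route table mirroring /desktop/ and /qgis/.
ROUTES = {"/desktop": desktop_options, "/qgis": qgis_options}


def dispatch(request_path):
    """Pick a handler by path and hand it the parsed query string."""
    parts = urlsplit(request_path)
    handler = ROUTES.get(parts.path.rstrip("/"))
    if handler is None:
        raise LookupError(f"no handler for {parts.path}")
    return handler(parse_qs(parts.query))
```

For example, `dispatch("/qgis/?dataset=https://example.com/a.geojson")` yields `{"app": "qgis", "dataset": "https://example.com/a.geojson"}`.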

You can see how the current DesktopHandler is set up in jupyter-remote-desktop-proxy:

So, I think the idea here would be to create a jupyter-remote-qgis-proxy that wraps around jupyter-remote-desktop-proxy and creates a new handler for /qgis/.

As a first version, let's accept a dataset=... query parameter that takes a dataset URL, and start QGIS with parameters that open that dataset. We can then evaluate whether we need more complexity, such as additional query parameters (e.g. bbox).
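A sketch of how the handler might validate these parameters before starting QGIS. It assumes `dataset` must be an absolute http(s) URL and that a future `bbox` would be written `minx,miny,maxx,maxy`, the same order as QGIS's `--extent xmin,ymin,xmax,ymax` command-line option. Both rules and both helper names are assumptions for illustration.

```python
from urllib.parse import urlsplit


def parse_dataset(value):
    """Accept only absolute http(s) URLs for the dataset= parameter (assumed rule)."""
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"dataset must be an http(s) URL, got {value!r}")
    return value


def parse_bbox(value):
    """Parse a hypothetical bbox= value 'minx,miny,maxx,maxy' into four floats."""
    parts = value.split(",")
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four comma-separated numbers")
    minx, miny, maxx, maxy = (float(p) for p in parts)
    # Only latitude order is checked: minx > maxx can be legitimate
    # for boxes that cross the antimeridian.
    if miny >= maxy:
        raise ValueError("bbox miny must be smaller than maxy")
    return minx, miny, maxx, maxy
```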

Once we have created jupyter-remote-qgis-proxy, we can include it in the nasa-qgis-image and test.

Acceptance Criteria:

@wildintellect we may need your help figuring out what kinds of datasets can be passed in and which flags to use when starting QGIS so that it opens the passed-in dataset correctly.

@sunu let's go over this when we next chat and figure out next steps. @yuvipanda I think I've mostly grokked what we need to do here, but a quick chat with you before we kick this work off in earnest would probably help.

cc @geohacker

wildintellect commented 4 months ago

@batpad I suspect we'll want to rely on STAC, where a "collection" == "dataset" or an "item" == "dataset". A collection would probably need a web service that QGIS could use, whereas an item could have an asset defined, and we could filter assets based on GDAL/OGR-supported formats. I need to go back and look at what the QGIS STAC plugin does, because hooking into that might be another approach.
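For the item case, a sketch of filtering a STAC item's assets down to the ones QGIS could open through GDAL/OGR, keyed on the asset `type` (media type) field from the STAC spec. The allow-list is a small assumption, not the full set of formats GDAL/OGR supports, and `qgis_openable_assets` is a hypothetical helper.

```python
# Assumed allow-list of base media types that GDAL/OGR can usually read.
READABLE_MEDIA_TYPES = {
    "image/tiff",            # GeoTIFF / COG; often carries "; application=geotiff" parameters
    "application/geo+json",  # GeoJSON (RFC 7946)
    "application/x-netcdf",  # NetCDF
}


def qgis_openable_assets(item):
    """Return {asset_key: href} for assets in a STAC item dict with a readable media type."""
    result = {}
    for key, asset in item.get("assets", {}).items():
        # Drop media-type parameters such as "; profile=cloud-optimized".
        base = asset.get("type", "").split(";")[0].strip().lower()
        if base in READABLE_MEDIA_TYPES:
            result[key] = asset["href"]
    return result
```

Assets without a `type` are skipped here; a fuller version might fall back to the file extension.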

To get started, I think we need a user story with a particular dataset so we can work through the process. @j08lue can you think of a relatively simple, high-value dataset to try from VEDA?

j08lue commented 4 months ago

can you think of an relatively simple high value dataset to try from VEDA?

We often use the Nitrogen Dioxide (NO2) dataset for demo purposes: https://radiantearth.github.io/stac-browser/#/external/staging-stac.delta-backend.com/collections/no2-monthly

yuvipanda commented 4 months ago

@batpad thanks for kicking this off! I think it would be useful for me to be in the first meeting as well; could you include me in that?

sunu commented 4 months ago

@batpad For a bare-bones implementation, can we clarify what a dataset is and what opening such a dataset in QGIS involves?

I tried the simplest example, opening a remote GeoJSON file through the QGIS CLI, and it doesn't quite work because QGIS assumes the file is a local file:

(base) jovyan@4a63b62ddbec:~$ qgis "https://raw.githubusercontent.com/datameet/maps/master/Country/india-osm.geojson"
Warning: QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-jovyan'
ERROR: Status 2: File /home/jovyan/https:/raw.githubusercontent.com/datameet/maps/master/Country/india-osm.geojson could not be found

No luck with a /vsicurl/ URL either:

(base) jovyan@4a63b62ddbec:~$ qgis "/vsicurl/https://raw.githubusercontent.com/datameet/maps/master/Country/india-osm.geojson"
Warning: QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-jovyan'
ERROR: Status 2: File /vsicurl/https:/raw.githubusercontent.com/datameet/maps/master/Country/india-osm.geojson could not be found
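Both error messages are consistent with QGIS treating the argument as a local path: joined to the working directory and then normalized, which collapses the `//` after `https:` into a single `/`. This is an inference from the error text, not from QGIS's source; the snippet only reproduces the same string transformation with Python's `posixpath`, and `resolve_like_local_file` is a hypothetical name.

```python
import posixpath


def resolve_like_local_file(arg, cwd="/home/jovyan"):
    """Mimic treating a CLI argument as a local file path."""
    # join() leaves arg unchanged if it already starts with "/" (the /vsicurl/ case);
    # normpath() collapses "//" into "/", turning "https://" into "https:/".
    return posixpath.normpath(posixpath.join(cwd, arg))
```

Applied to the two arguments above, it yields exactly the two paths in the error messages.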

Not sure if I'm using the wrong syntax to invoke qgis here. Could someone more familiar with QGIS please verify?

Overall, it looks like we need two steps to get "open a dataset in QGIS" to work:

  1. Given a remote dataset, have a command that opens QGIS with the remote dataset loaded
  2. Given a url like jupyter.hub/qgis/?dataset=https://example.com/dataset, invoke the command from step 1 with the dataset

A minimal implementation of step 2 looks fairly straightforward: make a QGIS-specific fork of jupyter-remote-desktop-proxy. But we need to discuss how to implement step 1. If opening QGIS directly with a remote URL doesn't work, an alternative would be to generate a project file pointing to the remote dataset and open QGIS with that project preloaded. Or we could try to automate loading the data into a new layer with PyQGIS.
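A sketch of the PyQGIS route: generate a small startup script that adds the remote file as a vector layer. `QgsVectorLayer(uri, name, "ogr")`, `isValid()` and `QgsProject.instance().addMapLayer()` are standard PyQGIS calls; the `pyqgis_loader_script` helper is hypothetical, and some formats may need the URL prefixed with GDAL's `/vsicurl/` to be read remotely.

```python
def pyqgis_loader_script(url, layer_name):
    """Return the source of a PyQGIS script that adds a remote vector layer."""
    # repr() produces a valid Python string literal, so quotes or backslashes
    # in the URL or layer name cannot break the generated script.
    return (
        "from qgis.core import QgsProject, QgsVectorLayer\n"
        f"layer = QgsVectorLayer({url!r}, {layer_name!r}, 'ogr')\n"
        "if layer.isValid():\n"
        "    QgsProject.instance().addMapLayer(layer)\n"
        "else:\n"
        f"    print('Could not load', {url!r})\n"
    )
```

The generated text only runs inside QGIS's Python environment; outside it, it can be checked for syntax with `compile()`.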

batpad commented 4 months ago

@sunu great work here!

@geohacker @wildintellect do you know if there's a good way to invoke QGIS with a remote URL as parameter so that it opens that dataset when it starts up? (I know the definition of "dataset" here can get really complex, but for a proof of concept, let's start with something simple like a GeoJSON)

wildintellect commented 4 months ago

Seems like a bug in the command-line implementation. Loading that file through the Vector loader works fine. [Screenshot from 2024-05-28 15-14-50]

I'll need to search for alternatives. One I can think of is injecting the layer into a template QGS/QGZ project file and opening the project instead.

wildintellect commented 4 months ago

india.qgz.zip (I added .zip to the end so GitHub would accept it.) A QGZ is a zip file; the QGS inside it is an XML file.
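Since a QGZ is a zip archive with a QGS XML file inside, injecting a layer into a template can be sketched as a text substitution on the `.qgs` member. This assumes a hypothetical template whose layer datasource is the placeholder `__DATASOURCE__`; a real implementation might edit the XML with a parser instead.

```python
import io
import zipfile
from xml.sax.saxutils import escape

PLACEHOLDER = "__DATASOURCE__"  # hypothetical token placed in the template .qgs


def render_qgz(template_bytes, datasource):
    """Return a new .qgz (as bytes) with PLACEHOLDER in the .qgs replaced by datasource."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(".qgs"):
                # Escape &, < and > so URLs with query strings stay valid XML.
                text = data.decode("utf-8").replace(PLACEHOLDER, escape(datasource))
                data = text.encode("utf-8")
            # Copy every member (including non-.qgs files) into the new archive.
            dst.writestr(info, data)
    return out.getvalue()
```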

sunu commented 4 months ago

I have a working prototype that combines a minimal project template and a pyqgis script to automate loading remote vector data files. Here's a quick demo:
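One way to wire a project template and a PyQGIS script together is to launch QGIS with both. `--project`, `--code` and `--nologo` are existing QGIS command-line options (worth checking `qgis --help` for the installed version); `qgis_command` is a hypothetical helper, and this is not necessarily how the prototype does it.

```python
import os
import tempfile


def qgis_command(project_path, script_source):
    """Write the startup script to a temp file and return an argv list for QGIS."""
    fd, script_path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(script_source)
    # --project opens the (template) project; --code runs a Python file after startup.
    return ["qgis", "--nologo", "--project", project_path, "--code", script_path]
```

Passing the list to `subprocess.Popen` avoids shell quoting problems with URLs that contain `&` or `?`.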

https://github.com/NASA-IMPACT/veda-jupyterhub/assets/1142203/6c9ed91e-026e-4767-9e0a-c3a8dda806ab

I'll put the code in a repo once I clean things up a bit. cc @batpad

j08lue commented 4 months ago

Woo hoo, great to see this in action, @sunu!

mfisher87 commented 3 months ago

:star_struck: That's so cool, amazing work!

wildintellect commented 3 months ago

@sunu can you link the code? We should also file an upstream bug/enhancement request with QGIS (https://github.com/qgis/QGIS/issues) about supporting URL-based data sources in the CLI. It might be worth jumping on a chat with QGIS devs to figure out the best way to propose it.

sunu commented 3 months ago

I cleaned up the code a bit and uploaded it to https://github.com/sunu/jupyter-remote-qgis-proxy/

This is a Jupyter server extension that inherits from https://github.com/jupyterhub/jupyter-remote-desktop-proxy instead of forking it. I'm hoping this will be easier to maintain than a fork.
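A toy illustration of why inheriting is easier to maintain than forking: the subclass overrides only the QGIS-specific piece and reuses everything else from upstream. `DesktopHandler` below is a hypothetical stand-in defined locally, not the real class or interface from jupyter-remote-desktop-proxy.

```python
class DesktopHandler:
    """Hypothetical stand-in for the upstream handler; not its real interface."""

    def command(self, params):
        return ["desktop-session"]  # placeholder desktop command

    def handle(self, params):
        # Shared logic (starting the desktop, proxying the display, ...)
        # would live here upstream and be picked up by subclasses for free.
        return {"launch": self.command(params)}


class QGISHandler(DesktopHandler):
    """Overrides only the command; upstream fixes to handle() are inherited."""

    def command(self, params):
        argv = ["qgis", "--nologo"]
        if "project" in params:
            argv += ["--project", params["project"]]
        return argv
```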

@wildintellect The relevant part of the code for opening QGIS is here: https://github.com/sunu/jupyter-remote-qgis-proxy/blob/772c016b413a0faae64110d7a147bd0cfadb2a3f/jupyter_remote_qgis_proxy/qgis/utils.py#L5

And to test it out, I have a branch on my fork of nasa-qgis-image that uses this server extension. To run this locally, you can clone the repo and run the following commands:

git clone git@github.com:sunu/nasa-qgis-image.git
cd nasa-qgis-image
git checkout qgis-proxy

docker build -t qgis .
docker run -it -p 8888:8888 --security-opt seccomp=unconfined qgis

batpad commented 3 months ago

We're deploying this to the hub: https://github.com/2i2c-org/infrastructure/pull/4299

Once this is deployed, you should be able to test with a link like this:

https://hub.openveda.cloud/user-redirect/qgis?action=add_vector_layer&url=https://raw.githubusercontent.com/flatgeobuf/flatgeobuf/master/test/data/countries.fgb&layer_name=coutries&project_name=fgb-countries
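The VEDA UI could build such links programmatically. A sketch using the query parameters from the example link (`action`, `url`, `layer_name`, `project_name`); the helper name is hypothetical. Percent-encoding matters when the dataset URL itself contains `&` or `?`, which would otherwise be split into separate hub parameters; the example link works unencoded only because its dataset URL has neither.

```python
from urllib.parse import urlencode

HUB = "https://hub.openveda.cloud/user-redirect/qgis"


def open_in_qgis_link(dataset_url, layer_name, project_name):
    """Build an 'Open in QGIS' link with every parameter value percent-encoded."""
    query = urlencode({
        "action": "add_vector_layer",
        "url": dataset_url,
        "layer_name": layer_name,
        "project_name": project_name,
    })
    return f"{HUB}?{query}"
```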

You should be prompted to log in. You MUST select the QGIS image on the profile selection screen and then hit Start Server. (We will be working next quarter to improve automatic profile selection from URL parameters.)

Once your QGIS container starts, it should automatically load the dataset specified in the URL (here https://raw.githubusercontent.com/flatgeobuf/flatgeobuf/master/test/data/countries.fgb). This should work for common vector formats hosted on public URLs.

(If the QGIS container is already running, opening another dataset with a link like the one above will open it in the same container rather than spinning up a new one.)

Once we confirm this works well, it should be fairly straightforward to add other data formats, or anything else QGIS supports.

batpad commented 3 months ago

Video demo:

https://github.com/NASA-IMPACT/veda-jupyterhub/assets/72280/2bf37fdd-1516-46f0-85e7-1e6fa4654614

batpad commented 3 months ago

We have this deployed on the VEDA hub! I'm going to close this issue, and we'll open separate issues for:

Many thanks to @sunu for the amazing work on this and to @yuvipanda for all the guidance and support!