IQSS / dataverse-pm

Project management issue tracker for the Dataverse Project. Note: Related links and documents may not be public.
https://dataverse.org

Project: GEOS-Chem Proof-of-Concept #154

Closed (cmbz closed 9 months ago)

cmbz commented 11 months ago

Overview

Two-phase project to investigate and pilot large data and computation support for GEOS-Chem datasets using a containerized Dataverse installation running on Mass Open Cloud resources.

The proof-of-concept will be demoed at the Mass Open Cloud Alliance Conference (2024/02/28).

Participants

Timeline

Tasks

January, 2024 and February, 2024

March, 2024

Related

Resources

cmbz commented 10 months ago

2024/01/26

landreev commented 9 months ago

I'll be using this issue to track the effort of porting the GEOS-Chem supplied notebook into something that can be deployed on NERC and used in the context of the MOC-PoC presentation.

cmbz commented 9 months ago

Thanks, @landreev. But please give me a heads-up before you close the issue.

landreev commented 9 months ago

Wasn't planning to close it, no. @pdurbin and I briefly discussed opening a separate local issue to track the dev. effort needed to port the notebook into the framework of the demo. We decided to use this one instead, since it was already there. But I can still open a new one if you prefer.

landreev commented 9 months ago

(I just wanted to have something "in progress" to reflect this task, since this is the focus of the MOC-PoC effort at this point.)

cmbz commented 9 months ago

@landreev totally fine to keep on with this issue!

pdurbin commented 9 months ago

If it helps, I now have OpenShift running locally on my laptop because I was looking at this PR:

I just started a thread in Slack asking whether it would be helpful or interesting for me to try running the notebook on my local OpenShift installation, but I'm not sure where to begin.

pdurbin commented 9 months ago

I'm not sure if this helps or not, but @r1beguin recently demoed launching JupyterLab from Dataverse. Here are some screenshots from his July 2023 community call presentation:

[Three screenshots from the presentation]

The code is here: https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector

I just merged a PR where there's a nice writeup of the tool on our "integrations" page:

Also, from their README, here's a diagram of how it works:

[Screenshot: diagram from the dataverse-jupyterhub-connector README]

landreev commented 9 months ago

JupyterHub, not "Lab", right?

pdurbin commented 9 months ago

@landreev whoops, yes, hub not lab.

landreev commented 9 months ago

Can this setup be used in our case, for the purposes of the demo? I don't fully understand this part.

Yesterday I passed the SSH key to a NERC VM to Bob Yantosca, the author of the notebook, and asked him to install it, replicating the environment under which he developed it. The data files are already saved locally on the instance. I'm waiting to hear from him. Once it's running like that, we'll at least be able to see what it looks like, and then we can add extras to it: the storage calls, the passing of parameters, and a way to deploy it in a container. So this is the extent of my current plan.

landreev commented 9 months ago

I have a very crude/fake/hard-coded/everything glued together with dog drool kind of a demo that nevertheless ties the pieces together: the dataset with the GEOS-Chem datafiles in it, and the "external tool" that sends the user to the statistics notebook, which in turn generates pretty graph images. I will post links/images in the Slack channel as a quick status update, and will continue working on making the whole thing less fake/hard-coded.

landreev commented 9 months ago

I marked the remaining demo-related items on the checklist as completed and I'm removing my name from the issue (@cmbz you asked me not to close it - so, leaving it as is). This is under the assumption that this is completed "for the purposes of the demo presentation", as a quick proof of concept only. I will open a new issue in the main repo for working out a real infrastructure setup that will allow users to run arbitrary, non-hard-coded computation code on a cluster. That is the next logical step, and it makes sense to work on this while we have access to the NERC cluster facilities.

pdurbin commented 9 months ago

I'm not actively working on this so I removed my name as well.

cmbz commented 9 months ago

Closing issue as complete. Follow-up work to create Harvard Dataverse Repository GEOS-Chem collections will continue here: https://github.com/IQSS/dataverse-pm/issues/178