Create test notebooks to test expected functionality after image updates

batpad commented 7 months ago

Relates to:

We need a few "test notebooks" to use as part of CI tests when we do image updates. The tests would test that that when these notebooks are run in the updated environment, the outputs remain the same, and provide some interface to update the expected snapshot if the changes in output are expected.

These notebooks would ideally be simple enough that they are intentional about specific things being tested, and yet cover a broad amount of use-cases to catch potential regressions in upstream libraries. A good example of the kind of thing it should catch is https://github.com/NASA-IMPACT/veda-jupyterhub/issues/14 . It would be great if we can make the notebooks self contained (i.e. not dependent on external data sources), and deterministic in their outputs, avoiding outputs based on things that would then need to be mocked, like datetime.now() .

If we have a few of these notebooks, we can easily run them as part of CI.

@wildintellect @jsignell @abarciauskas-bgse - would any of you be able to help with the creation of these test notebooks? Happy to have a chat about specifics - it can likely be some slightly simplified form of the veda-docs notebooks or so. I could also give it a stab just looking at the docs notebooks, but likely don't have as good a sense of the breadth of things that need testing as you all do.

jsignell commented 6 months ago

For the last new image I just ran through these quickstart notebooks: https://github.com/NASA-IMPACT/veda-docs/tree/main/notebooks/quickstarts. But they do require AWS creds. Is it feasible to set up CI to have credentials? Or should we be trying to create notebooks that only rely on public data?

jsignell commented 6 months ago

Also want to make sure that this ticket and https://github.com/NASA-IMPACT/veda-backend/issues/357 are aware of each other since to goals are potentially overlapping

batpad commented 6 months ago

Or should we be trying to create notebooks that only rely on public data?

@jsignell I would have a pretty strong preference for relying on public data and not requiring the tests to run in an environment with privileged access - I'd go even further and "if possible", to include very truncated data to allow us to test the bits we need just locally / in the repo, so there aren't externalities like network requests which could cause tests to fail. (Although, if the data is fetched from public s3 buckets or locations we assume to be pretty stable, that's totally fine).

@jsignell does that seem okay? If you think we do need privileged access within the tests, I can check how we might make that possible, but does that seem like it would be a lot more complicated than being able to run the tests stand-alone without privileged access.

Thanks again for looking into this!

jsignell commented 6 months ago

Yes I totally agree that it would be preferable to only rely on public data and APIs. My understanding is that VEDA doesn't have any public buckets right now and the expectation is that external usage will be via one of the APIs. So maybe the easiest first step would be to simplify some of the API notebooks while we think over how to have public data.

I do think it is important to exercise reading data from s3 with fsspec, but maybe we start with partial tests and go from there.

batpad commented 6 months ago

I do think it is important to exercise reading data from s3 with fsspec, but maybe we start with partial tests and go from there.

Yup - starting with simple / partial tests and growing out the complexity as we go sounds like a good approach to me. Sorry for the late response here @jsignell - @sunu is just working on moving the singleuser image into its own repository and setting up the notebook tests to run via CI, etc. Right now, we'll just add a very simple "hello world" type notebook to test that the tests work. It'll probably be easier for you to work through this once that's setup so that you can "test your tests" as you go. We're hoping to have that setup by early next week - once that's done, will ping you, and we can sync if needed if there's questions or things we need to discuss.

Thanks again!

batpad commented 5 months ago

Apologies for the delay here - we now have the repository up that aims to simplify the Dockerfile and build process for our custom pangeo image, as well as allows us to run tests as part of CI.

https://github.com/NASA-IMPACT/pangeo-notebook-veda-image . We will work on over-all documentation in the next days and migrate the pangeo images used on hubs. In that repository, you will see an image-tests/ folder. You can put executed .ipnyb files there, and they will be run and checked against the expected outputs on every commit. Currently, there is just one extremely basic test notebook to "test that the tests work".

We are using repo2docker to build and test the image. You can read more about how the tests work here: https://github.com/jupyterhub/repo2docker-action?tab=readme-ov-file#testing-the-built-image

@jsignell @wildintellect - would you be able to work on the test notebooks? Can just make PRs to https://github.com/NASA-IMPACT/pangeo-notebook-veda-image

Let me know if it helps to chat or if there are any questions or concerns.

Thanks @sunu for putting this together!

jsignell commented 5 months ago

Oh neat! Yeah I can take a crack at adding some test notebooks. Should I just run them on the pangeo env on vedahub?

batpad commented 5 months ago

Should I just run them on the pangeo env on vedahub?

I think this should be fine (but being aware that the environment the CI tests will run in won't have access to things like private s3 buckets).

You should also be able to run the image built in https://github.com/NASA-IMPACT/pangeo-notebook-veda-image/ (which is based on the pangeo base image) locally. We'll add some documentation on how to do that. Filed a ticket for it here: https://github.com/NASA-IMPACT/pangeo-notebook-veda-image/issues/3

Thanks so much @jsignell ! Please let us know however we can help.

jsignell commented 5 months ago

Ok! I just put up a PR with some silly little tests. I tried to include a lot of rendered outputs, but just let me know if you think the notebook is too big. I can certainly trim things down and do fewer viz tests. I was just kind of intrigued by the potential of testing that the rendered output works properly.

batpad commented 5 months ago

@jsignell this looks really great to me! I don't see any issues with the notebook size, and testing visualization outputs sounds good to me! Unless @sunu has any concerns.

I left a small comment on the PR about the STAC query that I'm a bit worried that could break our tests if the STAC catalog we are querying has any updates.

This is exciting! Once we have this merged, I'd really like to do an update to the latest pangeo base notebook version with the new repository + testing workflow so we know the whole process works.

jsignell commented 5 months ago

Ok great! I just made the recommended update on the PR so it is all good from my perspective.

NASA-IMPACT / veda-jupyterhub

Create test notebooks to test expected functionality after image updates #27