cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

Set up CI #53

Closed: sampsyo closed this 10 months ago

sampsyo commented 1 year ago

It would be really great to someday set up CI using GitHub Actions. Ideally, this would let us run our entire test suite on every PR to make sure we don't break anything.

The one tricky thing about this will be getting odgi set up. We could use our existing Dockerfile for that, or, perhaps more reasonably, we could just run odgi from the official Docker image. That is, it may be as simple as using docker run pangenome/odgi in place of odgi, or whatever the Actions-endorsed equivalent is for bringing in a tool from a published Docker image.
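A minimal sketch of that substitution, assuming the tests invoke odgi from Python via subprocess. The helper odgi_argv is hypothetical, and the image is the quay.io tag that turns up later in this thread (pangenome/odgi has no pullable latest tag).

```python
import shlex

# Hypothetical pinned image; the tag matches the quay.io build discussed in this issue.
IMAGE = "quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0"


def odgi_argv(args, use_docker=True):
    """Return the argv that runs `odgi <args>`, optionally inside Docker."""
    if not use_docker:
        return ["odgi", *args]
    # --rm removes the throwaway container once odgi exits.
    return ["docker", "run", "--rm", IMAGE, "odgi", *args]


if __name__ == "__main__":
    # With no arguments, odgi just prints its usage message.
    print(shlex.join(odgi_argv([])))
```

On its own this does not let odgi see files on the host, since the container has its own filesystem; that is where the volume mounting discussed later in the thread comes in.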

anshumanmohan commented 1 year ago

Love it; will explore!

anshumanmohan commented 1 year ago

I've taken a tiny step towards this by adding a formatting checker for all our Python code. #87, which is presently open, passes it. A more serious CI is coming up!

anshumanmohan commented 1 year ago

So I think odgi is indeed going to be the big hurdle, and I think using their Docker image is not quite going to work. I'd like to show you my thrashing just to make sure I'm correct in dismissing this approach. Please let me know if I missed something!!

$ # The following won't work
$ docker pull pangenome/odgi
Using default tag: latest
Error response from daemon: manifest for pangenome/odgi:latest not found: manifest unknown: manifest unknown

$ # We actually need to get it from quay, as described here:
$ # https://odgi.readthedocs.io/en/latest/rst/installation.html#docker
$ docker pull quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0
0.8.2--py38h3b68952_0: Pulling from biocontainers/odgi
Digest: sha256:1f77a96eb368331805a01856c0c9432f872033e869b9e2d19de2aa17bc12f760
Status: Image is up to date for quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0
quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0

$ # Now the following should work, but doesn't
$ docker run quay.io/biocontainers/odgi
Unable to find image 'quay.io/biocontainers/odgi:latest' locally
docker: Error response from daemon: manifest for quay.io/biocontainers/odgi:latest not found: manifest unknown: manifest unknown.
See 'docker run --help'.

$ # This works; it needs the tag
$ docker run quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0 odgi
odgi: optimized dynamic genome/graph implementation, version v0.8.2-0-g8715c55

usage: odgi <command> [options]

Overview of available commands:
  -- bin           Binning of pangenome sequence and path information in the graph.
  -- break         Break cycles in the graph and drop its paths.
  ...

$ # But actually passing it a graph is another story. I have test/k.gfa locally.
$ docker run quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0 odgi depth -i test/k.gfa
[odgi::depth] error: the given file "test/k.gfa" does not exist. Please specify an existing input file in ODGI format via -i=[FILE], --idx=[FILE].
$ # I think this is because the container doesn't have access to the local filesystem.

# To make this work, we need to mount the local directory into the Docker container...
# ...and at that point I'm not sure why we're using the odgi Docker at all.
# We could just get GitHub Actions to install it and run it directly.

anshumanmohan commented 1 year ago

When we did up the Pollen Dockerfile, we figured out a lot of the gunk re: installing ODGI and getting it added to the paths, etc., such that (1) odgi on the command line and (2) import odgi in a Python shell both work.
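A quick way to confirm that both conditions hold, e.g. as an early step in a CI job or inside a freshly built image. This is a sketch with a hypothetical function name; it only checks that odgi can be found, not that it runs correctly.

```python
import importlib.util
import shutil


def odgi_available():
    """Report whether odgi is usable both as a CLI and as a Python module."""
    return {
        # (1) `odgi` on the command line: look for the executable on PATH.
        "cli": shutil.which("odgi") is not None,
        # (2) `import odgi`: check that the module can be located, without importing it.
        "python": importlib.util.find_spec("odgi") is not None,
    }


if __name__ == "__main__":
    print(odgi_available())
```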

I am now thinking of just getting GitHub Actions to, on every PR:

  1. build the Pollen image using our Dockerfile; this includes the latest commit to odgi's main branch
  2. run our tests in a container from that image
  3. if successful, publish our Docker image to Docker Hub.
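The three steps above, with the rule that nothing is published unless the tests pass, might be sketched as follows. Everything here is a placeholder: the repository name example-org/pollen, the make test command, and run_pipeline are hypothetical, and docker push additionally needs Docker Hub credentials (e.g. via docker login) in the CI environment.

```python
import subprocess

# Hypothetical placeholders, not Pollen's actual repository name or test command.
IMAGE = "example-org/pollen:latest"
TEST_CMD = ["make", "test"]

STEPS = [
    ("build", ["docker", "build", "-t", IMAGE, "."]),
    ("test", ["docker", "run", "--rm", IMAGE, *TEST_CMD]),
    ("publish", ["docker", "push", IMAGE]),
]


def run_pipeline(steps, run=subprocess.run):
    """Run steps in order; stop at the first failure so an untested image is never pushed.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed = []
    for name, argv in steps:
        result = run(argv)
        if result.returncode != 0:
            return completed, name
        completed.append(name)
    return completed, None
```

Passing the runner in as a parameter keeps the ordering logic testable without Docker; in CI it would simply be called as run_pipeline(STEPS).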

Thoughts? :)

anshumanmohan commented 1 year ago

Update after meeting: this is probably possible, but too slow and annoying.

Try the mounting-of-volumes thing (which I whined about above) instead!

sampsyo commented 1 year ago

Indeed! Here's something that works with volumes, FWIW:

docker run --rm -v "$(pwd)"/test:/test quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0 odgi depth -i /test/basic/ex1.gfa
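The same command, built as an argv list (e.g. for subprocess.run) so tests can pass in host paths. This is a sketch; odgi_depth_argv and mount_point are hypothetical names. The key point is that the graph path handed to odgi must be the container-side path under the mount, not the host-side one.

```python
from pathlib import Path

IMAGE = "quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0"


def odgi_depth_argv(host_dir, graph, mount_point="/test"):
    """Build a `docker run ... odgi depth` command for a graph file under host_dir.

    host_dir is bind-mounted at mount_point, so the path given to odgi is
    rewritten to its location inside the container.
    """
    host_dir = Path(host_dir).resolve()
    graph = Path(graph).resolve()
    # Raises ValueError if the graph is not inside the mounted directory.
    rel = graph.relative_to(host_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount_point}",
        IMAGE,
        "odgi", "depth", "-i", f"{mount_point}/{rel.as_posix()}",
    ]
```

For example, odgi_depth_argv("test", "test/basic/ex1.gfa") run from the repository root ends with odgi depth -i /test/basic/ex1.gfa, matching the command above.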

Annoying that there is not a latest tag on the Docker registry so we have to use stuff like 0.8.2--py38h3b68952_0. It's also kinda worrisome that the latest build there seems to be from 5 months ago. But maybe that's OK?
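Since docker run silently appends :latest to an untagged reference (which is what produced the "manifest unknown" error earlier in the thread), a CI helper could reject untagged image references up front. A hypothetical sketch; it also accepts digest references (name@sha256:...), which Docker supports as a stricter way to pin an image.

```python
def has_explicit_tag(image_ref):
    """True if image_ref names a tag or digest, so Docker won't fall back to :latest."""
    if "@" in image_ref:  # digest reference, e.g. name@sha256:...
        return True
    # Only the last path component can carry a tag; an earlier colon is a registry port.
    last = image_ref.rsplit("/", 1)[-1]
    return ":" in last
```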

sampsyo commented 1 year ago

The other option is to run all our stuff inside a Docker container based on the odgi one. GitHub Actions can apparently run a job inside a container, so we could plausibly make the whole test job run with container: quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0. Could be silly in its own way, but it could also work?
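If the job runs inside the odgi image, odgi is already on PATH, so the same test code could call it directly there and fall back to Docker elsewhere (e.g. on a laptop). A sketch with hypothetical names; the -w flag sets the container's working directory so relative paths such as test/k.gfa resolve inside the mounted directory.

```python
import shutil
from pathlib import Path

# Hypothetical pinned image, same quay.io tag as discussed in this issue.
IMAGE = "quay.io/biocontainers/odgi:0.8.2--py38h3b68952_0"


def odgi_command(args, which=shutil.which):
    """Prefer a local odgi (e.g. when the whole job runs in the odgi image);
    otherwise run it through Docker with the current directory mounted."""
    if which("odgi") is not None:
        return ["odgi", *args]
    cwd = Path.cwd()
    return ["docker", "run", "--rm", "-v", f"{cwd}:/data", "-w", "/data", IMAGE, "odgi", *args]
```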

anshumanmohan commented 1 year ago

Sorry to have gotten behind on this, but I am looking at this with fresh eyes and here's my resolution for what to try next:

There are risks inherent to using static versions of Calyx and odgi, and the odgi risk is bigger still because we can't just do up a new release ourselves, but this setup will basically get us going.

sampsyo commented 1 year ago

Yes, sounds great. My hope is that odgi changes pretty slowly so it won't matter, and our use of Calyx will evolve only when we're aware that it does, so it won't be too hard to bump those dependencies manually.