cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

Update installation instructions #103

Closed anshumanmohan closed 11 months ago

anshumanmohan commented 1 year ago

Because of changes in the Calyx docs, our own installation instructions are becoming harder and harder to follow. For instance, we point users to the "first and third" steps of the Calyx installation instructions, but the first step now itself has three options (docker, crate, source).

I'm not quite sure what a permanent fix would be, but let's at least patch it some. This can be fixed in the same spirit, if not the same PR, as the fix for #98.

First: Crate good? @susan-garry, do you know if the crate version of Calyx is good enough for our purposes? That would give us a certain barrier of stability I suppose, since we'd be asking people to go through the crate, not having them pull the bleeding-edge main branch every time. I don't know if that's actually a fair assessment; maybe the Calyx crate is deployed automatically and often.

Second: This much madness is too much sorrow.

  1. I wonder if we should just inline our own installation instructions for Calyx. I don't love that we currently have users jumping to the Calyx docs, running/skipping stuff there, coming back to us, running more fud config stuff, and then getting going. If we inline Calyx installation instructions that we know will work for Pollen's purposes, that will introduce a maintenance task.
  2. Regardless of what we do with the inlined/foreign instructions, I'd love to finish with a fud check invocation that passes 100%, with no warnings, for the purposes of Pollen. We have recently learned the format for doing that.

Third: Docker all the way? Perhaps somewhat at odds with all of the above, but this is all making me wonder if we shouldn't just make Docker the default way we do business. Our Dockerfile currently starts with Calyx's Docker image, installs odgi from source, and installs Pollen.

  1. We could catch it up to install mygfa, slow-odgi, and pollen-data-gen, and then legitimately do all our development in Docker.
  2. If we wanted, we could use Adrian's Depot trick to keep the Docker current. Perhaps overkill for a small project like ours.
  3. When it comes to coding day to day, VS Code has support for "attaching" to a Docker container, which is the same as SSH-ing into a remote machine like Havarti.
  4. Sorry if I'm forgetting something that we've already resolved, but does this not work for some reason? Is a Docker too underpowered? Does the Calyx Docker lack an important simulation tool that Havarti has? Paging @rachitnigam on this one.
  5. Suppose the above is impossible. Is it possible that Docker fails for development but satisfies a large chunk of users? In that case we can have grungy installation instructions internally but the official instructions can be two lines: docker pull, docker run.

sampsyo commented 1 year ago

@susan-garry, do you know if the crate version of Calyx is good enough for our purposes? That would give us a certain barrier of stability I suppose, since we'd be asking people to go through the crate, not having them pull the bleeding-edge main branch every time. I don't know if that's actually a fair assessment; maybe the Calyx crate is deployed automatically and often.

Good question. I bet the crate version is good enough, since we published a version fairly recently. It might be OK to stick with that, but future crate releases will surely break things at some point, so there may be no single decision that stays correct for all time. So maybe we just need to revisit this periodically.

Perhaps somewhat at odds with all of the above, but this is all making me wonder if we shouldn't just make Docker the default way we do business. Our Dockerfile currently starts with Calyx's Docker image, installs odgi from source, and installs Pollen.

This isn't a bad idea for letting folks get started! It could definitely work. I would suggest that we maybe still want to keep around the non-Docker instructions too, but it would alleviate the need for them to be extremely accessible if we have an easy path based on Docker.

If we wanted, we could use Adrian's Depot trick to keep the Docker current. Perhaps overkill for a small project like ours.

Fortunately, Depot turns out to be very easy to use, so this wouldn't be too bad!

When it comes to coding day to day, VS Code has support for "attaching" to a Docker container, which is the same as SSH-ing into a remote machine like Havarti.

Yes—there is a specific thing called "dev containers" for setting up an environment for hacking, invented by the VSCode people. The Dockerfile for this may be different from the Dockerfile for distributing a complete, built tool—because the audience is different (hackers vs. users). https://containers.dev

Sorry if I'm forgetting something that we've already resolved, but does this not work for some reason? Is a Docker too underpowered? Does the Calyx Docker lack an important simulation tool that Havarti has?

The missing piece is the Xilinx toolchain, which is prohibitively hard to Dockerize. But that is fine; when we need it we can do non-Docker stuff.

Suppose the above is impossible. Is it possible that Docker fails for development but satisfies a large chunk of users? In that case we can have grungy installation instructions internally but the official instructions can be two lines: docker pull, docker run.

Just for fun: you don't actually need the docker pull step. :smiley: docker run pulls stuff automatically if you don't have it.
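
A minimal sketch of that one-step path, assuming a hypothetical image name `example/pollen:latest` (the thread does not say where, or whether, a Pollen image is published) and a hypothetical helper `run_pollen`. The `DOCKER` override is only an illustration so the command can be dry-run with `echo`.

```shell
# Hypothetical image name; substitute whatever tag the project actually publishes.
IMAGE="${POLLEN_IMAGE:-example/pollen:latest}"

# Start an interactive, throwaway container.
# `docker run` pulls the image first if it is not present locally,
# so no separate `docker pull` is needed.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the arguments instead of running them.
run_pollen() {
  ${DOCKER:-docker} run --rm -it "$IMAGE"
}
```

Calling `run_pollen` on a machine with Docker installed would fetch the image if it is missing and then start the container interactively.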

anshumanmohan commented 1 year ago

Super, thanks Adrian! If all we're losing is the Xilinx toolchain, I for one will totally be able to work in the Docker. Two actionables coming out of this:

  1. Check on the Docker instructions, update them to install the new Python tools (mygfa, slow-odgi, and pollen-data-gen), and set up a VSCode dev container for daily use. I am excited for this. I think it will help me kill off #53 and #74 too.
  2. Rework the built-from-scratch instructions so that they work cleanly. Unless there are objections, I'd like to inline Calyx-installation instructions much like we have inlined odgi-installation instructions. Re: which flavor of Calyx, I would like to use the crate version.

rachitnigam commented 1 year ago

I think this approach is good: pin to a particular version of Calyx and as you merge changes into the Calyx repo, publish a new docker image and rebuild Pollen on top of it. I know a lot of people who swear by the "docker-only development model" where they don't even bother installing repo dependencies and just use docker. Overall, I think that's a pretty good dev model anyways so moving towards that doesn't seem bad!
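
For the from-scratch path, the pinning idea could be sketched roughly as below. This assumes the crate is installed under the name `calyx` (worth checking against the Calyx docs) and uses a hypothetical `CALYX_VERSION` variable and `install_calyx` helper; the thread does not name a version, so none is filled in here.

```shell
# Install one exact release of the Calyx crate instead of whatever is newest.
# CALYX_VERSION must be set by the caller; this sketch does not pick a version.
# CARGO can be overridden (e.g. CARGO=echo) for a dry run.
install_calyx() {
  : "${CALYX_VERSION:?set CALYX_VERSION to the release to pin}"
  ${CARGO:-cargo} install calyx --version "$CALYX_VERSION"
}
```

With a plain `MAJOR.MINOR.PATCH` value, `cargo install --version` installs exactly that release, so bumping the pin becomes a deliberate one-line change.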

susan-garry commented 11 months ago

I like the idea of using Docker and not having to worry about updating our installation instructions immediately when calyx or odgi publishes a big update! I am a bit concerned that we can't port Xilinx into docker, since it seems like we will want these tools eventually to execute programs on actual FPGAs, but I guess we can ssh out of Docker to use them and can update our install instructions to explain this to users once we get to that point?

sampsyo commented 11 months ago

To summarize, I believe the plan is: