bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License
671 stars, 71 forks

Biowasm minimap2/samtools integration #349

Closed Koeng101 closed 10 months ago

Koeng101 commented 1 year ago

Minimap2 is probably the best aligner for nanopore sequencing data, and samtools lets you work with those alignments to produce useful output information. In fact, 3 of the parsers I built with poly exist just to handle the input and output of these two pieces of software.

To review, the pipeline is:

(nanopore sequencer) -> slow5 -> (basecaller) -> fastq -> (minimap2) -> sam -> (samtools) -> pileup

As such, minimap2 and samtools are essential to my sequence analysis pipeline. They're both C projects, so we could use CGo, but CGo is also kinda the worst. As an alternative, we could use Wasm-compiled samtools and minimap2. This approach has already worked for other projects integrating C code through wazero, a zero-dependency WebAssembly runtime written in pure Go.

A wonderful project, biowasm, by @robertaboukhalil has already compiled and tested both minimap2 and samtools in WebAssembly. We would need some software similar to biowasm's aioli, and then we could integrate these two tools with Poly for a Go-native experience.

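Koeng101 commented 1 year ago

package main

import (
    "context"
    _ "embed"
    "fmt"
    "log"

    "github.com/tetratelabs/wazero"
    "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)

// addWasm holds the compiled module, embedded at build time.
// It is expected to export a function named "add".
//
//go:embed add.wasm
var addWasm []byte

func main() {
    ctx := context.Background()

    // Create the runtime and release its resources on exit.
    r := wazero.NewRuntime(ctx)
    defer r.Close(ctx)

    // Provide the WASI imports the module may rely on.
    wasi_snapshot_preview1.MustInstantiate(ctx, r)

    // Instantiate the module; stdin, stdout, and stderr are set on the ModuleConfig.
    mod, err := r.InstantiateWithConfig(ctx, addWasm, wazero.NewModuleConfig().WithName("addWasm"))
    if err != nil {
        log.Fatalf("instantiate: %v", err)
    }

    // ExportedFunction returns nil when the export is missing.
    add := mod.ExportedFunction("add")
    if add == nil {
        log.Fatal(`module does not export "add"`)
    }

    // Parameters and results are passed as uint64 values.
    res, err := add.Call(ctx, 1, 2)
    if err != nil {
        log.Fatalf("call add: %v", err)
    }
    fmt.Println(res)
}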

So you can do something like this with wazero. NewModuleConfig is where we would add the io.Reader for stdin and the io.Writer for stdout (using WithStdin, WithStdout, and WithStderr).

Ideally, for minimap2, you would instantiate the module with a fake fs.FS (WithFS) containing the reference fasta, with space for the index (reference.fasta -> reference.fai). Then you would use the io.WriterTo method from fastq plus an io.Pipe to stream the data from the writer into the stdin (an io.Reader) of the WebAssembly module. From there, you would take the stdout (an io.Writer) and io.Pipe it into a sam io.Reader. All in all, you could then have a function that takes in a Parser[fastq] plus a fasta.Record and gives you back a Parser[sam].

github-actions[bot] commented 10 months ago

This issue has had no activity in the past 2 months. Marking as stale.

TimothyStiles commented 10 months ago

Closing as stale. Feel welcome to reopen but this may be better as an external project to start.