GMOD / bam-js

Parse BAM and BAM index files in JavaScript for Node.js and the browser
MIT License

Install

$ npm install --save @gmod/bam

Usage

const { BamFile } = require('@gmod/bam')
// or import {BamFile} from '@gmod/bam'

const t = new BamFile({
  bamPath: 'test.bam',
})

// note: getHeader must be called before any call to getRecordsForRange
const header = await t.getHeader()

// this fetches the same records as `samtools view ctgA:1-50000`
const records = await t.getRecordsForRange('ctgA', 0, 50000)

The bamPath argument only works in Node.js. In the browser, pass bamFilehandle (together with an index filehandle such as baiFilehandle) using a generic-filehandle implementation, e.g. RemoteFile:

const { RemoteFile } = require('generic-filehandle')
const bam = new BamFile({
  bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url
  baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url
})

Inputs are 0-based half-open coordinates (note: this differs from samtools view, which takes 1-based closed coordinates!)
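Because the two conventions differ, a samtools-style region string needs its start shifted by one before it is passed to getRecordsForRange. The following is a minimal sketch of that conversion. regionToRange is a hypothetical helper, not part of @gmod/bam, and it only handles the simple name:start-end form.

```javascript
// Hypothetical helper (not part of @gmod/bam).
// Converts a samtools-style region ("ctgA:1-50000", 1-based, inclusive)
// into 0-based half-open coordinates.
function regionToRange(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region)
  if (!match) {
    throw new Error(`Unrecognized region string: ${region}`)
  }
  const [, refName, startStr, endStr] = match
  return {
    refName,
    start: Number(startStr) - 1, // 1-based inclusive start -> 0-based start
    end: Number(endStr), // inclusive 1-based end equals exclusive 0-based end
  }
}

console.log(regionToRange('ctgA:1-50000'))
// prints { refName: 'ctgA', start: 0, end: 50000 }
```

The result can then be passed along as getRecordsForRange(refName, start, end).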

Usage with htsget

Since version 1.0.41, the htsget protocol is supported.

Here is a small code snippet showing this:

const { HtsgetFile } = require('@gmod/bam')

const ti = new HtsgetFile({
  baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads',
  trackId: 'BroadHiSeqX_b37/NA12878',
})
await ti.getHeader()
const records = await ti.getRecordsForRange(1, 2000000, 2000001)

Our implementation makes some assumptions about how the protocol is implemented, so please let us know if it doesn't work for your use case.

Documentation

BAM constructor

The BamFile constructor accepts an options object. The usage examples above show the bamPath, bamFilehandle, and baiFilehandle options.

Note: filehandles implement the Filehandle interface from https://www.npmjs.com/package/generic-filehandle. This module offers the path and url arguments as conveniences for supplying a LocalFile or RemoteFile.

async getRecordsForRange(refName, start, end, opts)

Note: you must run getHeader before running getRecordsForRange
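One way to make sure the header is always loaded first is to wrap both calls in a small helper. This is only a sketch: getRecordsWithHeader is a hypothetical name, not part of @gmod/bam, and it assumes only that the file object exposes getHeader and getRecordsForRange as described here.

```javascript
// Hypothetical helper (not part of @gmod/bam).
// Remembers one getHeader() promise per file object, so the header is
// requested once and always resolved before records are fetched.
const headerPromises = new WeakMap()

async function getRecordsWithHeader(file, refName, start, end, opts) {
  if (!headerPromises.has(file)) {
    headerPromises.set(file, file.getHeader())
  }
  await headerPromises.get(file)
  return file.getRecordsForRange(refName, start, end, opts)
}
```

Note that a failed getHeader call stays cached in this sketch; a more careful version would remove the entry on rejection so the call can be retried.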

async *streamRecordsForRange(refName, start, end, opts)

This is an async generator function with the same signature as getRecordsForRange. Results can be processed chunk by chunk:

for await (const chunk of file.streamRecordsForRange(
  refName,
  start,
  end,
  opts,
)) {
  // process each chunk of records here
}

getRecordsForRange simply wraps this process by concatenating the chunks into a single array.
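The wrapping described above can be sketched as follows. This illustrates the idea and is not the library's actual source. collectRecords is a hypothetical name, and the sketch assumes each chunk is an array of records.

```javascript
// Hypothetical illustration (not the @gmod/bam implementation).
// Drains the async generator and gathers every record into one array.
async function collectRecords(file, refName, start, end, opts) {
  const records = []
  for await (const chunk of file.streamRecordsForRange(
    refName,
    start,
    end,
    opts,
  )) {
    for (const record of chunk) {
      records.push(record)
    }
  }
  return records
}
```

Iterating the generator directly instead avoids holding every record in memory at once, which may matter for large regions.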

async getHeader(opts: {....anything to pass to generic-filehandle opts})

Obtains the header from an HtsgetFile or BamFile. For a BamFile, this retrieves the BAM header and the BAI/CSI index header if applicable. For an HtsgetFile, it makes an API request for the refNames.

async indexCov(refName, start, end)

Returns features of the form {start, end, score} containing estimated feature density across 16kb windows in the genome
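As an example of consuming this output, the sketch below picks the window with the highest score. densestWindow is a hypothetical helper, not part of @gmod/bam, and it assumes the result is an iterable (such as an array) of {start, end, score} objects.

```javascript
// Hypothetical helper (not part of @gmod/bam).
// Returns the {start, end, score} entry with the highest score,
// or undefined when there are no entries.
function densestWindow(features) {
  let best
  for (const feature of features) {
    if (best === undefined || feature.score > best.score) {
      best = feature
    }
  }
  return best
}
```

It would be called as densestWindow(await file.indexCov(refName, start, end)).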

async lineCount(refName: string)

Returns the number of features on refName, using the special pseudo-bin from the BAI/CSI index (e.g. bin 37450 in BAI, which holds n_mapped as described in the SAM spec PDF), or -1 if refName does not exist in the sample

async hasRefSeq(refName: string)

Returns whether we have this refName in the sample
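The two methods above can be combined so that callers do not have to check for the -1 sentinel themselves. This is a minimal sketch: lineCountOrNull is a hypothetical helper, not part of @gmod/bam, and it assumes getHeader has already been called on the file object.

```javascript
// Hypothetical helper (not part of @gmod/bam).
// Returns the feature count for refName, or null when the reference
// sequence is absent (instead of the -1 sentinel).
async function lineCountOrNull(file, refName) {
  if (!(await file.hasRefSeq(refName))) {
    return null
  }
  const count = await file.lineCount(refName)
  return count === -1 ? null : count
}
```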

Returned features

Example

feature.ref_id // numerical sequence id corresponding to position in the sam header
feature.start // 0-based half open start coordinate
feature.end // 0-based half open end coordinate
feature.name // QNAME
feature.seq // feature sequence
feature.qual // qualities
feature.CIGAR // CIGAR string
feature.tags // tags
feature.flags // flags
feature.template_length // TLEN
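As a small illustration of working with these fields, the sketch below formats a feature as a tab-separated line with 1-based closed coordinates for display. formatFeature is a hypothetical helper, not part of @gmod/bam, and it assumes the fields are readable as plain properties, as in the example above.

```javascript
// Hypothetical helper (not part of @gmod/bam).
// Builds a tab-separated summary line. The 0-based half-open start is
// shifted by one to give 1-based closed coordinates for display.
function formatFeature(feature) {
  const location = `${feature.ref_id}:${feature.start + 1}-${feature.end}`
  return [feature.name, location, feature.CIGAR].join('\t')
}
```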

Note

The reason that we hide the data behind this ".get" function is that we lazily decode records on demand, which can reduce memory consumption.

License

MIT © Colin Diesh