$ npm install --save @gmod/bam
const { BamFile } = require('@gmod/bam')
// or import {BamFile} from '@gmod/bam'
const t = new BamFile({
  bamPath: 'test.bam',
})
// note: getHeader must be run before any call to getRecordsForRange
const header = await t.getHeader()
// this fetches the same records as samtools view ctgA:1-50000
const records = await t.getRecordsForRange('ctgA', 0, 50000)
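Since getHeader has to run before getRecordsForRange, the two calls above can be wrapped in a small helper that always performs them in that order. This is only a sketch: `fetchRegion` is a hypothetical name, not part of @gmod/bam, and it assumes nothing beyond the two methods shown above.

```javascript
// Hypothetical helper (not part of @gmod/bam): loads the header first,
// then fetches records, so callers cannot get the ordering wrong.
async function fetchRegion(file, refName, start, end) {
  await file.getHeader()
  return file.getRecordsForRange(refName, start, end)
}
```

With the `t` instance created above, `fetchRegion(t, 'ctgA', 0, 50000)` resolves to the same array of records as the snippet above. It also avoids top-level `await`, which is not available in every environment.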
The bamPath argument only works on nodejs. In the browser, you should pass bamFilehandle with a generic-filehandle e.g. RemoteFile
const { RemoteFile } = require('generic-filehandle')
const bam = new BamFile({
  bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url
  baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url
})
Inputs are 0-based half-open coordinates (note: not the same as samtools view, which takes 1-based inclusive coordinates!)
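To make the coordinate difference concrete, here is a minimal sketch that converts a samtools-style region string into the 0-based half-open values expected by getRecordsForRange. `parseSamtoolsRegion` is a hypothetical helper, not part of @gmod/bam, and it only handles the plain `name:start-end` form (no commas in numbers, no open-ended ranges).

```javascript
// Hypothetical helper: converts a samtools-style region (1-based, inclusive)
// into 0-based half-open coordinates.
function parseSamtoolsRegion(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region)
  if (!match) {
    throw new Error(`unsupported region string: ${region}`)
  }
  const [, refName, start, end] = match
  // Only the start shifts by one: a 1-based inclusive end is numerically
  // equal to a 0-based exclusive end.
  return { refName, start: Number(start) - 1, end: Number(end) }
}

console.log(parseSamtoolsRegion('ctgA:1-50000'))
// { refName: 'ctgA', start: 0, end: 50000 }
```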
Since 1.0.41, the htsget protocol is supported.
Here is a small code snippet showing its use:
const { HtsgetFile } = require('@gmod/bam')
const ti = new HtsgetFile({
  baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads',
  trackId: 'BroadHiSeqX_b37/NA12878',
})
await ti.getHeader()
const records = await ti.getRecordsForRange(1, 2000000, 2000001)
Our implementation makes some assumptions about how the protocol is implemented, so let us know if it doesn't work for your use case.
The BamFile class constructor accepts the following arguments:
bamPath/bamUrl/bamFilehandle - a string file path to a local file or a class object with a read method
csiPath/csiUrl/csiFilehandle - a CSI index for the BAM file, required for long chromosomes greater than 2^29 in length
baiPath/baiUrl/baiFilehandle - a BAI index for the BAM file
cacheSize - limit on number of chunks to cache. default: 50
yieldThreadTime - the interval at which the code yields to the main thread when it is parsing a lot of data. default: 100ms. Set to 0 to perform no yielding

Note: filehandles implement the Filehandle interface from https://www.npmjs.com/package/generic-filehandle. This module offers the path and url arguments as convenience methods for supplying the LocalFile and RemoteFile.
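The CSI requirement above can be checked ahead of time if the reference sequence lengths are known. A minimal sketch, assuming the lengths are already available as plain numbers (for example from the LN fields of the @SQ header lines); `needsCsiIndex` is a hypothetical helper, not part of @gmod/bam.

```javascript
// Per the note above, a CSI index is required once a chromosome is longer
// than 2^29 bases.
const BAI_MAX_LENGTH = 2 ** 29

// Hypothetical helper: takes an array of reference sequence lengths and
// reports whether any of them exceeds that limit.
function needsCsiIndex(refLengths) {
  return refLengths.some(length => length > BAI_MAX_LENGTH)
}
```

If this returns true, pass one of the csi* arguments to the constructor rather than one of the bai* arguments.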
Note: you must run getHeader before running getRecordsForRange
refName - a string for the chrom to fetch from
start - a 0-based half open start coordinate
end - a 0-based half open end coordinate
opts.signal - an AbortSignal to indicate stop processing
opts.viewAsPairs - re-dispatches requests to find mate pairs. default: false
opts.pairAcrossChr - control the viewAsPairs option behavior to pair across chromosomes. default: false
opts.maxInsertSize - control the viewAsPairs option behavior to limit distance within a chromosome to fetch. default: 200kb

streamRecordsForRange is an async generator function that takes the same signature as getRecordsForRange, but results can be processed one chunk at a time using
for await (const chunk of file.streamRecordsForRange(
  refName,
  start,
  end,
  opts,
)) {
  // process each chunk of records here
}
getRecordsForRange simply wraps this process by concatenating the chunks into an array.
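The concatenation described above can be sketched as follows. This is an illustration of the idea, not the library's actual implementation; `collectChunks` is a hypothetical name, and it assumes each chunk yielded by the generator is an iterable of records.

```javascript
// Hypothetical helper: drains an async generator of record chunks into a
// single flat array.
async function collectChunks(chunks) {
  const records = []
  for await (const chunk of chunks) {
    // Push one record at a time so very large chunks do not hit argument
    // count limits, as push(...chunk) could.
    for (const record of chunk) {
      records.push(record)
    }
  }
  return records
}
```

Calling `collectChunks(file.streamRecordsForRange(refName, start, end, opts))` would then be expected to give the same records as getRecordsForRange with the same arguments.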
getHeader obtains the header from an HtsgetFile or BamFile. It retrieves the BAM file header and the BAI/CSI index header if applicable, or makes an API request for refnames from htsget.
refName - a string for the chrom to fetch from
start - a 0-based half open start coordinate (optional)
end - a 0-based half open end coordinate (optional)

Returns features of the form {start, end, score} containing estimated feature density across 16kb windows in the genome
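As an illustration of working with features of this shape, the sketch below picks the window with the highest estimated density. It operates on a plain array of {start, end, score} objects; `densestWindow` is a hypothetical helper, not part of @gmod/bam.

```javascript
// Hypothetical helper: returns the {start, end, score} feature with the
// highest score, or undefined when the input array is empty.
function densestWindow(features) {
  let best
  for (const feature of features) {
    if (best === undefined || feature.score > best.score) {
      best = feature
    }
  }
  return best
}
```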
refName - a string for the chrom to fetch from

Returns the number of features on refName, using the special pseudo-bin from the BAI/CSI index (e.g. bin 37450 from bai, returning n_mapped from SAM spec pdf), or -1 if refName does not exist in the sample
refName - a string for the chrom to check

Returns whether we have this refName in the sample
Example
feature.ref_id // numerical sequence id corresponding to position in the sam header
feature.start // 0-based half open start coordinate
feature.end // 0-based half open end coordinate
feature.name // QNAME
feature.seq // feature sequence
feature.qual // qualities
feature.CIGAR // CIGAR string
feature.tags // tags
feature.flags // flags
feature.template_length // TLEN
The reason that we hide the data behind this ".get" function is that we lazily decode records on demand, which can reduce memory consumption.
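The flags value listed in the example is easier to read once decoded. The sketch below assumes feature.flags is the numeric FLAG field from the SAM specification; the bit values come from that specification, while `decodeFlags` and the property names are hypothetical and not part of @gmod/bam.

```javascript
// FLAG bits as defined in the SAM specification.
const SAM_FLAGS = {
  paired: 0x1,
  properPair: 0x2,
  unmapped: 0x4,
  mateUnmapped: 0x8,
  reverseStrand: 0x10,
  mateReverseStrand: 0x20,
  firstInPair: 0x40,
  secondInPair: 0x80,
  secondary: 0x100,
  qcFail: 0x200,
  duplicate: 0x400,
  supplementary: 0x800,
}

// Hypothetical helper: lists the names of all bits set in a FLAG value.
function decodeFlags(flags) {
  return Object.keys(SAM_FLAGS).filter(name => (flags & SAM_FLAGS[name]) !== 0)
}

console.log(decodeFlags(99))
// [ 'paired', 'properPair', 'mateReverseStrand', 'firstInPair' ]
```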
MIT © Colin Diesh