Closed nathandunn closed 6 years ago
@nathandunn I think you are referring to htslib API from Broad: https://samtools.github.io/htsjdk/
htseq is an entirely different tool written in Python for estimating read counts from RNAseq BAMs ;)
Yes that is right. I think I ended up using the genomics Java lib from unc for bigwig and the same for bam.
Zoomed out histogram view will happen automatically once I populate those methods.
NO Track assemblage null and {"organism":"21","clientToken":"11410287301476095970","sequenceList":[{"organism":"Honeybee","name":"GroupUn1076","start":0,"length":970,"end":970,"location":[{"fmin":0,"fmax":970}],"id":449,"reverse":false}]}
params [controller:bam, organismId:21, action:global, trackName:Forager RNA-Seq reads]
| Error 2017-07-25 16:28:08,278 [http-bio-8080-exec-13] ERROR errors.GrailsExceptionResolver - SAMFormatException occurred when processing request: [GET] /apollo/bam/Forager%20RNA-Seq%20reads/21/stats/global
Error parsing SAM header. @HD line has non-conforming SO tag value: sorted.. Line:
| Error 2017-07-25 16:28:08,278 [http-bio-8080-exec-13] ERROR errors.GrailsExceptionResolver - SAMFormatException occurred when processing request: [GET] /apollo/bam/Forager%20RNA-Seq%20reads/21/stats/global
Error parsing SAM header. @HD line has non-conforming SO tag value: sorted.. Line:
@HD VN:1.0 SO:sorted; File /opt/apollo/honeybee/data/bam/forager_Amel4.5_accepted_hits.bam; Line number 1. Stacktrace follows:
Message: Error parsing SAM header. @HD line has non-conforming SO tag value: sorted.. Line:
@HD VN:1.0 SO:sorted; File /opt/apollo/honeybee/data/bam/forager_Amel4.5_accepted_hits.bam; Line number 1
Line | Method
->> 265 | reportErrorParsingLine in htsjdk.samtools.SAMTextHeaderCodec
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| 236 | parseHDLine in ''
| 102 | decode . . . . . . . . in ''
| 667 | readHeader in htsjdk.samtools.BAMFileReader
| 298 | <init> . . . . . . . . in ''
| 176 | <init> in ''
| 376 | open . . . . . . . . . in htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl
| 202 | open in ''
| 107 | global . . . . . . . . in org.bbop.apollo.BamController
| 198 | doFilter in grails.plugin.cache.web.filter.PageFragmentCachingFilter
| 63 | doFilter . . . . . . . in grails.plugin.cache.web.filter.AbstractFilter
| 449 | executeChain in org.apache.shiro.web.servlet.AbstractShiroFilter
JBrowse / volvox is using Alignment2, which we don't want to do. However, they are essentially the same.
The properties come in between Alignments.js and _AlignmentsMixin.js and they can be hidden in the configuration file.
Header properties:
[comments:[], sortOrder:unsorted, attributes:[VN=1.5], standardTags:[VN, GO, SO], programRecords:[], class:class htsjdk.samtools.SAMFileHeader, validationErrors:[], SAMString:@HD VN:1.5
@SQ SN:ctgA LN:50001
, version:1.5, groupOrder:none, creator:null, textHeader:@SQ SN:ctgA LN:50001
, sequenceDictionary:SAMSequenceDictionary:( sequences:1 length:50001 md5:3d62af8e60a6f9b772a2bcbcd2b5af92), readGroups:[]]
SAM Read properties:
[alignmentBlocks:[htsjdk.samtools.AlignmentBlock@f780326], duplicateReadFlag:false, mateAlignmentStart:0, start:19464, readNameLength:33, readGroup:null, readPairedFlag:false, mappingQuality:37, unclippedStart:19464, baseQualityString:2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222, readUnmappedFlag:false, binaryAttributes:htsjdk.samtools.SAMBinaryTagAndValue@5daa9, variableBinaryRepresentation:[99, 116, 103, 65, 95, 49, 57, 48, 52, 51, 95, 49, 57, 53, 54, 51, 95, 49, 58, 48, 58, 48, 95, 49, 58, 48, 58, 48, 95, 49, 56, 97, 48, 0, 64, 6, 0, 0, -127, 40, 40, 18, 33, 18, 20, -120, 17, 68, 20, 68, 33, 18, 17, 65, -126, 65, -124, 68, 66, 24, 36, -127, -127, 18, -120, 65, 24, 66, 33, -120, 68, 18, 17, 24, 17, -127, 24, -124, 40, 34, 18, 72, 34, 33, 17, 34, -126, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 88, 84, 65, 85, 78, 77, 67, 1, 88, 48, 67, 1, 88, 49, 67, 0, 88, 77, 67, 1, 88, 79, 67, 0, 88, 71, 67, 0, 77, 68, 90, 51, 49, 84, 54, 56, 0], readString:TACTCTACCAACAGTTAAGGAGGGCAACAAGATCGATGGGGCATCGTATAACTTGAATGCCATTGGACAAATAATAATTGCTCCACGTCCCAAACCTCAA, secondaryOrSupplementary:false, pairedReadName:ctgA_19043_19563_1:0:0_1:0:0_18a0, class:class htsjdk.samtools.BAMRecord, header:SAMFileHeader{VN=1.5}, valid:null, notPrimaryAlignmentFlag:false, inferredInsertSize:0, baseQualities:[17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17], readName:ctgA_19043_19563_1:0:0_1:0:0_18a0, readLength:100, referenceName:ctgA, readFailsVendorQualityCheckFlag:false, end:19563, contig:ctgA, cigarString:100M, attributes:[htsjdk.samtools.SAMRecord$SAMTagAndValue@6448454, htsjdk.samtools.SAMRecord$SAMTagAndValue@272215e6, htsjdk.samtools.SAMRecord$SAMTagAndValue@41b839b8, htsjdk.samtools.SAMRecord$SAMTagAndValue@e16739a, htsjdk.samtools.SAMRecord$SAMTagAndValue@2cd23028, htsjdk.samtools.SAMRecord$SAMTagAndValue@1b688d71, htsjdk.samtools.SAMRecord$SAMTagAndValue@1b1d125d, htsjdk.samtools.SAMRecord$SAMTagAndValue@262a2807], unclippedEnd:19563, readBases:[84, 65, 67, 84, 67, 84, 65, 67, 67, 65, 65, 67, 65, 71, 84, 84, 65, 65, 71, 71, 65, 71, 71, 71, 67, 65, 65, 67, 65, 65, 71, 65, 84, 67, 71, 65, 84, 71, 71, 71, 71, 67, 65, 84, 67, 71, 84, 65, 84, 65, 65, 67, 84, 84, 71, 65, 65, 84, 71, 67, 67, 65, 84, 84, 71, 71, 65, 67, 65, 65, 65, 84, 65, 65, 84, 65, 65, 84, 84, 71, 67, 84, 67, 67, 65, 67, 71, 84, 67, 67, 67, 65, 65, 65, 67, 67, 84, 67, 65, 65], mateReferenceIndex:-1, flags:16, alignmentEnd:19563, validationStringency:SILENT, SAMFlags:[READ_REVERSE_STRAND], cigar:100M, cigarLength:1, fileSource:null, readNegativeStrandFlag:true, attributesBinarySize:37, indexingBin:4682, mateReferenceName:*, SAMString:ctgA_19043_19563_1:0:0_1:0:0_18a0 16 ctgA 19464 37 100TACTCTACCAACAGTTAAGGAGGGCAACAAGATCGATGGGGCATCGTATAACTTGAATGCCATTGGACAAATAATAATTGCTCCACGTCCCAAACCTCAA 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 X0:i:1 X1:i:0 MD:Z:31T68 XG:i:0 NM:i:1 XM:i:1 XO:i:0 XT:A:U
, referenceIndex:0, alignmentStart:19464, supplementaryAlignmentFlag:false, originalBaseQualities:null]
BAM/LazyFeature.js has the remaining mapping components out of the tags and values.
Also, the tag values for a samrecord look like this:
SAMRecord tag: X0 -> 1
SAMRecord tag: X1 -> 0
SAMRecord tag: MD -> 31C0C5G61
SAMRecord tag: XG -> 0
SAMRecord tag: NM -> 3
SAMRecord tag: XM -> 3
SAMRecord tag: XO -> 0
SAMRecord tag: XT -> U
Test instance: ctgA_19043_19563_1:0:0_1:0:0_18a0
at ctgA:19281..19910 (631 b)
BAM is returned with file
and byte
fields as well. I think those are already being absorbed by the API and are just used for reading.
Its unclear how the cigar strings are failing. They should be returning things of the nature: 100M
. The BAM / File, LazyFeature.js, etc. is all store related data that we are ignoring anyway in favor of just alignments. It seems like somehow the subfeatures are just not being passed in properly during the parsing process.
We need to emulate this code to generate the subfeatures properly:
_cigarToSubfeats: function(cigar) {
var subfeats = [];
var min = this._get('start');
var max;
var ops = this._parseCigar( cigar );
for (var i = 0; i < ops.length; i++) {
var lop = ops[i][1];
var op = ops[i][0]; // operation type
// converting "=" to "E" to avoid possible problems later with non-alphanumeric type name
if (op === "=") { op = "E"; }
switch (op) {
case 'M':
case 'D':
case 'N':
case 'E':
case 'X':
max = min + lop;
break;
case 'I':
max = min;
break;
case 'P': // not showing padding deletions (possibly change this later -- could treat same as 'I' ?? )
case 'H': // not showing hard clipping (since it's unaligned, and offset arg meant to be beginning of aligned part)
case 'S': // not showing soft clipping (since it's unaligned, and offset arg meant to be beginning of aligned part)
break;
// other possible cases
}
if( op !== 'N' ) {
subfeats.push(
new SimpleFeature(
{
data: {
type: op,
start: min,
end: max,
strand: this._get('strand'),
cigar_op: lop+op
},
parent: this
})
);
}
min = max;
}
return subfeats;
}
For the mismatch handling, its a bit interseting. Its reinterpreting the cigarString and the MD string, elements which should have already been re-interpreted.
Ultimately, what we want is something that draws the subclass spans with the classes, alignmismatch mismatch base[T|A|G|C]
Mismatch rendered as class:
The render error (not getting now, though):
Closing in favor of #1826
Doing this already with the REST API using the htseq API from BROAD:
[ ] project mismatches correctly as subfeatures (or correclty populate another string?)
[ ] handle MD string mismatches correctly
[ ] project MD strings
[ ] automatically adjust histogram depth defaults to avoid "too much detail"
[ ] test other track types (should match 128k pair), etc. etc. etc.
[ ] replace strings with official SAMTag and CigarOperation's
[ ] add integration test for
ctgA_19043_19563_1:0:0_1:0:0_18a0
atctgA:19281..19910 (631 b)
[ ] enable caching
[ ] fix render range bug (below) if still exists
[ ] Template length?
== DONE ==
[x] populate the remaining track feature details (below)
[x] project cigar alignments properly
[x] trace through to make sure I have the cigar attribute populated and the md attribute populated
[x] test logged out
[x] populate alignments
[x] handle strand information correctly (should have minus-bam-read / plus-bam-read)
[x] handle mismatches
[x] over-ride BAM featuredetails to handle refSeq as JSON
===