jkbonfield closed this issue 9 years ago.
On Fri, Apr 17, 2015 at 03:28:11AM -0700, Vadim Zalunin wrote:
Sounds like signed bytes in Java. ... Found an unused TC_tagCount encoding key, probably left over from old times. Even though it had no effect, the CHF still tried to set some encoding params for it, which failed because TC was typed as byte... fix d32e1125b7a8dbd86f8fe7dd9f06358d4b0ad649
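Vadim's diagnosis points at Java's signed `byte` type. As a minimal, generic sketch (not the actual cramtools code; the class and method names are hypothetical), this is what happens when a count above 127 is narrowed to a `byte` and then read back with and without masking:

```java
public class SignedByteDemo {
    // Java's byte is signed: the narrowing cast keeps only the low 8 bits,
    // so 128 is stored as -128 and 129 as -127.
    static byte storeAsByte(int value) {
        return (byte) value;
    }

    // Masking with 0xFF undoes sign extension and recovers the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        for (int count : new int[] {127, 128, 129}) {
            byte stored = storeAsByte(count);
            int naive = stored; // implicit widening sign-extends the byte
            System.out.println(count + " -> naive " + naive + ", masked " + toUnsigned(stored));
        }
    }
}
```

Running it prints `128 -> naive -128, masked 128`: any code path that widens the byte without the mask sees a negative count once the value passes 127. On Java 8 and later, `Byte.toUnsignedInt` does the same job as the mask.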
Unfortunately the fix only partially works. It fixes the xx#large_aux.sam file correctly, but in doing so breaks compatibility with Cramtools 2.1 outputs:
jkb@seq3a[htslib/test] /software/bin/java -Xmx4000m -jar /nfs/users/nfs_j/jkb/work/cram/cramtools/cramtools-2.1.jar cram -R xx.fa -I xx#rg.sam -O _tmp.cram -n -Q --capture-all-tags
jkb@seq3a[htslib/test] /software/bin/java -Xmx4000m -jar /nfs/users/nfs_j/jkb/work/cram/cramtools/cramtools-3.0.jar bam -R xx.fa -I _tmp.cram -O _tmp.sam
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at net.sf.cram.CramTools.invoke(CramTools.java:93)
	at net.sf.cram.CramTools.main(CramTools.java:123)
Caused by: java.lang.RuntimeException: Unknown encoding key: TC
	at net.sf.cram.structure.CompressionHeader.read(CompressionHeader.java:173)
	at net.sf.cram.structure.CompressionHeader.read(CompressionHeader.java:120)
	at net.sf.cram.structure.CompressionHeaderBLock.
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
On Fri, Apr 17, 2015 at 05:01:40AM -0700, Vadim Zalunin wrote:
Of course, it has already crept into the 2.1 files... The exception can be ignored, I guess. What does Scramble do if it encounters an unknown encoding key?
It appears so, yes! I agree it's probably fair to just ignore them. It's similar to having an extra auxiliary tag in SAM that you don't understand.
James
James Bonfield (jkb@sanger.ac.uk)
A Staden Package developer: https://sf.net/projects/staden/
Hora aderat briligi. Nunc et Slythia Tova
Plurima gyrabant gymbolitare vabo;
Et Borogovorum mimzebant undique formae,
Momiferique omnes exgrabure Rathi.
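The behaviour agreed on in this exchange (skip encoding keys the reader does not recognise instead of throwing, much as a SAM reader tolerates unknown aux tags) could look roughly like the sketch below. It is a hypothetical, simplified reader, not the cramtools `CompressionHeader` code. It assumes each map entry is a 2-character key, a 1-byte codec id and a 1-byte parameter length followed by the parameters (real CRAM stores the codec id and length as ITF8), and the set of known keys is illustrative. The key detail is that an unknown key's parameters are still consumed, so the stream stays aligned for the next entry.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class EncodingMapReader {
    // Illustrative set of data-series keys this hypothetical reader understands.
    static final Set<String> KNOWN_KEYS = new HashSet<>(Arrays.asList("BF", "RL"));

    // Simplified layout (assumption): 1-byte entry count, then per entry a
    // 2-character key, a 1-byte codec id, a 1-byte parameter length and the parameters.
    static Map<String, Integer> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int entries = in.readUnsignedByte();
            Map<String, Integer> codecByKey = new LinkedHashMap<>();
            for (int i = 0; i < entries; i++) {
                String key = new String(new char[] {
                    (char) in.readUnsignedByte(), (char) in.readUnsignedByte()});
                int codecId = in.readUnsignedByte();
                byte[] params = new byte[in.readUnsignedByte()];
                in.readFully(params); // consume params even for unknown keys to stay aligned
                if (!KNOWN_KEYS.contains(key)) {
                    System.err.println("Ignoring unknown encoding key: " + key);
                    continue;
                }
                codecByKey.put(key, codecId);
            }
            return codecByKey;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Example map: BF and RL are kept, a stale TC entry is skipped.
    static byte[] example() {
        return new byte[] {
            3,
            'B', 'F', 1, 1, 0,
            'T', 'C', 1, 2, 5, 6,
            'R', 'L', 1, 0
        };
    }

    public static void main(String[] args) {
        System.out.println(read(example()));
    }
}
```

With the example input it logs a warning for `TC` and prints `{BF=1, RL=1}`. A stricter reader could keep the unknown entries in a side map instead of discarding them, if it ever needs to re-emit the header unchanged.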
done: bd4c5fd202fe669d4b78837806f9fe581f070712
Sorry Vadim, it doesn't seem to work still:
jkb@seq3a[htslib/test] /software/bin/java -Xmx4000m -jar /nfs/users/nfs_j/jkb/work/cram/cramtools/cramtools-2.1.jar cram -R xx.fa -I xx#rg.sam -O _tmp.cram -n -Q --capture-all-tags
jkb@seq3a[htslib/test] /software/bin/java -Xmx4000m -jar /nfs/users/nfs_j/jkb/work/cram/cramtools/cramtools-3.0.jar bam -R xx.fa -I _tmp.cram
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at net.sf.cram.CramTools.invoke(CramTools.java:93)
	at net.sf.cram.CramTools.main(CramTools.java:123)
Caused by: net.sf.samtools.SAMFormatException: Unrecognized tag type:
	at net.sf.samtools.BinaryTagCodec.readSingleValue(BinaryTagCodec.java:403)
	at net.sf.samtools.BinaryTagCodec.readTags(BinaryTagCodec.java:325)
	at net.sf.cram.structure.SliceIO.parseSliceHeaderBlock(SliceIO.java:59)
	at net.sf.cram.structure.SliceIO.readSliceHeadBlock(SliceIO.java:39)
	at net.sf.cram.build.CramIO.readContainer(CramIO.java:467)
	at net.sf.cram.build.CramIO.readContainer(CramIO.java:424)
	at net.sf.cram.build.CramIO.readContainer(CramIO.java:245)
	at net.sf.cram.Cram2Bam.main(Cram2Bam.java:213)
	... 6 more
If it's any consolation, Rob has turned the bug-hose on io_lib with a fuzz tester and is identifying lots of crashes!
Seems to work for xx#rg.full.sam; is xx#rg.sam any different?
Ah, you mean between versions...
Fixed in the cram3 branch. I think there is a bug in 2.1 that causes 3 extra bytes in the slice header; it looks like an int vs ITF8 mix-up.
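The 3-byte discrepancy fits that diagnosis: ITF8, CRAM's variable-length integer, stores values up to 127 in a single byte, whereas a fixed-width int32 (little-endian in CRAM) always takes four. The sketch below (hypothetical class name; the value 5 is just an example, not the actual slice-header field) encodes and decodes ITF8 and compares the sizes. If 2.1 really writes an int where ITF8 is expected, a reader consumes only the first byte and every later field is shifted by three bytes, the kind of misalignment that could surface as the "Unrecognized tag type" error in the trace.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Itf8Demo {
    // ITF8: the number of leading 1-bits in the first byte says how many
    // extra bytes follow (0 to 4); small values need only one byte.
    static byte[] writeItf8(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if ((v & ~0x7F) == 0) {                 // 0xxxxxxx
            out.write(v);
        } else if ((v & ~0x3FFF) == 0) {        // 10xxxxxx + 1 byte
            out.write(0x80 | (v >>> 8));
            out.write(v);
        } else if ((v & ~0x1FFFFF) == 0) {      // 110xxxxx + 2 bytes
            out.write(0xC0 | (v >>> 16));
            out.write(v >>> 8);
            out.write(v);
        } else if ((v & ~0x0FFFFFFF) == 0) {    // 1110xxxx + 3 bytes
            out.write(0xE0 | (v >>> 24));
            out.write(v >>> 16);
            out.write(v >>> 8);
            out.write(v);
        } else {                                // 1111xxxx + 4 bytes (last holds 4 bits)
            out.write(0xF0 | (v >>> 28));
            out.write(v >>> 20);
            out.write(v >>> 12);
            out.write(v >>> 4);
            out.write(v & 0x0F);
        }
        return out.toByteArray(); // write(int) keeps only the low 8 bits of each argument
    }

    static int readItf8(byte[] buf, int pos) {
        int b0 = u(buf, pos);
        if ((b0 & 0x80) == 0) {
            return b0;
        } else if ((b0 & 0x40) == 0) {
            return ((b0 & 0x3F) << 8) | u(buf, pos + 1);
        } else if ((b0 & 0x20) == 0) {
            return ((b0 & 0x1F) << 16) | (u(buf, pos + 1) << 8) | u(buf, pos + 2);
        } else if ((b0 & 0x10) == 0) {
            return ((b0 & 0x0F) << 24) | (u(buf, pos + 1) << 16)
                    | (u(buf, pos + 2) << 8) | u(buf, pos + 3);
        }
        return ((b0 & 0x0F) << 28) | (u(buf, pos + 1) << 20) | (u(buf, pos + 2) << 12)
                | (u(buf, pos + 3) << 4) | (u(buf, pos + 4) & 0x0F);
    }

    // Read a byte as an unsigned value 0..255.
    private static int u(byte[] buf, int i) {
        return buf[i] & 0xFF;
    }

    public static void main(String[] args) {
        int value = 5; // an arbitrary small header field value
        byte[] asItf8 = writeItf8(value);
        byte[] asInt32 = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        System.out.println("ITF8: " + asItf8.length + " byte(s), int32: " + asInt32.length
                + " bytes, difference: " + (asInt32.length - asItf8.length));
        // A reader expecting ITF8 decodes the first byte of the int32 correctly...
        System.out.println("decoded as ITF8: " + readItf8(asInt32, 0));
        // ...but would then treat the three remaining zero bytes as the next fields.
    }
}
```

It prints `ITF8: 1 byte(s), int32: 4 bytes, difference: 3`, matching the three extra bytes seen in the 2.1 slice header.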
Ah, sorry, that was a cut-and-paste fail! Yes, it works in version 3. I'll close this, then. Thanks.
The file htslib/test/xx#large_aux.sam has some sequences containing lots of auxiliary tags. If I cut it down to just the first read, it fails with 129 tags but passes with 128.
An example SAM file with 129 ??:i:1 entries is:
This is strictly an encoder issue, as cramtools can decode the data generated by htslib.
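To reproduce the 128/129 boundary without the original file, one could generate a single record with a chosen number of integer tags. This is a hypothetical sketch, not the example file from the report: the class name, read name, sequence and tag names are invented. It builds an unmapped read and draws distinct two-character tag names from the `X??`, `Y??` and `Z??` ranges, which the SAM specification reserves for local use.

```java
public class ManyTagsSam {
    private static final String ALNUM =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Build one unmapped SAM record carrying n distinct integer tags, each with value 1.
    static String record(int n) {
        int max = 3 * ALNUM.length(); // 62 names each under X, Y and Z
        if (n < 0 || n > max) {
            throw new IllegalArgumentException("n must be between 0 and " + max);
        }
        // QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        StringBuilder sb = new StringBuilder("r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*");
        for (int i = 0; i < n; i++) {
            char first = "XYZ".charAt(i / ALNUM.length());
            char second = ALNUM.charAt(i % ALNUM.length());
            sb.append('\t').append(first).append(second).append(":i:1");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 129;
        System.out.println("@HD\tVN:1.4\tSO:unsorted");
        System.out.println(record(n));
    }
}
```

For example, `java ManyTagsSam 129 > many_tags.sam` writes the failing case, and rerunning with `128` gives the passing one; either file can then be fed to the encoder under test.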