CampagneLaboratory / goby

Goby framework and tools for analysis of high-throughput sequencing data

err about sam-to-compact #6

Open dancju opened 8 years ago

dancju commented 8 years ago

Here is the command:

$ ./goby 1g sam-to-compact -i ../data/seq/SRR001000.sorted.sam -o ./SRR001000.goby --sorted

Here is the output:

java -ea -Xmx1g -Dlog4j.configuration=file:///Users/dan/Desktop/bio/goby/config/log4j-sample.properties -Djava.library.path= -jar /Users/dan/Desktop/bio/goby/goby.jar --mode sam-to-compact -i ../data/seq/SRR001000.sorted.sam -o ./SRR001000.goby --sorted
Store read origin: true
WARN  MessageChunksWriter  - Using chunk-size=10000
SAM/BAM file/input appear to have no target sequences. If reading from stdin, please check you are feeding this mode actual SAM/BAM content and that the header of the SAM file is included.
fac2003 commented 8 years ago

Could you check that the SAM file includes a header at the top? If it does, it would help if you provided some details, e.g., a small file sufficient to reproduce the problem. If it does not, you need to add a header at the start of the file. See http://www.htslib.org/doc/samtools.html to learn how to add a header.

dancju commented 8 years ago

It does not. And I think a sorted SAM file is supposed to contain no header.

dancju commented 8 years ago

I tried sam-to-compact on an unsorted BAM file.

$ ./goby 1g sam-to-compact -i ../data/seq/SRR001000.bam -o ./SRR001000.goby
Trying to set RJAVA_HOME environment variable

RJAVA_HOME=
java -ea -Xmx1g -Dlog4j.configuration=file:///Users/dan/Desktop/bio/goby/config/log4j-sample.properties -Djava.library.path= -jar /Users/dan/Desktop/bio/goby/goby.jar --mode sam-to-compact -i ../data/seq/SRR001000.bam -o ./SRR001000.goby
Store read origin: true
WARN  MessageChunksWriter  - Using chunk-size=10000
java.lang.StringIndexOutOfBoundsException: String index out of range: 36
    at java.lang.String.substring(String.java:1963)
    at edu.cornell.med.icb.goby.readers.sam.SamRecordParser.processRead(SamRecordParser.java:248)
    at edu.cornell.med.icb.goby.modes.SAMToCompactMode.scan(SAMToCompactMode.java:327)
    at edu.cornell.med.icb.goby.modes.SAMToCompactMode.execute(SAMToCompactMode.java:237)
    at edu.cornell.med.icb.goby.modes.GenericToolsDriver.execute(GenericToolsDriver.java:219)
    at edu.cornell.med.icb.goby.modes.GobyDriver.main(GobyDriver.java:55)
fac2003 commented 8 years ago

Let's continue with the SAM file: the message is clear, and a text file is easier for you to inspect to understand what is going on. If the SAM file you provided has no header, and Goby tells you in its error message that a header is required, then the path forward is to add a header.

File format conversion cannot proceed without the header because it contains the reference target/chromosome names and target lengths, which Goby stores and uses in many ways. Do you know how to create a SAM header?
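
To illustrate what the header contributes, here is a minimal Python sketch that collects target names and lengths from the @SQ lines at the top of a SAM stream. It is not Goby's code; the function name read_sam_targets and the sample records are made up for the example, and it assumes a tab-delimited header as the SAM spec describes.

```python
import io


def read_sam_targets(handle):
    """Collect {target name: length} from the @SQ lines at the top of a SAM stream."""
    targets = {}
    for line in handle:
        if not line.startswith("@"):
            break  # header lines come before the alignment records
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue  # other header record types (@HD, @PG, ...) carry no targets
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            targets[tags["SN"]] = int(tags["LN"])
    return targets


# Hypothetical sample: two targets, one @PG line, one unmapped read.
sample = io.StringIO(
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chr10\tLN:133797422\n"
    "@PG\tID:bwa\tPN:bwa\n"
    "SRR001000.2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\n"
)
print(read_sam_targets(sample))
```

An empty result from a function like this corresponds to the "no target sequences" situation: either the header is missing or its @SQ lines could not be split into fields.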

fac2003 commented 8 years ago

If you need help, it would also be useful if you posted complete reports: exactly what you are doing, including the command line and short examples of files. The exception you posted could be caused by an incorrectly specified reference genome argument, but I have no way of telling if I can't see the command line and don't know what the files contain.

dancju commented 8 years ago

first lines of SRR001000.sam

@SQ SN:chr1 LN:248956422
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr11_KI270721v1_random  LN:100316
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr14_GL000009v2_random  LN:201709
@SQ SN:chr14_GL000225v1_random  LN:211173
@SQ SN:chr14_KI270722v1_random  LN:194050
@SQ SN:chr14_GL000194v1_random  LN:191469
@SQ SN:chr14_KI270723v1_random  LN:38115
@SQ SN:chr14_KI270724v1_random  LN:39555
@SQ SN:chr14_KI270725v1_random  LN:172810
@SQ SN:chr14_KI270726v1_random  LN:43739
@SQ SN:chr15    LN:101991189
@SQ SN:chr15_KI270727v1_random  LN:448248
@SQ SN:chr16    LN:90338345
@SQ SN:chr16_KI270728v1_random  LN:1872759
@SQ SN:chr17    LN:83257441
@SQ SN:chr17_GL000205v2_random  LN:185591
@SQ SN:chr17_KI270729v1_random  LN:280839
@SQ SN:chr17_KI270730v1_random  LN:112551
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr1_KI270706v1_random   LN:175055
@SQ SN:chr1_KI270707v1_random   LN:32032
@SQ SN:chr1_KI270708v1_random   LN:127682
@SQ SN:chr1_KI270709v1_random   LN:66860
@SQ SN:chr1_KI270710v1_random   LN:40176
@SQ SN:chr1_KI270711v1_random   LN:42210
@SQ SN:chr1_KI270712v1_random   LN:176043
@SQ SN:chr1_KI270713v1_random   LN:40745
@SQ SN:chr1_KI270714v1_random   LN:41717
@SQ SN:chr2 LN:242193529
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chr22_KI270731v1_random  LN:150754
@SQ SN:chr22_KI270732v1_random  LN:41543
@SQ SN:chr22_KI270733v1_random  LN:179772
@SQ SN:chr22_KI270734v1_random  LN:165050
@SQ SN:chr22_KI270735v1_random  LN:42811
@SQ SN:chr22_KI270736v1_random  LN:181920
@SQ SN:chr22_KI270737v1_random  LN:103838
@SQ SN:chr22_KI270738v1_random  LN:99375
@SQ SN:chr22_KI270739v1_random  LN:73985
@SQ SN:chr2_KI270715v1_random   LN:161471
@SQ SN:chr2_KI270716v1_random   LN:153799
@SQ SN:chr3 LN:198295559
@SQ SN:chr3_GL000221v1_random   LN:155397
@SQ SN:chr4 LN:190214555
@SQ SN:chr4_GL000008v2_random   LN:209709
@SQ SN:chr5 LN:181538259
@SQ SN:chr5_GL000208v1_random   LN:92689
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr9_KI270717v1_random   LN:40062
@SQ SN:chr9_KI270718v1_random   LN:38054
@SQ SN:chr9_KI270719v1_random   LN:176845
@SQ SN:chr9_KI270720v1_random   LN:39050
@SQ SN:chr1_KI270762v1_alt  LN:354444
@SQ SN:chr1_KI270766v1_alt  LN:256271
@SQ SN:chr1_KI270760v1_alt  LN:109528
@SQ SN:chr1_KI270765v1_alt  LN:185285
@SQ SN:chr1_GL383518v1_alt  LN:182439
@SQ SN:chr1_GL383519v1_alt  LN:110268
@SQ SN:chr1_GL383520v2_alt  LN:366580
@SQ SN:chr1_KI270764v1_alt  LN:50258
@SQ SN:chr1_KI270763v1_alt  LN:911658
@SQ SN:chr1_KI270759v1_alt  LN:425601
@SQ SN:chr1_KI270761v1_alt  LN:165834
@SQ SN:chr2_KI270770v1_alt  LN:136240
@SQ SN:chr2_KI270773v1_alt  LN:70887
@SQ SN:chr2_KI270774v1_alt  LN:223625
@SQ SN:chr2_KI270769v1_alt  LN:120616
@SQ SN:chr2_GL383521v1_alt  LN:143390
@SQ SN:chr2_KI270772v1_alt  LN:133041
@SQ SN:chr2_KI270775v1_alt  LN:138019
@SQ SN:chr2_KI270771v1_alt  LN:110395
@SQ SN:chr2_KI270768v1_alt  LN:110099
@SQ SN:chr2_GL582966v2_alt  LN:96131
@SQ SN:chr2_GL383522v1_alt  LN:123821
@SQ SN:chr2_KI270776v1_alt  LN:174166
@SQ SN:chr2_KI270767v1_alt  LN:161578
@SQ SN:chr3_JH636055v2_alt  LN:173151
@SQ SN:chr3_KI270783v1_alt  LN:109187
@SQ SN:chr3_KI270780v1_alt  LN:224108
@SQ SN:chr3_GL383526v1_alt  LN:180671
@SQ SN:chr3_KI270777v1_alt  LN:173649
@SQ SN:chr3_KI270778v1_alt  LN:248252
@SQ SN:chr3_KI270781v1_alt  LN:113034
@SQ SN:chr3_KI270779v1_alt  LN:205312
@SQ SN:chr3_KI270782v1_alt  LN:162429
@SQ SN:chr3_KI270784v1_alt  LN:184404
@SQ SN:chr4_KI270790v1_alt  LN:220246
@SQ SN:chr4_GL383528v1_alt  LN:376187
@SQ SN:chr4_KI270787v1_alt  LN:111943
@SQ SN:chr4_GL000257v2_alt  LN:586476
@SQ SN:chr4_KI270788v1_alt  LN:158965
@SQ SN:chr4_GL383527v1_alt  LN:164536
@SQ SN:chr4_KI270785v1_alt  LN:119912
@SQ SN:chr4_KI270789v1_alt  LN:205944
@SQ SN:chr4_KI270786v1_alt  LN:244096
@SQ SN:chr5_KI270793v1_alt  LN:126136
@SQ SN:chr5_KI270792v1_alt  LN:179043
@SQ SN:chr5_KI270791v1_alt  LN:195710
@SQ SN:chr5_GL383532v1_alt  LN:82728
@SQ SN:chr5_GL949742v1_alt  LN:226852
@SQ SN:chr5_KI270794v1_alt  LN:164558
@SQ SN:chr5_GL339449v2_alt  LN:1612928
@SQ SN:chr5_GL383530v1_alt  LN:101241
@SQ SN:chr5_KI270796v1_alt  LN:172708
@SQ SN:chr5_GL383531v1_alt  LN:173459
@SQ SN:chr5_KI270795v1_alt  LN:131892
@SQ SN:chr6_GL000250v2_alt  LN:4672374
@SQ SN:chr6_KI270800v1_alt  LN:175808
@SQ SN:chr6_KI270799v1_alt  LN:152148
@SQ SN:chr6_GL383533v1_alt  LN:124736
@SQ SN:chr6_KI270801v1_alt  LN:870480
@SQ SN:chr6_KI270802v1_alt  LN:75005
@SQ SN:chr6_KB021644v2_alt  LN:185823
@SQ SN:chr6_KI270797v1_alt  LN:197536
@SQ SN:chr6_KI270798v1_alt  LN:271782
@SQ SN:chr7_KI270804v1_alt  LN:157952
@SQ SN:chr7_KI270809v1_alt  LN:209586
@SQ SN:chr7_KI270806v1_alt  LN:158166
@SQ SN:chr7_GL383534v2_alt  LN:119183
@SQ SN:chr7_KI270803v1_alt  LN:1111570
@SQ SN:chr7_KI270808v1_alt  LN:271455
@SQ SN:chr7_KI270807v1_alt  LN:126434
@SQ SN:chr7_KI270805v1_alt  LN:209988
@SQ SN:chr8_KI270818v1_alt  LN:145606
@SQ SN:chr8_KI270812v1_alt  LN:282736
@SQ SN:chr8_KI270811v1_alt  LN:292436
@SQ SN:chr8_KI270821v1_alt  LN:985506
@SQ SN:chr8_KI270813v1_alt  LN:300230
@SQ SN:chr8_KI270822v1_alt  LN:624492
@SQ SN:chr8_KI270814v1_alt  LN:141812
@SQ SN:chr8_KI270810v1_alt  LN:374415
@SQ SN:chr8_KI270819v1_alt  LN:133535
@SQ SN:chr8_KI270820v1_alt  LN:36640
@SQ SN:chr8_KI270817v1_alt  LN:158983
@SQ SN:chr8_KI270816v1_alt  LN:305841
@SQ SN:chr8_KI270815v1_alt  LN:132244
@SQ SN:chr9_GL383539v1_alt  LN:162988
@SQ SN:chr9_GL383540v1_alt  LN:71551
@SQ SN:chr9_GL383541v1_alt  LN:171286
@SQ SN:chr9_GL383542v1_alt  LN:60032
@SQ SN:chr9_KI270823v1_alt  LN:439082
@SQ SN:chr10_GL383545v1_alt LN:179254
@SQ SN:chr10_KI270824v1_alt LN:181496
@SQ SN:chr10_GL383546v1_alt LN:309802
@SQ SN:chr10_KI270825v1_alt LN:188315
@SQ SN:chr11_KI270832v1_alt LN:210133
@SQ SN:chr11_KI270830v1_alt LN:177092
@SQ SN:chr11_KI270831v1_alt LN:296895
@SQ SN:chr11_KI270829v1_alt LN:204059
@SQ SN:chr11_GL383547v1_alt LN:154407
@SQ SN:chr11_JH159136v1_alt LN:200998
@SQ SN:chr11_JH159137v1_alt LN:191409
@SQ SN:chr11_KI270827v1_alt LN:67707
@SQ SN:chr11_KI270826v1_alt LN:186169
@SQ SN:chr12_GL877875v1_alt LN:167313
@SQ SN:chr12_GL877876v1_alt LN:408271
@SQ SN:chr12_KI270837v1_alt LN:40090
@SQ SN:chr12_GL383549v1_alt LN:120804
@SQ SN:chr12_KI270835v1_alt LN:238139
@SQ SN:chr12_GL383550v2_alt LN:169178
@SQ SN:chr12_GL383552v1_alt LN:138655
@SQ SN:chr12_GL383553v2_alt LN:152874
@SQ SN:chr12_KI270834v1_alt LN:119498
@SQ SN:chr12_GL383551v1_alt LN:184319
@SQ SN:chr12_KI270833v1_alt LN:76061
@SQ SN:chr12_KI270836v1_alt LN:56134
@SQ SN:chr13_KI270840v1_alt LN:191684
@SQ SN:chr13_KI270839v1_alt LN:180306
@SQ SN:chr13_KI270843v1_alt LN:103832
@SQ SN:chr13_KI270841v1_alt LN:169134
@SQ SN:chr13_KI270838v1_alt LN:306913
@SQ SN:chr13_KI270842v1_alt LN:37287
@SQ SN:chr14_KI270844v1_alt LN:322166
@SQ SN:chr14_KI270847v1_alt LN:1511111
@SQ SN:chr14_KI270845v1_alt LN:180703
@SQ SN:chr14_KI270846v1_alt LN:1351393
@SQ SN:chr15_KI270852v1_alt LN:478999
@SQ SN:chr15_KI270851v1_alt LN:263054
@SQ SN:chr15_KI270848v1_alt LN:327382
@SQ SN:chr15_GL383554v1_alt LN:296527
@SQ SN:chr15_KI270849v1_alt LN:244917
@SQ SN:chr15_GL383555v2_alt LN:388773
@SQ SN:chr15_KI270850v1_alt LN:430880
@SQ SN:chr16_KI270854v1_alt LN:134193
@SQ SN:chr16_KI270856v1_alt LN:63982
@SQ SN:chr16_KI270855v1_alt LN:232857
@SQ SN:chr16_KI270853v1_alt LN:2659700
@SQ SN:chr16_GL383556v1_alt LN:192462
@SQ SN:chr16_GL383557v1_alt LN:89672
@SQ SN:chr17_GL383563v3_alt LN:375691
@SQ SN:chr17_KI270862v1_alt LN:391357
@SQ SN:chr17_KI270861v1_alt LN:196688
@SQ SN:chr17_KI270857v1_alt LN:2877074
@SQ SN:chr17_JH159146v1_alt LN:278131
@SQ SN:chr17_JH159147v1_alt LN:70345
@SQ SN:chr17_GL383564v2_alt LN:133151
@SQ SN:chr17_GL000258v2_alt LN:1821992
@SQ SN:chr17_GL383565v1_alt LN:223995
@SQ SN:chr17_KI270858v1_alt LN:235827
@SQ SN:chr17_KI270859v1_alt LN:108763
@SQ SN:chr17_GL383566v1_alt LN:90219
@SQ SN:chr17_KI270860v1_alt LN:178921
@SQ SN:chr18_KI270864v1_alt LN:111737
@SQ SN:chr18_GL383567v1_alt LN:289831
@SQ SN:chr18_GL383570v1_alt LN:164789
@SQ SN:chr18_GL383571v1_alt LN:198278
@SQ SN:chr18_GL383568v1_alt LN:104552
@SQ SN:chr18_GL383569v1_alt LN:167950
@SQ SN:chr18_GL383572v1_alt LN:159547
@SQ SN:chr18_KI270863v1_alt LN:167999
@SQ SN:chr19_KI270868v1_alt LN:61734
@SQ SN:chr19_KI270865v1_alt LN:52969
@SQ SN:chr19_GL383573v1_alt LN:385657
@SQ SN:chr19_GL383575v2_alt LN:170222
@SQ SN:chr19_GL383576v1_alt LN:188024
@SQ SN:chr19_GL383574v1_alt LN:155864
@SQ SN:chr19_KI270866v1_alt LN:43156
@SQ SN:chr19_KI270867v1_alt LN:233762
@SQ SN:chr19_GL949746v1_alt LN:987716
@SQ SN:chr20_GL383577v2_alt LN:128386
@SQ SN:chr20_KI270869v1_alt LN:118774
@SQ SN:chr20_KI270871v1_alt LN:58661
@SQ SN:chr20_KI270870v1_alt LN:183433
@SQ SN:chr21_GL383578v2_alt LN:63917
@SQ SN:chr21_KI270874v1_alt LN:166743
@SQ SN:chr21_KI270873v1_alt LN:143900
@SQ SN:chr21_GL383579v2_alt LN:201197
@SQ SN:chr21_GL383580v2_alt LN:74653
@SQ SN:chr21_GL383581v2_alt LN:116689
@SQ SN:chr21_KI270872v1_alt LN:82692
@SQ SN:chr22_KI270875v1_alt LN:259914
@SQ SN:chr22_KI270878v1_alt LN:186262
@SQ SN:chr22_KI270879v1_alt LN:304135
@SQ SN:chr22_KI270876v1_alt LN:263666
@SQ SN:chr22_KI270877v1_alt LN:101331
@SQ SN:chr22_GL383583v2_alt LN:96924
@SQ SN:chr22_GL383582v2_alt LN:162811
@SQ SN:chrX_KI270880v1_alt  LN:284869
@SQ SN:chrX_KI270881v1_alt  LN:144206
@SQ SN:chr19_KI270882v1_alt LN:248807
@SQ SN:chr19_KI270883v1_alt LN:170399
@SQ SN:chr19_KI270884v1_alt LN:157053
@SQ SN:chr19_KI270885v1_alt LN:171027
@SQ SN:chr19_KI270886v1_alt LN:204239
@SQ SN:chr19_KI270887v1_alt LN:209512
@SQ SN:chr19_KI270888v1_alt LN:155532
@SQ SN:chr19_KI270889v1_alt LN:170698
@SQ SN:chr19_KI270890v1_alt LN:184499
@SQ SN:chr19_KI270891v1_alt LN:170680
@SQ SN:chr1_KI270892v1_alt  LN:162212
@SQ SN:chr2_KI270894v1_alt  LN:214158
@SQ SN:chr2_KI270893v1_alt  LN:161218
@SQ SN:chr3_KI270895v1_alt  LN:162896
@SQ SN:chr4_KI270896v1_alt  LN:378547
@SQ SN:chr5_KI270897v1_alt  LN:1144418
@SQ SN:chr5_KI270898v1_alt  LN:130957
@SQ SN:chr6_GL000251v2_alt  LN:4795265
@SQ SN:chr7_KI270899v1_alt  LN:190869
@SQ SN:chr8_KI270901v1_alt  LN:136959
@SQ SN:chr8_KI270900v1_alt  LN:318687
@SQ SN:chr11_KI270902v1_alt LN:106711
@SQ SN:chr11_KI270903v1_alt LN:214625
@SQ SN:chr12_KI270904v1_alt LN:572349
@SQ SN:chr15_KI270906v1_alt LN:196384
@SQ SN:chr15_KI270905v1_alt LN:5161414
@SQ SN:chr17_KI270907v1_alt LN:137721
@SQ SN:chr17_KI270910v1_alt LN:157099
@SQ SN:chr17_KI270909v1_alt LN:325800
@SQ SN:chr17_JH159148v1_alt LN:88070
@SQ SN:chr17_KI270908v1_alt LN:1423190
@SQ SN:chr18_KI270912v1_alt LN:174061
@SQ SN:chr18_KI270911v1_alt LN:157710
@SQ SN:chr19_GL949747v2_alt LN:729520
@SQ SN:chr22_KB663609v1_alt LN:74013
@SQ SN:chrX_KI270913v1_alt  LN:274009
@SQ SN:chr19_KI270914v1_alt LN:205194
@SQ SN:chr19_KI270915v1_alt LN:170665
@SQ SN:chr19_KI270916v1_alt LN:184516
@SQ SN:chr19_KI270917v1_alt LN:190932
@SQ SN:chr19_KI270918v1_alt LN:123111
@SQ SN:chr19_KI270919v1_alt LN:170701
@SQ SN:chr19_KI270920v1_alt LN:198005
@SQ SN:chr19_KI270921v1_alt LN:282224
@SQ SN:chr19_KI270922v1_alt LN:187935
@SQ SN:chr19_KI270923v1_alt LN:189352
@SQ SN:chr3_KI270924v1_alt  LN:166540
@SQ SN:chr4_KI270925v1_alt  LN:555799
@SQ SN:chr6_GL000252v2_alt  LN:4604811
@SQ SN:chr8_KI270926v1_alt  LN:229282
@SQ SN:chr11_KI270927v1_alt LN:218612
@SQ SN:chr19_GL949748v2_alt LN:1064304
@SQ SN:chr22_KI270928v1_alt LN:176103
@SQ SN:chr19_KI270929v1_alt LN:186203
@SQ SN:chr19_KI270930v1_alt LN:200773
@SQ SN:chr19_KI270931v1_alt LN:170148
@SQ SN:chr19_KI270932v1_alt LN:215732
@SQ SN:chr19_KI270933v1_alt LN:170537
@SQ SN:chr19_GL000209v2_alt LN:177381
@SQ SN:chr3_KI270934v1_alt  LN:163458
@SQ SN:chr6_GL000253v2_alt  LN:4677643
@SQ SN:chr19_GL949749v2_alt LN:1091841
@SQ SN:chr3_KI270935v1_alt  LN:197351
@SQ SN:chr6_GL000254v2_alt  LN:4827813
@SQ SN:chr19_GL949750v2_alt LN:1066390
@SQ SN:chr3_KI270936v1_alt  LN:164170
@SQ SN:chr6_GL000255v2_alt  LN:4606388
@SQ SN:chr19_GL949751v2_alt LN:1002683
@SQ SN:chr3_KI270937v1_alt  LN:165607
@SQ SN:chr6_GL000256v2_alt  LN:4929269
@SQ SN:chr19_GL949752v1_alt LN:987100
@SQ SN:chr6_KI270758v1_alt  LN:76752
@SQ SN:chr19_GL949753v2_alt LN:796479
@SQ SN:chr19_KI270938v1_alt LN:1066800
@SQ SN:chrM LN:16569
@SQ SN:chrUn_KI270302v1 LN:2274
@SQ SN:chrUn_KI270304v1 LN:2165
@SQ SN:chrUn_KI270303v1 LN:1942
@SQ SN:chrUn_KI270305v1 LN:1472
@SQ SN:chrUn_KI270322v1 LN:21476
@SQ SN:chrUn_KI270320v1 LN:4416
@SQ SN:chrUn_KI270310v1 LN:1201
@SQ SN:chrUn_KI270316v1 LN:1444
@SQ SN:chrUn_KI270315v1 LN:2276
@SQ SN:chrUn_KI270312v1 LN:998
@SQ SN:chrUn_KI270311v1 LN:12399
@SQ SN:chrUn_KI270317v1 LN:37690
@SQ SN:chrUn_KI270412v1 LN:1179
@SQ SN:chrUn_KI270411v1 LN:2646
@SQ SN:chrUn_KI270414v1 LN:2489
@SQ SN:chrUn_KI270419v1 LN:1029
@SQ SN:chrUn_KI270418v1 LN:2145
@SQ SN:chrUn_KI270420v1 LN:2321
@SQ SN:chrUn_KI270424v1 LN:2140
@SQ SN:chrUn_KI270417v1 LN:2043
@SQ SN:chrUn_KI270422v1 LN:1445
@SQ SN:chrUn_KI270423v1 LN:981
@SQ SN:chrUn_KI270425v1 LN:1884
@SQ SN:chrUn_KI270429v1 LN:1361
@SQ SN:chrUn_KI270442v1 LN:392061
@SQ SN:chrUn_KI270466v1 LN:1233
@SQ SN:chrUn_KI270465v1 LN:1774
@SQ SN:chrUn_KI270467v1 LN:3920
@SQ SN:chrUn_KI270435v1 LN:92983
@SQ SN:chrUn_KI270438v1 LN:112505
@SQ SN:chrUn_KI270468v1 LN:4055
@SQ SN:chrUn_KI270510v1 LN:2415
@SQ SN:chrUn_KI270509v1 LN:2318
@SQ SN:chrUn_KI270518v1 LN:2186
@SQ SN:chrUn_KI270508v1 LN:1951
@SQ SN:chrUn_KI270516v1 LN:1300
@SQ SN:chrUn_KI270512v1 LN:22689
@SQ SN:chrUn_KI270519v1 LN:138126
@SQ SN:chrUn_KI270522v1 LN:5674
@SQ SN:chrUn_KI270511v1 LN:8127
@SQ SN:chrUn_KI270515v1 LN:6361
@SQ SN:chrUn_KI270507v1 LN:5353
@SQ SN:chrUn_KI270517v1 LN:3253
@SQ SN:chrUn_KI270529v1 LN:1899
@SQ SN:chrUn_KI270528v1 LN:2983
@SQ SN:chrUn_KI270530v1 LN:2168
@SQ SN:chrUn_KI270539v1 LN:993
@SQ SN:chrUn_KI270538v1 LN:91309
@SQ SN:chrUn_KI270544v1 LN:1202
@SQ SN:chrUn_KI270548v1 LN:1599
@SQ SN:chrUn_KI270583v1 LN:1400
@SQ SN:chrUn_KI270587v1 LN:2969
@SQ SN:chrUn_KI270580v1 LN:1553
@SQ SN:chrUn_KI270581v1 LN:7046
@SQ SN:chrUn_KI270579v1 LN:31033
@SQ SN:chrUn_KI270589v1 LN:44474
@SQ SN:chrUn_KI270590v1 LN:4685
@SQ SN:chrUn_KI270584v1 LN:4513
@SQ SN:chrUn_KI270582v1 LN:6504
@SQ SN:chrUn_KI270588v1 LN:6158
@SQ SN:chrUn_KI270593v1 LN:3041
@SQ SN:chrUn_KI270591v1 LN:5796
@SQ SN:chrUn_KI270330v1 LN:1652
@SQ SN:chrUn_KI270329v1 LN:1040
@SQ SN:chrUn_KI270334v1 LN:1368
@SQ SN:chrUn_KI270333v1 LN:2699
@SQ SN:chrUn_KI270335v1 LN:1048
@SQ SN:chrUn_KI270338v1 LN:1428
@SQ SN:chrUn_KI270340v1 LN:1428
@SQ SN:chrUn_KI270336v1 LN:1026
@SQ SN:chrUn_KI270337v1 LN:1121
@SQ SN:chrUn_KI270363v1 LN:1803
@SQ SN:chrUn_KI270364v1 LN:2855
@SQ SN:chrUn_KI270362v1 LN:3530
@SQ SN:chrUn_KI270366v1 LN:8320
@SQ SN:chrUn_KI270378v1 LN:1048
@SQ SN:chrUn_KI270379v1 LN:1045
@SQ SN:chrUn_KI270389v1 LN:1298
@SQ SN:chrUn_KI270390v1 LN:2387
@SQ SN:chrUn_KI270387v1 LN:1537
@SQ SN:chrUn_KI270395v1 LN:1143
@SQ SN:chrUn_KI270396v1 LN:1880
@SQ SN:chrUn_KI270388v1 LN:1216
@SQ SN:chrUn_KI270394v1 LN:970
@SQ SN:chrUn_KI270386v1 LN:1788
@SQ SN:chrUn_KI270391v1 LN:1484
@SQ SN:chrUn_KI270383v1 LN:1750
@SQ SN:chrUn_KI270393v1 LN:1308
@SQ SN:chrUn_KI270384v1 LN:1658
@SQ SN:chrUn_KI270392v1 LN:971
@SQ SN:chrUn_KI270381v1 LN:1930
@SQ SN:chrUn_KI270385v1 LN:990
@SQ SN:chrUn_KI270382v1 LN:4215
@SQ SN:chrUn_KI270376v1 LN:1136
@SQ SN:chrUn_KI270374v1 LN:2656
@SQ SN:chrUn_KI270372v1 LN:1650
@SQ SN:chrUn_KI270373v1 LN:1451
@SQ SN:chrUn_KI270375v1 LN:2378
@SQ SN:chrUn_KI270371v1 LN:2805
@SQ SN:chrUn_KI270448v1 LN:7992
@SQ SN:chrUn_KI270521v1 LN:7642
@SQ SN:chrUn_GL000195v1 LN:182896
@SQ SN:chrUn_GL000219v1 LN:179198
@SQ SN:chrUn_GL000220v1 LN:161802
@SQ SN:chrUn_GL000224v1 LN:179693
@SQ SN:chrUn_KI270741v1 LN:157432
@SQ SN:chrUn_GL000226v1 LN:15008
@SQ SN:chrUn_GL000213v1 LN:164239
@SQ SN:chrUn_KI270743v1 LN:210658
@SQ SN:chrUn_KI270744v1 LN:168472
@SQ SN:chrUn_KI270745v1 LN:41891
@SQ SN:chrUn_KI270746v1 LN:66486
@SQ SN:chrUn_KI270747v1 LN:198735
@SQ SN:chrUn_KI270748v1 LN:93321
@SQ SN:chrUn_KI270749v1 LN:158759
@SQ SN:chrUn_KI270750v1 LN:148850
@SQ SN:chrUn_KI270751v1 LN:150742
@SQ SN:chrUn_KI270752v1 LN:27745
@SQ SN:chrUn_KI270753v1 LN:62944
@SQ SN:chrUn_KI270754v1 LN:40191
@SQ SN:chrUn_KI270755v1 LN:36723
@SQ SN:chrUn_KI270756v1 LN:79590
@SQ SN:chrUn_KI270757v1 LN:71251
@SQ SN:chrUn_GL000214v1 LN:137718
@SQ SN:chrUn_KI270742v1 LN:186739
@SQ SN:chrUn_GL000216v2 LN:176608
@SQ SN:chrUn_GL000218v1 LN:161147
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrY_KI270740v1_random   LN:37240
@PG ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa mem ../ref/hg38.fa SRR001000.fq
SRR001000.1 0   chr17   56844677    60  38S22M1I25M1D50M1I65M177S   *   0   0   TCAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTCTATTTGAACTTTATACAGGGTCCCGTAGAGATAAGTTTATGTTGCACATTAATGCTCCCACAAATAATGGTGAACACTAAACCTGTTAACAGTTTATAGAGTGTAGATATGTGTATAGAATTGATGTAACTTATTGAATCTCTCAGTCTTTATGTAGCTCTTAAGGTAAAACGAACTACTAATTAAAAAATTTTAATTAGGGTAAGTTTTAAGTTTTTACTTAAGTTTAGGTATATAACCAAAATAAATTAGTAACTTAATTTTAATTTTTAAAAATTTATTACCTTTTACTAAATTAAGGTTAGGTAACGGTAAGGGTTAGTAAGTTACTAACCCTTTACGG AAA0000000000000000000000000000000B?83200322800000::::888922.29>>===::<888=;===:;9762000896022262;;;;66=777777633....33......434;78:8611186666:=93437743........44.......2....4688666.......+....----633---------**3653-----******,,,,((**0008444;;;;;444)--)),,((30,,,,,,,-33000311--,,,(((,,),,,,,,,,,,,,,,,(((((,,,,,(()()),,,--//30(,,,,,)),,10,,,,,010,,,,,,,,,,),,,,,,,11,,,,,,,,,),, NM:i:5  MD:Z:47^A65G0A48    AS:i:131    XS:i:19 SA:Z:chr10,31970052,+,4S32M343S,0,0;
SRR001000.1 2048    chr10   31970052    0   4H32M343H   *   0   0   GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTC    000000000000000000000000000000B?    NM:i:0  MD:Z:32 AS:i:32 XS:i:32 SA:Z:chr17,56844677,+,38S22M1I25M1D50M1I65M177S,60,5;
SRR001000.2 4   *   0   0   *   *   0   0   TCAGGGGGGGGGGGGCTCTGTAACATTTGGTGGGACCTGATGCTGCTGGTGTGCTGTAAATAAGTGCCTAGCACATCACGTAGGCACCAGGTGTCACCAGGGCTACTTGCCTCGGCATCTCCTCACCGGAGAAGGGGTTAACAAACCCGTGGGGGTCTTAGTGGAAGTGACGTGCTGTGAATACAGGTCCATAGCACCGCTATCCACTATGTCTCGCCCGGGCTATATGTCGCCTTACCTCCCCTATATAGTCACGACCCCACCGAACCAGGC   FFFFFFFFFFFFF55IIFFFFFIIIIIIIIIGGGHHHHHIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBCCFFFFFFFFFFFFFFFFFFFGC55555CGCCCGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGGEEEFEFEBB>>>BEEEEEEEFGGG@@@@A@@@@@@=766:7777?:::   AS:i:0  XS:i:0
SRR001000.3 4   *   0   0   *   *   0   0   TCAGGGGGGGGGGGGCTCTGTAACATTTGGTGGGACCTGATGCTGCTGGTGTGCTGTAAATAAGTGCCTAGCACATCACGTAGGCACCAGGTGTCACCAGGGCTACTTGCCTCGGCATCTCCTCACCGGAGAAGGGGTTAACAAACCCGTGGGGGTCTTAGTGGAAGTGACGTGCTGTGAATACAGGTCCATAGCACCGCTATCCACTATGTCTCGCCCGGGCTATATGTCGCCTTACCTCCCCTATATAGTCACGACCCCACCGAACCAGGC   FFC////////////=CFFFFFHHFHHHHHGGGGHHHHHHHHHIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGFFFFGGGGGGFFFGAAFFF??000059GFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGGEEEEEEEBBAAAEEEBBBBEEEEE@A@A@@@@4444766:=AAA@777   AS:i:0  XS:i:0

cmd

$ ./goby 1g sam-to-compact -i ../data/seq/SRR001000.sam -o ./SRR001000.goby

output

Trying to set RJAVA_HOME environment variable

RJAVA_HOME=
java -ea -Xmx1g -Dlog4j.configuration=file:///Users/dan/Desktop/bio/goby/config/log4j-sample.properties -Djava.library.path= -jar /Users/dan/Desktop/bio/goby/goby.jar --mode sam-to-compact -i ../data/seq/SRR001000.sam -o ./SRR001000.goby
Store read origin: true
WARN  MessageChunksWriter  - Using chunk-size=10000
java.lang.StringIndexOutOfBoundsException: String index out of range: 36
    at java.lang.String.substring(String.java:1963)
    at edu.cornell.med.icb.goby.readers.sam.SamRecordParser.processRead(SamRecordParser.java:248)
    at edu.cornell.med.icb.goby.modes.SAMToCompactMode.scan(SAMToCompactMode.java:327)
    at edu.cornell.med.icb.goby.modes.SAMToCompactMode.execute(SAMToCompactMode.java:237)
    at edu.cornell.med.icb.goby.modes.GenericToolsDriver.execute(GenericToolsDriver.java:219)
    at edu.cornell.med.icb.goby.modes.GobyDriver.main(GobyDriver.java:55)
dancju commented 8 years ago
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
fac2003 commented 8 years ago

The header looks fine, except that it is space-delimited as far as I can tell from the GitHub formatting. The SAM spec requires the header to be tab-delimited. Goby uses samtools to check whether the header is present; if the file has spaces, it won't be parsed correctly. You can use the following command to check for tabs:

od -c SRR001000.sam |head
0000000 @ S Q S N : c h r 1 L N : 2
0000020 4 8 9 5 6 4 2 2 \n @ S Q S N :
0000040 c h r 1 0 L N : 1 3 3 7
0000060 9 7 4 2 2 \n @ S Q S N : c h r
0000100 1 1 L N : 1 3 5 0 8 6 6
0000120 2 2 \n @ S Q S N : c h r 1 1
0000140 K I 2 7 0 7 2 1 v 1 r a n d o
0000160 m L N : 1 0 0 3 1 6 \n @ S Q
0000200 S N : c h r 1 2 L N :
0000220 1 3 3 2 7 5 3 0 9 \n @ S Q S N

A tab separator should appear as \t rather than as a space, and there should be one between @SQ and SN.
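
The same check can be scripted. The following is a rough Python sketch, not part of Goby or samtools; the function name space_delimited_header_lines is invented for the example. It only tests whether the two-letter record type of each header line is followed by a tab, so it will not catch spaces between later fields.

```python
def space_delimited_header_lines(lines):
    """Return 1-based numbers of header lines whose record type is not followed by a tab."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("@"):
            break  # reached the alignment records; the header is over
        # A well-formed header line looks like "@XX<TAB>TAG:value<TAB>...",
        # so the character at index 3 must be a tab.
        if len(line) < 4 or line[3] != "\t":
            bad.append(number)
    return bad


# Hypothetical input: the second header line uses spaces instead of tabs.
example = [
    "@SQ\tSN:chr1\tLN:248956422\n",
    "@SQ SN:chr10 LN:133797422\n",
    "SRR001000.2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\n",
]
print(space_delimited_header_lines(example))
```

To run it on a real file, pass the open file object, for example space_delimited_header_lines(open("SRR001000.sam")); iteration stops at the first alignment record, so only the header is read.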

dancju commented 8 years ago

I am pretty sure it is tab delimited.

$ od -c SRR001000.sam | head
0000000    @   S   Q  \t   S   N   :   c   h   r   1  \t   L   N   :   2
0000020    4   8   9   5   6   4   2   2  \n   @   S   Q  \t   S   N   :
0000040    c   h   r   1   0  \t   L   N   :   1   3   3   7   9   7   4
0000060    2   2  \n   @   S   Q  \t   S   N   :   c   h   r   1   1  \t
0000100    L   N   :   1   3   5   0   8   6   6   2   2  \n   @   S   Q
0000120   \t   S   N   :   c   h   r   1   1   _   K   I   2   7   0   7
0000140    2   1   v   1   _   r   a   n   d   o   m  \t   L   N   :   1
0000160    0   0   3   1   6  \n   @   S   Q  \t   S   N   :   c   h   r
0000200    1   2  \t   L   N   :   1   3   3   2   7   5   3   0   9  \n
0000220    @   S   Q  \t   S   N   :   c   h   r   1   3  \t   L   N   :
fac2003 commented 8 years ago

Could you upload the exact file you are using somewhere so that I can replicate this? I am upgrading Goby from a very old samtools to htsjdk, and I would like to see whether this helps with parsing your file.

dancju commented 8 years ago

My network connection is poor, and the file is too large to upload. I am absolutely sure the file is syntactically valid since it was generated by SAMTOOLS.

The project has ended, so I will not follow up on this issue. Your help is truly appreciated.

fac2003 commented 8 years ago

I upgraded Goby to the latest HTSJDK (replacing the samtools Java implementation that was used before). You can download a preview of the jar here: goby_htsjdk.jar. Does this jar work better on your file?