GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

Perl tests fail with perl version 5.18 #470

Closed: cmdcolin closed this issue 6 years ago

cmdcolin commented 10 years ago

Results of `prove -I src/perl5 -lr tests/` on Ubuntu 14 with Perl 5.18.2:

``` tests/perl_tests/00.compile.t .............. ok tests/perl_tests/add-json.pl.t ............. ok # using temp dir /tmp/Yl3wANo8Si tests/perl_tests/bam-to-json.pl.t .......... ok # writing output to /tmp/Qs_enmpmW8 # Failed test 'got the right genes trackdata' # at tests/perl_tests/biodb-to-json.pl.t line 44. # Structures begin differing at: # $got->[3] = 'ctgA' # $expected->[3] = 'example' # [ # 0, # 1049, # 9000, # 1, # 'ctgA', # 'EDEN', # 'gene', # 'EDEN', # 'example', # 'protein kinase', # [ # [ # 1, # 1299, # 9000, # 1, # [ # [ # 2, # 2999, # 3300, # 1, # 'five_prime_UTR', # 'example', # 'EDEN.3', # 'ctgA' # ], # [ # 3, # 7600, # 9000, # 1, # 'three_prime_UTR', # 'example', # 'ctgA', # 'EDEN.3' # ], # [ # 4, # 3300, # 3902, # 1, # 'example', # 'CDS', # 'ctgA', # 'EDEN.3', # '0' # ], # [ # 5, # 1299, # 1500, # 1, # 'five_prime_UTR', # 'ctgA', # 'EDEN.3', # 'example' # ], # [ # 6, # 6999, # 7600, # 1, # '1', # 'EDEN.3', # 'ctgA', # 'CDS', # 'example' # ], # [ # 7, # 4999, # 5500, # 1, # 'example', # 'CDS', # '1', # 'ctgA', # 'EDEN.3' # ] # ], # 'Eden splice form 3', # 'example', # 'EDEN.3', # 'mRNA', # 'EDEN', # 'ctgA', # 'EDEN.3' # ], # [ # 8, # 1049, # 9000, # 1, # 'example', # 'EDEN.2', # [ # [ # 9, # 7608, # 9000, # 1, # 'three_prime_UTR', # 'EDEN.2', # 'ctgA', # 'example' # ], # [ # 10, # 6999, # 7608, # 1, # '0', # 'EDEN.2', # 'ctgA', # 'CDS', # 'example' # ], # [ # 10, # 1200, # 1500, # 1, # '0', # 'EDEN.2', # 'ctgA', # 'CDS', # 'example' # ], # [ # 11, # 1049, # 1200, # 1, # 'example', # 'ctgA', # 'EDEN.2', # 'five_prime_UTR' # ], # [ # 10, # 4999, # 5500, # 1, # '0', # 'EDEN.2', # 'ctgA', # 'CDS', # 'example' # ] # ], # 'Eden splice form 2', # 'EDEN.2', # 'EDEN', # 'ctgA', # 'mRNA' # ], # [ # 12, # 1049, # 9000, # 1, # 'Eden splice form 1', # [ # [ # 13, # 1200, # 1500, # 1, # 'example', # 'CDS', # 'EDEN.1', # 'ctgA', # '0' # ], # [ # 4, # 4999, # 5500, # 1, # 'example', # 'CDS', # 'ctgA', # 'EDEN.1', # '0' # ], # [ # 14, # 2999, # 3902, # 1, # 
'example', # 'CDS', # 'EDEN.1', # 'ctgA', # '0' # ], # [ # 9, # 7608, # 9000, # 1, # 'three_prime_UTR', # 'EDEN.1', # 'ctgA', # 'example' # ], # [ # 15, # 1049, # 1200, # 1, # 'ctgA', # 'EDEN.1', # 'example', # 'five_prime_UTR' # ], # [ # 16, # 6999, # 7608, # 1, # 'example', # 'CDS', # '0', # 'EDEN.1', # 'ctgA' # ] # ], # 'EDEN.1', # 'example', # 'mRNA', # 'EDEN.1', # 'ctgA', # 'EDEN' # ] # ] # ] # writing output to /tmp/ygsDVImhob # Failed test 'got the right genes trackdata' # at tests/perl_tests/biodb-to-json.pl.t line 111. # Structures begin differing at: # $got->[3] = 'gene' # $expected->[3] = 'example' # [ # 0, # 1049, # 9000, # 1, # 'gene', # 'ctgA', # 'EDEN', # [ # [ # 1, # 1049, # 9000, # 1, # 'mRNA', # 'EDEN.1', # 'ctgA', # 'EDEN', # 'Eden splice form 1', # [ # [ # 2, # 2999, # 3902, # 1, # 'example', # 'CDS', # 'ctgA', # 'EDEN.1', # '0' # ], # [ # 3, # 4999, # 5500, # 1, # 'example', # 'CDS', # 'EDEN.1', # 'ctgA', # '0' # ], # [ # 4, # 1200, # 1500, # 1, # 'example', # '0', # 'ctgA', # 'EDEN.1', # 'CDS' # ], # [ # 5, # 6999, # 7608, # 1, # '0', # 'EDEN.1', # 'ctgA', # 'CDS', # 'example' # ], # [ # 6, # 1049, # 1200, # 1, # 'example', # 'ctgA', # 'EDEN.1', # 'five_prime_UTR' # ], # [ # 7, # 7608, # 9000, # 1, # 'three_prime_UTR', # 'example', # 'EDEN.1', # 'ctgA' # ] # ], # 'EDEN.1', # 'example' # ], # [ # 8, # 1049, # 9000, # 1, # 'mRNA', # 'EDEN', # 'ctgA', # 'EDEN.2', # [ # [ # 9, # 7608, # 9000, # 1, # 'EDEN.2', # 'ctgA', # 'example', # 'three_prime_UTR' # ], # [ # 10, # 6999, # 7608, # 1, # 'example', # '0', # 'EDEN.2', # 'ctgA', # 'CDS' # ], # [ # 11, # 1200, # 1500, # 1, # 'CDS', # '0', # 'EDEN.2', # 'ctgA', # 'example' # ], # [ # 7, # 1049, # 1200, # 1, # 'five_prime_UTR', # 'example', # 'EDEN.2', # 'ctgA' # ], # [ # 4, # 4999, # 5500, # 1, # 'example', # '0', # 'ctgA', # 'EDEN.2', # 'CDS' # ] # ], # 'Eden splice form 2', # 'example', # 'EDEN.2' # ], # [ # 12, # 1299, # 9000, # 1, # 'EDEN', # 'ctgA', # 'EDEN.3', # 'mRNA', # 'example', # 'EDEN.3', 
# [ # [ # 13, # 3300, # 3902, # 1, # 'ctgA', # 'EDEN.3', # '0', # 'CDS', # 'example' # ], # [ # 14, # 2999, # 3300, # 1, # 'example', # 'EDEN.3', # 'ctgA', # 'five_prime_UTR' # ], # [ # 15, # 7600, # 9000, # 1, # 'three_prime_UTR', # 'example', # 'ctgA', # 'EDEN.3' # ], # [ # 10, # 6999, # 7600, # 1, # 'example', # '1', # 'EDEN.3', # 'ctgA', # 'CDS' # ], # [ # 9, # 1299, # 1500, # 1, # 'EDEN.3', # 'ctgA', # 'example', # 'five_prime_UTR' # ], # [ # 16, # 4999, # 5500, # 1, # 'CDS', # 'ctgA', # 'EDEN.3', # '1', # 'example' # ] # ], # 'Eden splice form 3' # ] # ], # 'protein kinase', # 'example', # 'EDEN' # ] # Looks like you failed 2 tests of 15. tests/perl_tests/biodb-to-json.pl.t ........ Dubious, test returned 2 (wstat 512, 0x200) Failed 2/15 subtests tests/perl_tests/conf_format.t ............. ok # using temp dir /tmp/4fvuNkBf6A # using temp dir /tmp/J3HXHaH5XA tests/perl_tests/draw-basepair-track.pl.t .. ok tests/perl_tests/fakefasta.t ............... ok tests/perl_tests/featurestream.t ........... ok # Failed test 'exonerate mRNA has its subfeatures' # at tests/perl_tests/flatfile-to-json.pl.t line 95. 
# got: '' # expected: 'ARRAY' # { # 'featureCount' => 2, # 'formatVersion' => 1, # 'histograms' => { # 'meta' => [ # { # 'arrayParams' => { # 'chunkSize' => 10000, # 'length' => 1, # 'urlTemplate' => 'hist-50000-{Chunk}.jsonz' # }, # 'basesPerBin' => '50000' # } # ], # 'stats' => [ # { # 'basesPerBin' => '50000', # 'max' => 2, # 'mean' => 2 # } # ] # }, # 'intervals' => { # 'classes' => [ # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Id', # 'Subfeatures', # 'Note', # 'Source', # 'Name', # 'Type', # 'Phase', # 'Seq_id' # ], # 'isArrayAttr' => { # 'Subfeatures' => 1 # } # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Seq_id', # 'Type', # 'Phase', # 'Source' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Seq_id', # 'Phase', # 'Type', # 'Source' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Source', # 'Phase', # 'Type', # 'Seq_id' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Subfeatures', # 'Id', # 'Seq_id', # 'Type', # 'Note', # 'Source', # 'Name' # ], # 'isArrayAttr' => { # 'Subfeatures' => 1 # } # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Source', # 'Seq_id', # 'Type' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Source', # 'Phase', # 'Type', # 'Seq_id' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Phase', # 'Type', # 'Seq_id', # 'Source' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Source', # 'Phase', # 'Type', # 'Seq_id' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Strand', # 'Type', # 'Seq_id', # 'Source' # ], # 'isArrayAttr' => {} # }, # { # 'attributes' => [ # 'Start', # 'End', # 'Chunk' # ], # 'isArrayAttr' => { # 'Sublist' => 1 # } # } # ], # 'count' => 2, # 'lazyClass' => 10, # 'maxEnd' => 23000, # 'minStart' 
=> 12999, # 'nclist' => [ # [ # 0, # 12999, # 17200, # 1, # 'cds-Apple2', # [ # [ # 1, # 13499, # 13800, # 1, # 'ctgA', # 'CDS', # 0, # 'predicted' # ], # [ # 2, # 14999, # 15500, # 1, # 'ctgA', # 1, # 'CDS', # 'predicted' # ], # [ # 3, # 16499, # 17000, # 1, # 'predicted', # 2, # 'CDS', # 'ctgA' # ] # ], # 'mRNA with CDSs but no UTRs', # 'predicted', # 'Apple2', # 'mRNA', # 0, # 'ctgA' # ], # [ # 4, # 17399, # 23000, # 1, # [ # [ # 5, # 17399, # 17999, # 1, # 'exonerate', # 'ctgA', # 'UTR' # ], # [ # 6, # 17999, # 18800, # 1, # 'exonerate', # 0, # 'CDS', # 'ctgA' # ], # [ # 7, # 18999, # 19500, # 1, # 1, # 'CDS', # 'ctgA', # 'exonerate' # ], # [ # 8, # 20999, # 21200, # 1, # 'exonerate', # 2, # 'CDS', # 'ctgA' # ], # [ # 9, # 21200, # 23000, # 1, # 'UTR', # 'ctgA', # 'exonerate' # ] # ], # 'rna-Apple3', # 'ctgA', # 'mRNA', # 'mRNA with both CDSs and UTRs', # 'exonerate', # 'Apple3' # ] # ], # 'urlTemplate' => 'lf-{Chunk}.jsonz' # } # } Can't use string ("Apple3") as an ARRAY ref while "strict refs" in use at tests/perl_tests/flatfile-to-json.pl.t line 97. # Tests were run but no plan was declared and done_testing() was not seen. tests/perl_tests/flatfile-to-json.pl.t ..... Dubious, test returned 25 (wstat 6400, 0x1900) Failed 1/5 subtests # Failed test 'got right type in parent feature (full record)' # at tests/perl_tests/genbank.t line 75. 
# got: 'Homo sapiens' # expected: 'mRNA' # [ # 0, # 5001, # 10950, # 1, # [ # 'Eukaryota', # 'Metazoa', # 'Chordata', # 'Craniata', # 'Vertebrata', # 'Euteleostomi', # 'Mammalia', # 'Eutheria', # 'Euarchontoglires', # 'Primates', # 'Haplorrhini', # 'Catarrhini', # 'Hominidae', # 'Homo' # ], # 'Homo sapiens (human)', # 'MIM:138350', # 'glutathione S-transferase mu 1, transcript variant 1', # '9606', # 'Homo sapiens glutathione S-transferase mu 1 (GSTM1), RefSeqGene on chromosome 1.', # 'genomic DNA', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'glutathione S-transferase mu 1, transcript variant 1', # [ # 'NG_009246.1', # 'GI:219521909' # ], # [ # 'RefSeq; RefSeqGene' # ], # 'NG_009246.1', # { # 'genbank_division' => 'PRI', # 'locus_name' => 'NG_009246', # 'modification_date' => '25-JUN-2013', # 'molecule_type' => 'DNA linear', # 'sequence_length' => '12950 bp' # }, # [ # [ # 1, # 5001, # 5114, # 0, # 'alignment:Splign:1.39.8', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'GSTM1', # 'exon', # '1', # 'NG_009246.1' # ], # [ # 2, # 5079, # 5114, # 0, # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1', # 'annotated by transcript or proteomic data', # 'NG_009246.1', # 'MIM:138350', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'CDS', # 'NP_000552.2', # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # '1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '2.5.1.18', # 'GSTM1' # ], # [ # 3, # 5375, # 5450, # 0, # 'MIM:138350', # 
'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'CDS', # 'NP_000552.2', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '1', # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # 'GSTM1', # '2.5.1.18', # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1', # 'annotated by transcript or proteomic data', # 'NG_009246.1' # ], # [ # 4, # 5878, # 5942, # 0, # 'NG_009246.1', # 'annotated by transcript or proteomic data', # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1', # '1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # 'GSTM1', # '2.5.1.18', # 'MIM:138350', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'NP_000552.2', # 'CDS' # ], # [ # 5, # 6253, # 6334, # 0, # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1', # 'annotated by transcript or proteomic data', 
# 'NG_009246.1', # 'MIM:138350', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'NP_000552.2', # 'CDS', # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '1', # '2.5.1.18', # 'GSTM1' # ], # [ # 6, # 6430, # 6530, # 0, # 'GSTM1', # '2.5.1.18', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '1', # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # 'CDS', # 'NP_000552.2', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'MIM:138350', # 'annotated by transcript or proteomic data', # 'NG_009246.1', # 'glutathione S-transferase Mu 1 isoform 1', # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4' # ], # [ # 7, # 7476, # 7571, # 0, # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1', # 'NG_009246.1', # 'annotated by transcript or proteomic data', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'MIM:138350', # 'NP_000552.2', # 'CDS', # 'similar to AA sequence (same 
species):RefSeq:NP_000552.2', # '1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '2.5.1.18', # 'GSTM1' # ], # [ # 8, # 7659, # 7769, # 0, # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # '1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '2.5.1.18', # 'GSTM1', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'MIM:138350', # 'NP_000552.2', # 'CDS', # 'annotated by transcript or proteomic data', # 'NG_009246.1', # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1' # ], # [ # 9, # 10411, # 10500, # 0, # 'NG_009246.1', # 'annotated by transcript or proteomic data', # 'isoform 1 is encoded by transcript variant 1; glutathione S-transferase M1; S-(hydroxyalkyl)glutathione lyase; GST class-mu 1; glutathione S-alkyltransferase; glutathione S-aryltransferase; glutathione S-aralkyltransferase; HB subunit 4; GST HB subunit 4', # 'glutathione S-transferase Mu 1 isoform 1', # 'similar to AA sequence (same species):RefSeq:NP_000552.2', # '1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # '2.5.1.18', # 'GSTM1', # 'MIM:138350', # 'MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK', # 'NP_000552.2', # 'CDS' # ], # [ # 10, # 5375, # 5450, # 0, # 'GSTM1', # 'alignment:Splign:1.39.8', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'exon', # 'NG_009246.1', # '2' # ], # [ # 11, # 5878, # 
5942, # 0, # 'NG_009246.1', # '3', # 'GSTM1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'alignment:Splign:1.39.8', # 'exon' # ], # [ # 12, # 6253, # 6525, # 0, # 'UniSTS:87865', # 'STS', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'GSTM1', # 'NG_009246.1', # 'RH64476' # ], # [ # 13, # 6253, # 6334, # 0, # 'exon', # 'GSTM1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'alignment:Splign:1.39.8', # 'NG_009246.1', # '4' # ], # [ # 12, # 6297, # 6454, # 0, # 'UniSTS:158567', # 'STS', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'GSTM1', # 'NG_009246.1', # 'GDB:655882' # ], # [ # 13, # 6430, # 6530, # 0, # 'exon', # 'GSTM1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'alignment:Splign:1.39.8', # 'NG_009246.1', # '5' # ], # [ # 13, # 7476, # 7571, # 0, # 'exon', # 'GSTM1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'alignment:Splign:1.39.8', # 'NG_009246.1', # '6' # ], # [ # 14, # 7659, # 7769, # 0, # 'exon', # 'GSTM1', # 'alignment:Splign:1.39.8', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'NG_009246.1', # '7' # ], # [ # 15, # 10286, # 11032, # 0, # 'STS', # 'UniSTS:186432', # 'NG_009246.1', # 'G67222' # ], # [ # 16, # 10411, # 10950, # 0, # 'exon', # 'alignment:Splign:1.39.8', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'GSTM1', # '8', # 'NG_009246.1' # ], # [ # 17, # 10594, # 10942, # 0, # 'SHGC-12332', # 'NG_009246.1', # 'STS', # 'UniSTS:33074', # 'GSTM1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1' # ], # [ # 18, # 10594, # 10880, # 0, # 'GSTM1', # 'NG_009246.1', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'STS', # 'UniSTS:33073' # ], # [ # 12, # 10632, # 10780, # 0, # 'UniSTS:139106', # 'STS', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'GSTM1', # 'NG_009246.1', # 'G62022' # ] # ], # 
'mRNA', # 'GSTM1', # 'REVIEWED REFSEQ: This record has been curated by NCBI staff. The # reference sequence was derived from AC000031.6 and AC000032.7. # This sequence is a reference standard in the RefSeqGene project. # # Summary: Cytosolic and membrane-bound forms of glutathione # S-transferase are encoded by two distinct supergene families. At # present, eight distinct classes of the soluble cytoplasmic # mammalian glutathione S-transferases have been identified: alpha, # kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a # glutathione S-transferase that belongs to the mu class. The mu # class of enzymes functions in the detoxification of electrophilic # compounds, including carcinogens, therapeutic drugs, environmental # toxins and products of oxidative stress, by conjugation with # glutathione. The genes encoding the mu class of enzymes are # organized in a gene cluster on chromosome 1p13.3 and are known to # be highly polymorphic. These genetic variations can change an # individual\'s susceptibility to carcinogens and toxins as well as # affect the toxicity and efficacy of certain drugs. Null mutations # of this class mu gene have been linked with an increase in a number # of cancers, likely due to an increased susceptibility to # environmental toxins and carcinogens. Multiple protein isoforms are # encoded by transcript variants of this gene. [provided by RefSeq, # Jul 2008].', # 'Homo sapiens', # 'GSTM1', # 'NM_000561.3', # 'NG_009246' # ] # Failed test 'type set correctly in subfeature' # at tests/perl_tests/genbank.t line 103. # got: 'NG_009246.1' # expected: 'exon' # [ # 1, # 5001, # 5114, # 0, # 'alignment:Splign:1.39.8', # 'GST1; GSTM1-1; GSTM1a-1a; GSTM1b-1b; GTH4; GTM1; H-B; MU; MU-1', # 'GSTM1', # 'exon', # '1', # 'NG_009246.1' # ] # Looks like you failed 2 tests of 21. tests/perl_tests/genbank.t ................. 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/21 subtests
# Failed test 'got right data from volvox test data run'
# at tests/perl_tests/generate-names.pl.t line 39.
# Structures begin differing at:
# $got->{a1e/3.json}{rs116260263}{exact}[0][1] = '12'
# $expected->{a1e/3.json}{rs116260263}{exact}[0][1] = '11'
# Failed test 'same data after incremental run'
# at tests/perl_tests/generate-names.pl.t line 58.
# Structures begin differing at:
# $got->{e8b/f.json}{rs17878802}{exact}[0][1] = '12'
# $expected->{e8b/f.json}{rs17878802}{exact}[0][1] = '11'
# Failed test 'same data after incremental run with --safeMode'
# at tests/perl_tests/generate-names.pl.t line 74.
# Structures begin differing at:
# $got->{7bf/e.json}{rs117304270}{exact}[0][1] = '12'
# $expected->{7bf/e.json}{rs117304270}{exact}[0][1] = '11'
# Looks like you failed 3 tests of 4.
tests/perl_tests/generate-names.pl.t ....... Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/4 subtests
tests/perl_tests/json.t .................... ok
# loaded 2559 test features
tests/perl_tests/lazy_nclist.t ............. ok
tests/perl_tests/maker2jbrowse.t ........... ok
tests/perl_tests/nclist.t .................. ok
WARNING: multiple reference sequences found named 'NC_001133', using only the first one.
WARNING: multiple reference sequences found named 'NC_001133', using only the first one.
# /tmp/kClf3A11Ba
tests/perl_tests/prepare-refseqs.pl.t ...... ok
tests/perl_tests/remove-track.pl.t ......... ok
# writing output to /tmp/DgNP6Qb2ry
# Failed test 'ucsc_to_json.pl made the right output'
# at tests/perl_tests/ucsc-to-json.pl.t line 33.
# Structures begin differing at:
# $got->{tracks/knownGene/chr1/lf-5.jsonz}[0][5] = 'B2RMP9'
# $expected->{tracks/knownGene/chr1/lf-5.jsonz}[0][5] = 'uc001cfh.1'
# Track nonExistentTrack not found in the UCSC track database (trackDb.txt.gz) file. Is it a real UCSC track? at bin/ucsc-to-json.pl line 194.
# To format the jaxQtlAsIs track, you must have both files tests/data/hg19/database//jaxQtlAsIs.sql and tests/data/hg19/database//jaxQtlAsIs.txt.gz
# Looks like you failed 1 test of 6.
tests/perl_tests/ucsc-to-json.pl.t ......... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/6 subtests
# using temp dir /tmp/3OTz0xjUUl
tests/perl_tests/wig-to-json.pl.t .......... ok

Test Summary Report
-------------------
tests/perl_tests/biodb-to-json.pl.t (Wstat: 512 Tests: 15 Failed: 2)
  Failed tests: 3, 11
  Non-zero exit status: 2
tests/perl_tests/flatfile-to-json.pl.t (Wstat: 6400 Tests: 5 Failed: 1)
  Failed test: 5
  Non-zero exit status: 25
  Parse errors: No plan found in TAP output
tests/perl_tests/genbank.t (Wstat: 512 Tests: 21 Failed: 2)
  Failed tests: 11, 18
  Non-zero exit status: 2
tests/perl_tests/generate-names.pl.t (Wstat: 768 Tests: 4 Failed: 3)
  Failed tests: 1-3
  Non-zero exit status: 3
tests/perl_tests/ucsc-to-json.pl.t (Wstat: 256 Tests: 6 Failed: 1)
  Failed test: 2
  Non-zero exit status: 1
Files=19, Tests=199, 13 wallclock secs ( 0.07 usr 0.02 sys + 8.75 cusr 1.49 csys = 10.33 CPU)
Result: FAIL
```
cmdcolin commented 10 years ago

Thomas on #bioperl provided the following tip:

17:20 < trs> cdiesh: Perl version?
17:22 < trs> I don't know what the test code is doing, but if it's relying on the order of keys %hash or values %hash somewhere being stable, that'll fail on Perl >= 5.18.0
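To illustrate the tip, here is a minimal sketch of order-dependent versus order-independent key handling. The `%feature` hash is a made-up example whose attribute names merely mirror those in the test output; it is not taken from the JBrowse code.

```perl
use strict;
use warnings;

# Hypothetical feature record, for illustration only.
my %feature = (
    Seq_id => 'ctgA',
    Source => 'example',
    Type   => 'gene',
    Name   => 'EDEN',
);

# Order-dependent: Perl does not guarantee the order returned by keys(),
# and since 5.18 it can differ from one run of the program to the next.
my @unordered = keys %feature;

# Order-independent: sorting yields the same list on every Perl version.
my @ordered = sort keys %feature;

print "unordered: @unordered\n";
print "ordered:   @ordered\n";
```

Running this several times under Perl 5.18 or later may print a different `unordered` line each time, while the `ordered` line stays the same.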

cmdcolin commented 10 years ago

Confirmed: installing Perl 5.18 with perlbrew on Mac OS X reproduces the failures. The tests pass on Perl 5.16 on Mac OS X.

cmdcolin commented 10 years ago

Here's what the nclist in sample_data/json/volvox/tracks/Genes/ctgA/trackData.json looks like on a system with Perl 5.16:

EDIT: to include the whole intervals->nclist subtree

```
"intervals": {
  "nclist": [
    [0, 1049, 9000, 1, "example", "ctgA", "EDEN", "EDEN", "protein kinase", "gene", [
      [1, 1049, 9000, 1, "example", "ctgA", "EDEN.1", "EDEN.1", "Eden splice form 1", "EDEN", "mRNA", [
        [2, 4999, 5500, 1, "example", "ctgA", "EDEN.1", "0", "CDS"],
        [2, 1200, 1500, 1, "example", "ctgA", "EDEN.1", "0", "CDS"],
        [2, 2999, 3902, 1, "example", "ctgA", "EDEN.1", "0", "CDS"],
        [2, 6999, 7608, 1, "example", "ctgA", "EDEN.1", "0", "CDS"],
        [3, 7608, 9000, 1, "example", "ctgA", "three_prime_UTR", "EDEN.1"],
        [3, 1049, 1200, 1, "example", "ctgA", "five_prime_UTR", "EDEN.1"]
      ]],
      [1, 1299, 9000, 1, "example", "ctgA", "EDEN.3", "EDEN.3", "Eden splice form 3", "EDEN", "mRNA", [
        [3, 1299, 1500, 1, "example", "ctgA", "five_prime_UTR", "EDEN.3"],
        [2, 3300, 3902, 1, "example", "ctgA", "EDEN.3", "0", "CDS"],
        [2, 6999, 7600, 1, "example", "ctgA", "EDEN.3", "1", "CDS"],
        [3, 2999, 3300, 1, "example", "ctgA", "five_prime_UTR", "EDEN.3"],
        [3, 7600, 9000, 1, "example", "ctgA", "three_prime_UTR", "EDEN.3"],
        [2, 4999, 5500, 1, "example", "ctgA", "EDEN.3", "1", "CDS"]
      ]],
      [1, 1049, 9000, 1, "example", "ctgA", "EDEN.2", "EDEN.2", "Eden splice form 2", "EDEN", "mRNA", [
        [3, 1049, 1200, 1, "example", "ctgA", "five_prime_UTR", "EDEN.2"],
        [2, 6999, 7608, 1, "example", "ctgA", "EDEN.2", "0", "CDS"],
        [2, 1200, 1500, 1, "example", "ctgA", "EDEN.2", "0", "CDS"],
        [3, 7608, 9000, 1, "example", "ctgA", "three_prime_UTR", "EDEN.2"],
        [2, 4999, 5500, 1, "example", "ctgA", "EDEN.2", "0", "CDS"]
      ]]
    ]]
  ],
  "classes": [
    { "isArrayAttr": { "Subfeatures": 1 }, "attributes": ["Start", "End", "Strand", "Source", "Seq_id", "Load_id", "Name", "Note", "Type", "Subfeatures"] },
    { "isArrayAttr": { "Subfeatures": 1 }, "attributes": ["Start", "End", "Strand", "Source", "Seq_id", "Load_id", "Name", "Note", "Parent_id", "Type", "Subfeatures"] },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Source", "Seq_id", "Parent_id", "Phase", "Type"] },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Source", "Seq_id", "Type", "Parent_id"] },
    { "isArrayAttr": { "Sublist": 1 }, "attributes": ["Start", "End", "Chunk"] }
  ],
  "maxEnd": 9000,
  "count": 1,
  "lazyClass": 4,
  "urlTemplate": "lf-{Chunk}.json",
  "minStart": 1049
}
```

Here's what the same nclist block looks like on a machine with Perl 5.18 (note that each feature's attribute values are now split up around the subfeatures array instead of all preceding it):

```
"intervals": {
  "urlTemplate": "lf-{Chunk}.json",
  "count": 1,
  "lazyClass": 17,
  "maxEnd": 9000,
  "classes": [
    { "isArrayAttr": { "Subfeatures": 1 }, "attributes": ["Start", "End", "Strand", "Note", "Load_id", "Type", "Source", "Subfeatures", "Name", "Seq_id"] },
    { "attributes": ["Start", "End", "Strand", "Name", "Seq_id", "Parent_id", "Subfeatures", "Source", "Load_id", "Type", "Note"], "isArrayAttr": { "Subfeatures": 1 } },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Seq_id", "Type", "Parent_id", "Source"] },
    { "attributes": ["Start", "End", "Strand", "Type", "Seq_id", "Phase", "Parent_id", "Source"], "isArrayAttr": {} },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Source", "Parent_id", "Type", "Seq_id"] },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Source", "Parent_id", "Type", "Seq_id"] },
    { "attributes": ["Start", "End", "Strand", "Seq_id", "Source", "Phase", "Parent_id", "Type"], "isArrayAttr": {} },
    { "isArrayAttr": { "Subfeatures": 1 }, "attributes": ["Start", "End", "Strand", "Note", "Load_id", "Type", "Subfeatures", "Source", "Parent_id", "Name", "Seq_id"] },
    { "attributes": ["Start", "End", "Strand", "Type", "Seq_id", "Phase", "Parent_id", "Source"], "isArrayAttr": {} },
    { "attributes": ["Start", "End", "Strand", "Type", "Seq_id", "Source", "Parent_id"], "isArrayAttr": {} },
    { "attributes": ["Start", "End", "Strand", "Source", "Phase", "Parent_id", "Seq_id", "Type"], "isArrayAttr": {} },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Parent_id", "Phase", "Source", "Seq_id", "Type"] },
    { "attributes": ["Start", "End", "Strand", "Seq_id", "Parent_id", "Phase", "Source", "Type"], "isArrayAttr": {} },
    { "attributes": ["Start", "End", "Strand", "Parent_id", "Source", "Subfeatures", "Seq_id", "Name", "Note", "Load_id", "Type"], "isArrayAttr": { "Subfeatures": 1 } },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Seq_id", "Source", "Parent_id", "Phase", "Type"] },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Source", "Parent_id", "Type", "Seq_id"] },
    { "isArrayAttr": {}, "attributes": ["Start", "End", "Strand", "Source", "Parent_id", "Phase", "Seq_id", "Type"] },
    { "isArrayAttr": { "Sublist": 1 }, "attributes": ["Start", "End", "Chunk"] }
  ],
  "minStart": 1049,
  "nclist": [
    [0, 1049, 9000, 1, "protein kinase", "EDEN", "gene", "example", [
      [1, 1299, 9000, 1, "EDEN.3", "ctgA", "EDEN", [
        [2, 2999, 3300, 1, "ctgA", "five_prime_UTR", "EDEN.3", "example"],
        [3, 4999, 5500, 1, "CDS", "ctgA", "1", "EDEN.3", "example"],
        [4, 7600, 9000, 1, "example", "EDEN.3", "three_prime_UTR", "ctgA"],
        [5, 1299, 1500, 1, "example", "EDEN.3", "five_prime_UTR", "ctgA"],
        [3, 6999, 7600, 1, "CDS", "ctgA", "1", "EDEN.3", "example"],
        [6, 3300, 3902, 1, "ctgA", "example", "0", "EDEN.3", "CDS"]
      ], "example", "EDEN.3", "mRNA", "Eden splice form 3"],
      [7, 1049, 9000, 1, "Eden splice form 1", "EDEN.1", "mRNA", [
        [8, 1200, 1500, 1, "CDS", "ctgA", "0", "EDEN.1", "example"],
        [9, 7608, 9000, 1, "three_prime_UTR", "ctgA", "example", "EDEN.1"],
        [10, 2999, 3902, 1, "example", "0", "EDEN.1", "ctgA", "CDS"],
        [11, 6999, 7608, 1, "EDEN.1", "0", "example", "ctgA", "CDS"],
        [5, 1049, 1200, 1, "example", "EDEN.1", "five_prime_UTR", "ctgA"],
        [12, 4999, 5500, 1, "ctgA", "EDEN.1", "0", "example", "CDS"]
      ], "example", "EDEN", "EDEN.1", "ctgA"],
      [13, 1049, 9000, 1, "EDEN", "example", [
        [14, 1200, 1500, 1, "ctgA", "example", "EDEN.2", "0", "CDS"],
        [2, 7608, 9000, 1, "ctgA", "three_prime_UTR", "EDEN.2", "example"],
        [6, 6999, 7608, 1, "ctgA", "example", "0", "EDEN.2", "CDS"],
        [15, 1049, 1200, 1, "example", "EDEN.2", "five_prime_UTR", "ctgA"],
        [16, 4999, 5500, 1, "example", "EDEN.2", "0", "ctgA", "CDS"]
      ], "ctgA", "EDEN.2", "Eden splice form 2", "EDEN.2", "mRNA"]
    ], "EDEN", "ctgA"]
  ]
}
```
cmdcolin commented 10 years ago

To be clear about the source of the problem, for example:

In the Perl 5.18 output, the attribute list for class 0 is:

[1:"Start", 2:"End", 3:"Strand", 4:"Note", 5:"Load_id", 6:"Type", 7:"Source", 8:"Subfeatures", 9:"Name", 10:"Seq_id"]

and the corresponding row in nclist follows the same order:

[1:1049, 2:9000, 3:1, 4:"protein kinase", 5:"EDEN", 6:"gene", 7:"example", 8:subfeatures, 9:"EDEN", 10:"ctgA"]

In the Perl 5.16 output, the attribute list for class 0 is:

["Start", "End", "Strand", "Source", "Seq_id", "Load_id", "Name", "Note", "Type", "Subfeatures"]

and the corresponding row in nclist follows that order instead:

[1049, 9000, 1, "example", "ctgA", "EDEN", "EDEN", "protein kinase", "gene", subfeatures]

Both outputs are internally consistent. The test code, however, assumes that the attribute order of class 0 matches a predefined layout, and that assumption does not hold once hash order is randomized. The test code will be updated.
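One way a test could avoid depending on attribute order is to decode each row through its own class definition before comparing. The following is only a sketch of that idea, not the actual change made to the JBrowse tests: `decode_row` is a hypothetical helper, the attribute lists and rows are copied from the two dumps above, and the subfeature arrays are replaced by empty arrays for brevity.

```perl
use strict;
use warnings;

# Attribute lists for class 0, copied from the two trackData.json dumps.
my @class0_516 = qw(Start End Strand Source Seq_id Load_id Name Note Type Subfeatures);
my @class0_518 = qw(Start End Strand Note Load_id Type Source Subfeatures Name Seq_id);

# Matching top-level rows: class index first, subfeatures stubbed out as [].
my @row_516 = (0, 1049, 9000, 1, 'example', 'ctgA', 'EDEN', 'EDEN', 'protein kinase', 'gene', []);
my @row_518 = (0, 1049, 9000, 1, 'protein kinase', 'EDEN', 'gene', 'example', [], 'EDEN', 'ctgA');

# Hypothetical helper: pair each value in a row with its attribute name.
sub decode_row {
    my ($attributes, $row) = @_;
    my %feature;
    # $row->[0] is the class index, so the values start at position 1.
    @feature{@$attributes} = @{$row}[1 .. $#$row];
    return \%feature;
}

my $f516 = decode_row(\@class0_516, \@row_516);
my $f518 = decode_row(\@class0_518, \@row_518);

# Print the scalar attributes side by side; both columns carry the same values.
for my $attr (sort grep { $_ ne 'Subfeatures' } keys %$f516) {
    printf "%-8s %-16s %s\n", $attr, $f516->{$attr}, $f518->{$attr};
}
```

Once decoded this way, the Perl 5.16 and Perl 5.18 rows describe the same feature, even though their positional layouts differ.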

cmdcolin commented 10 years ago

Here is the full output using Perl 5.18 on Ubuntu (large file, 127 KB). The flatfile-to-json and generate-names tests fail: http://pastebin.com/ZDt5nBm0

Note: here is an example of the problem in flatfile-to-json, where many NCList ArrayRepr classes are created dynamically that differ only in the order of their attributes:

```
{
  'attributes' => [ 'Start', 'End', 'Strand', 'Type', 'Id', 'Name', 'Source', 'Subfeatures', 'Score', 'Seq_id' ],
  'isArrayAttr' => { 'Subfeatures' => 1 }
},
{
  'attributes' => [ 'Start', 'End', 'Strand', 'Seq_id', 'Score', 'Type', 'Id', 'Name', 'Source', 'Subfeatures' ],
  'isArrayAttr' => { 'Subfeatures' => 1 }
},
```

For generate-names.pl, a data item in the 'exact' match part of the names structure does not match. I don't know exactly what this data item represents, even after looking at the source code.

```
Failed test 'got right data from volvox test data run'
at tests/perl_tests/generate-names.pl.t line 39.
Structures begin differing at:
$got->{e8b/f.json}{rs17878802}{exact}[0][1] = '12'
$expected->{e8b/f.json}{rs17878802}{exact}[0][1] = '11'
Failed test 'same data after incremental run'
at tests/perl_tests/generate-names.pl.t line 58.
Structures begin differing at:
$got->{f4d/1.json}{rs4998557}{exact}[0][1] = '12'
$expected->{f4d/1.json}{rs4998557}{exact}[0][1] = '11'
Failed test 'same data after incremental run with --safeMode'
at tests/perl_tests/generate-names.pl.t line 74.
Structures begin differing at:
$got->{262/f.json}{rs80265967}{exact}[0][1] = '12'
$expected->{262/f.json}{rs80265967}{exact}[0][1] = '11'
Looks like you failed 3 tests of 4.
```
cmdcolin commented 7 years ago

A pretty unfortunate consequence of this issue is that running flatfile-to-json with Perl 5.18 or later takes much longer and produces much bigger files.

This was alluded to in previous comments here: because the hash order is randomized, a combinatorial number of feature classes gets generated. For example, some features are represented in trackData.json as start,end,name,id,parent and others as name,end,start,parent,id, with the data values reordered to match.

The data still works at runtime, but the extra classes inflate the size of the files and make the conversion take longer.
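The class blowup can be modelled with a toy example. This is a sketch under stated assumptions, not the JBrowse ArrayRepr code: `class_count` is a hypothetical helper, and the three attribute orders are written out by hand to stand in for what randomized hash ordering can produce.

```perl
use strict;
use warnings;

# Three features with identical attribute sets, listed in different orders
# (hand-written stand-ins for randomized hash order).
my @attribute_orders = (
    [qw(Start End Name Id Parent)],
    [qw(Name End Start Parent Id)],
    [qw(Id Parent Start End Name)],
);

# Hypothetical helper: count distinct class signatures, optionally sorting
# the attribute names before building each signature.
sub class_count {
    my ($orders, $sorted) = @_;
    my %classes;
    for my $attrs (@$orders) {
        my @names = @$attrs;
        @names = sort @names if $sorted;
        $classes{ join ',', @names } = 1;
    }
    return scalar keys %classes;
}

printf "classes without sorting: %d\n", class_count(\@attribute_orders, 0);
printf "classes with sorting:    %d\n", class_count(\@attribute_orders, 1);
```

Without sorting, each ordering becomes its own class (3 here). With sorting, all three features share a single class, which is why a stable attribute order keeps the class list small.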

Here's a short example parsing a 280MB gff

Perl 5.14, takes about 5 minutes

time bin/flatfile-to-json.pl --gff file.gff --sortMem 1000000000 --trackLabel test_5_14
205.80s user 10.97s system 73% **cpu 4:54.56 total**

Perl 5.18, takes almost 4 hours

time bin/flatfile-to-json.pl --gff file.gff --sortMem 1000000000 --trackLabel test_5_18
13749.42s user 240.97s system 99% **cpu 3:54:43.13 total**

Not only that, but the output on disk is vastly larger.

In the perl 5.14 data directory, the disk size is 366MB for this track. In the 5.18 instance, the disk space is 21 GB (gigabytes)

That is roughly a 66x increase in user CPU time (about 48x in wall-clock time) and a 57x increase in disk space consumption!

Because of this, it might be advisable to (a) add a prominent warning telling users to run a Perl version earlier than 5.18, since 5.18 is the release that introduced hash order randomization, and/or (b) fix this bug.

It seems odd to be reporting this only now, but I think it is reproducible and it is a bad experience for the end user. Perl 5.18 and later have probably only recently become the default Perl on newer operating systems, so more users will run into this.

cmdcolin commented 7 years ago

Possible solution: wherever the code says "keys %hash", replace it with "sort keys %hash".

cmdcolin commented 6 years ago

Here is a test GFF that, as far as I recall, demonstrated the very long run time and disk space blowup (not all GFFs seem to trigger it): ftp://ftp.ncbi.nlm.nih.gov/genomes/Scleropages_formosus/GFF/ref_ASM162426v1_top_level.gff3.gz

rbuels commented 6 years ago

Looks like the changes you made in #912 fix that performance regression. Nicely done.

rbuels commented 6 years ago

Fixed! Merged the PR. Thanks so much @cmdcolin