Yasas1994 / Jaeger

Jaeger is a quick and precise tool for detecting phages in sequence assemblies.
MIT License

Model can't take sequences of certain length as input #8

Closed (Mashin6 closed this issue 1 year ago)

Mashin6 commented 1 year ago

I just ran into a weird bug that only occurs for input sequences of certain lengths:

import jaegeraa.lib
model=jaegeraa.lib.Predictor()
input = 'GTAACCGTCGAGAGAGATTTGGCTATGCCAATGAGCCCTGAGGATCCCGCTAGTGGTTTACTGTTCACTATTGGGGATATTCATGAGAATGGCAGGAATGTGAGCGTGGTTGAGAGTAGGTTGCTCAATGGTCGAGTGCCTTTTAGAGCTGGAGACTTACGAAACATGAGCTACAATTACTTTATGGAGTTCGTGAGGATCTACGCAACTATCTATATGGAGAATCAACAGCAACTCGTGGCTAAGCTTTCAGGAGATGATTACGAAAGCTCTTCATCATCGTTTCCCGAGAATGAGGAATTGGAATTTGACTTCCTAGCCCAAGCACACAATGGTGTGTACCTAACGATAGAGGAAGTTGTAGCTAAATTTGAGTCAATGAAATTCTCGGGAAAACAACTCAATGCTGAAATTGAAAAATTCGAAAGAATTGGAGTTGATGGATGGAGAACTAACAAAGCTCTCTCCTTTAATGATTTGGTCAAAAGGTTTTGTGGATGCTGCTTAGGTGATGACTGTAACTTTGATTTCCACTATCGAACTTTATTCAAAGTGCTAATAGAGAATAAGCAAATCCCAGCCTACAAGTGTATGGTTCTCCATAAAGTGAATCCAGATAGAATGAAGACTCAGATAAAGATGGTGAACGGGTACACTTTGGAAACAATGTTTAAGACTTTGAACCCTCTCACCATTTTCTTATATCTGGTTTTTGTGCTGAAATGTGGTATTAGTGCCGACAATGTATGTTTATCGTACCAATTATTTGCTATGAATGACGCAGAGCAAGTTGAATTTGAAATTGAAGATTCTTTGCGTCTGGATGAACAGGTACAAATTGGTCAATACTCATGCTATGTTTGGCCTAGTGTCGGAAAATTCTATCCGGAAATTCTGGCGAAGAGAGGTTGCATTGCTGTGAATGATGGAACTACATTTTATATTTTCGTTTCAAGTTCACAGATAGATAAAATTCACCCAGAAGCAGCGTGGTCGGATATGCTACAAGGAGTAGGCAGAAGAGGAGTCGATATTTTAAGTATAGCTGGTCCAACAAAAACCAAGTTTCTGATAAAACATGTGGAAAGTTGTTACGAAACTCTTAAGAGTCCGGAAGATTGGAAAGCTAAATGCAAAGAGTACTATGAGTCCATAAGCTTATATGAGTACATTCTCTTACTGATGGCAGTTGGGTCTCGAGCTGGAATTGAAACCCAGAGGATGAGTAAATATCAGGCCCGAAAGAACAAAATTAGAATGCCAGAAGTGTTGGAGAAGTACATTGAAGTTGAGAAAGCGACCATAGGAAAGCTGTCAAAACCAGCCAAGACCTGTCTAGCAATTGGTGCCGGAGTGGCTATTTTTGGAGTTCTAGCGGGGCTAGGAGTCGGTCTATATAAATTGATAACTCATTTTTCTAAGACCGACTCAGAAGACAATGACATTGAAATAGATGATCTAGTCCCGGAGATGAGTGGAGCTCATGCTTCTGATGAGAATGTTACCACATATGCTGTCAGGAGACAAGTTCCAAAGGTGCGACTAGCCAAACAATTCAAAGTTCGCTCGTCACCAAGCCCATCAGACAATGAACAACCAAAAGTAGATATTCTAGTGCCTGAAATGACAGGGTGCCATGCCAGTGATGAACACCTCACCAAGCATTTTACAAAAAGGAGAGTCACCATGAAGAGAGTTGGAGCTGTCAAGGAATCACACATTGTGACATATGACGAGAATACTCCACATGTGAGACTCATCAGAAATCTGAGAAGAACACGCTTGGCGAGAGCTATTAAGCAAATGGCACAACTTGGAGAACTACCGGACACATTGTCAGAAATTCAAGTGTGGCAACAATATGTAGTGGACAAAGGTATCAGACCAGCTGAACATACAACAGATTTTAGACTCTTCTCAGCTATAGCTGATCAGGAACAAGAGGATCCAGAAGAAATCAATATGGCGAGTGGAGAAACGATGAAATTTGA
CGAAAACAAGTACAATGAGATAGTCCAAGTCGTCAAAGGGATATCGCCAACTAAATCTGACATAGTGACAATGACTACTAAAGGAGCCCACCATACGGCGATCAAGCAGGTTCGAATTGGATACAAAAGTTTAGACAAGGATCCGAATATGGTGAGCATACTTTCTAACCAACTAACCAAAATTAGTTGTGTAATTTTGAACGTGACTCCTGGTAGAACGGCGTACCTAAACGTCATGAGGTTGTGTGGGACATTTGTTGTGTGCCCAGCCCATTATCTAGAAGCTCTAGAAGAGGATGACACGATTTACTTCATATCCTTTTCTGTCTGTATTAAACTCAGATTTCAACCAGACAGAGTGACATTAGTCAACACTCATCAAGATCTTGTAGTGTGGGATTTGGGTAATTCAGTACCACCGGCTATTGACGTTTTGAGCATGATACCAACCGTGGCAGATTGGGACAAGTTTCAAGATGGCCCTGGTGCTTTTGGTGTGACAAAGTACAATGCTCGGTATCCAACAAATTACATAAATACTCTTGATATGATTGAGAGAATCCGAGCCGACACTCAGAACCCCACGGGCATATACAAAATGCTCAACTCCGATCACACAATCACCACAGGTCTTAGATATCAGATGTACTCATTAGAAGGATTCTGTGGTGGGCTGATACTACGGGCTTGCACTAGAATGGTTAGAAAGATTGTGGGACTTCATGTAGCTGCTAGTGCAAATCACGCTATGGGATATGCAGAATGTCTGGTGCAAGAAGATCTTAAACATGCTATAAATAAGCTGTCACCAGATGCAAGGAGTTTAATTATCGGACATCTCAATCCCAAAGTAGAAACAGCCACAAAACAGTGTGGAATTGTGAGGAGCCTTGGAAGTCTAGGGTGCCACGGAAAGGTTACAAGTGAGGACGTGGCGATGACTGCAACAAAGACCACGATCAGAAAGTCTAGAATTTATGGTCTTGTTGGAGATATCAAAACAGAACCCTCAATTTTACATGCTCATGACCCACGTCTCCCTGAGGATCAGATTGGAAAGTGGGACCCAGTGTTTGAAGCTGCCTTGAAGTATGGAACAAGAATAGAACCATTCCCCATTGAAGAAATTCTTGAAGTGGAAGATCATTTATCTATTATACTTAAAGGCATGGACAATACTCTCAAGAAAAGAAATGTCAACAATCTTGAAGTTGGGATAAACGGAATAGATCAATCAGATTATTGGCTTCAGATAGAGACAAATACTTCTCCTGGGTGGCCCTACACAAAAAGAAAACCGAAGGGAGCTGAAGGAAAGAAATGGTTGTTCAAAGAGGTTGGGAACTACCCCTCCGGGAAACCCATTCTAGAAATGGAGGACTCAGGACTCATTGAGAGCTACAATAAAATGTTGAGAGATGCCAAACAGGGTGTAGCTCCCATTGTGGTTACTGTGGAGTGCCCAAAAGATGAACGCAGAAAGTTAAGTAAGATCTACGAACAACCAGCCACCAGGACTTTCACGATTCTCCCGCCTGAAATAAACATTCTCTTTAGGCAATATTTTGGTGACTTTGCCGCCATGATAATGACTAATAGATCAAAATTATTCTGTCAGGTTGGGATAAATCCAGAGAATATGGAATGGAGTGATCTAATGCATGAGTTCCTCCACAAGTCAACACATGGCTTTGCTGGAGACTACTCAAAATTTGATGGAATTGGAGATCCTCAGATTTATCATTCCATAACTCAGGTGGTAAATAACTGGTACGATGATGGGGAAGAAAATGCCAGGACACGTCACGCACTAATTAGTAGTATAATACATAGAGAGGGTATAGTTAAGGAGTATCTTTTCCAGTATTGTCAGGGAATGCCTTCTGGTTTTGCCATGACAGTCATTTTCAACTCCTTCGTGAATTATTACTATTTAGCTATGGCGTGGATGAATTTAATCTCACACTCACCATTGAGTCCCCAATCCACGGTTAGAGATT
TCGACAACTATTGTAAGGTAGTAGTTTATGGGGACGATAACATAGTTTCAGTAGATTTGAACTTTCTAGAATATTACAACCTTAGGACTGTAGCAGCTTATTTGTCTCAATTTGGAGTAACGTACACAGATGACGCAAAGAATCCGATTGAGAAAAGTGTGCCTTTCGTAGAAATAACTTCTGTTTCATTTCTTAAGCGTAGGTGGGTGCCCTTGGGTGGAAGACTTTCAACTATTTACAAGGCACCTTTGGACAAAACTAGCATAGAGGAGCGCCTTCATTGGATAAGGGAGTGCGATAATGACATCGAAGCTCTCAATCAGAATATTGAAAGCGCCCTATATGAAGCAAGCATTCATGGAAAGATCTACTTTGGTGATCTCCTTCAGAGGATCCGGATTGCTTGTGACGCTGTGATGATCCCAGTTCCATCAGTAACATTTAAGGATTGTCACAAAAGGTGGTGGGCTTCCATGACTGGAGGAGCTTTAGATCCAGCTAGTCTAAGTCGGTTGTACTTGGCCGCCGAGAACCAGTTGGTCGACACTCGGAAAGTGTGGAAAGATCGCTTCCTTGGTGAGGATAGGTCTTTAATAGACATGCTGAAGTCAGCTCGTGCTGTTCCTCTAGCTGCCTATCATGTATAAGCCTCACGACTCTGTGCAGAGTATAACAGCACGACCCCAGGTTATCGATAAGTCATGTTGGTAGTCGTCAAGTAAGAATGGGACAGAAAAGAGATTGGAACTTTTAGGATGGAACATCAGTAAACCTACGGGAAACAGAGCTATGGAACTCCCAAGTACTGTAGGTCCCTATTGGTAGTTCACTAAAAGTAACCTTCTGTGTATGATCCCTACCCTGAGTGAACGACAGAAATATGATACACGAGTACTCTCATTAGAGAGAACCGGATTCCACATTGTGGAATCTCCCAGGAATTGACCTGGGTTCCTCACGAAAGTGAGGCGACAACTTGGTCGAAAAACAAGTTCAGTTTAGTTGAGAC'
predictions=model.predict(input,stride=10,fragsize=3000,batch=100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/lib.py", line 47, in predict
    for c,(a,s,d,f,l) in enumerate(extract_pred_entry(self.model,idataset)):
  File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/postprocessing.py", line 30, in extract_pred_entry
    for prob,y_pred,id_,pos_,is_last_,index_,clen_ in get_predictions(idataset,model):
  File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/postprocessing.py", line 9, in get_predictions
    logits = model(batch[0]).numpy()
  File "/home/AD/macma/miniconda3/envs/jaeger/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/AD/macma/miniconda3/envs/jaeger/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "add_5" "f"(type Add).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [21,250,128] vs. [21,249,128] [Op:AddV2]

Call arguments received by layer "add_5" "                 f"(type Add):
  • inputs=['tf.Tensor(shape=(21, 250, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 250, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)']

Some of the lengths that don't work:

108,109,120,121,132,133,144,145,156,157,168,169,180,181,192,193,204,205,216,217,228,229,240,241,252,253,264,265,276,277,288,289,300,301,312,313,324,325,336,337,348,349,360,361,372,373,384,385,396,397,408,409,420,421,432,433,444,445,456,457,468,469,480,481,492,493,504,505,516,517,528,529,540,541,552,553,564,565,576,577,588,589,600,601,612,613,624,625,636,637,648,649,660,661,672,673,684,685,696,697,708,709,720,721,732,733,744,745,756,757,768,769,780,781,792,793,804,805,816,817,828,829,840,841,852,853,864,865,876,877,888,889,900,901,912,913,924,925,936,937,948,949,960,961,972,973,984,985,996,997,1008,1009,1020,1021,1032,1033,1044,1045,1056,1057,1068,1069,1080,1081,1092,1093,1104,1105,1116,1117,1128,1129,1140,1141,1152,1153,1164,1165,1176,1177,1188,1189,1200,1201,1212,1213,1224,1225,1236,1237,1248,1249,1260,1261,1272,1273,1284,1285,1296,1297,1308,1309,1320,1321,1332,1333,1344,1345,1356,1357,1368,1369,1380,1381,1392,1393,1404,1405,1416,1417,1428,1429,1440,1441,1452,1453,1464,1465,1476,1477,1488,1489,1500,1501,1512,1513,1524,1525,1536,1537,1548,1549,1560,1561,1572,1573,1584,1585,1596,1597,1608,1609,1620,1621,1632,1633,1644,1645,1656,1657,1668,1669,1680,1681,1692,1693,1704,1705,1716,1717,1728,1729,1740,1741,1752,1753,1764,1765,1776,1777,1788,1789,1800,1801,1812,1813,1824,1825,1836,1837,1848,1849,1860,1861,1872,1873,1884,1885,1896,1897,1908,1909,1920,1921,1932,1933,1944,1945,1956,1957,1968,1969,1980,1981,1992,1993,2004,2005,2016,2017,2028,2029,2040,2041,2052,2053,2064,2065,2076,2077,2088,2089,2100,2101,2112,2113,2124,2125,2136,2137,2148,2149,2160,2161,2172,2173,2184,2185,2196,2197,2208,2209,2220,2221,2232,2233,2244,2245,2256,2257,2268,2269,2280,2281,2292,2293,2304,2305,2316,2317,2328,2329,2340,2341,2352,2353,2364,2365,2376,2377,2388,2389,2400,2401,2412,2413,2424,2425,2436,2437,2448,2449,2460,2461,2472,2473,2484,2485,2496,2497,2508,2509,2520,2521,2532,2533,2544,2545,2556,2557,2568,2569,2580,2581,2592,2593,2604,2605,2616,2617,2628,2629,2640,2641,2652,2653,2664,2665,2676,2677,
2688,2689,2700,2701,2712,2713,2724,2725,2736,2737,2748,2749,2760,2761,2772,2773,2784,2785,2796,2797,2808,2809,2820,2821,2832,2833,2844,2845,2856,2857,2868,2869,2880,2881,2892,2893,2904,2905,2916,2917,2928,2929,2940,2941,2952,2953,2964,2965,2976,2977,2988,2989,3000,3001
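
The lengths listed above appear to follow a regular pattern: each one is either a multiple of 12 or one more than a multiple of 12, from 108 up to 3001. A minimal sketch that reproduces the list under that assumption is shown here. The helper name `listed_failing_lengths` is hypothetical and not part of Jaeger, and the report does not say whether lengths outside this range fail as well.

```python
def listed_failing_lengths(start=108, stop=3001, period=12):
    """Return every length n in [start, stop] with n % period in {0, 1}.

    Assumption: this only reproduces the pattern visible in the reported
    list; it is not derived from Jaeger's source code.
    """
    return [n for n in range(start, stop + 1) if n % period in (0, 1)]


lengths = listed_failing_lengths()
# Show the first few and last few values for comparison with the report.
print(lengths[:6], lengths[-2:])
```

Comparing the output against the list in the report is a quick way to check whether a given test length falls into the affected set.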
Yasas1994 commented 1 year ago

Hi Martin,

Thanks for reporting this issue. I will push a fix as soon as possible. However, I do not recommend using Jaeger on sequences shorter than 1024 bp.
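
The length recommendation in this reply could be applied as a pre-filter before prediction. The following is a minimal sketch, assuming the sequences are held as plain Python strings keyed by name. `MIN_LENGTH` and `filter_by_length` are hypothetical names for illustration and are not part of Jaeger's API.

```python
MIN_LENGTH = 1024  # threshold taken from the recommendation in this reply


def filter_by_length(sequences, min_length=MIN_LENGTH):
    """Split a {name: sequence} dict into (kept, skipped) by sequence length."""
    kept = {name: seq for name, seq in sequences.items() if len(seq) >= min_length}
    skipped = {name: seq for name, seq in sequences.items() if len(seq) < min_length}
    return kept, skipped


# Toy input: one 40 bp sequence and one 1200 bp sequence.
example = {"short_contig": "ACGT" * 10, "long_contig": "ACGT" * 300}
kept, skipped = filter_by_length(example)
print(sorted(kept), sorted(skipped))
```

Returning the skipped sequences alongside the kept ones makes it easy to log which inputs were excluded rather than dropping them silently.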

Yasas1994 commented 1 year ago

Hi Martin,

I hope your problem is resolved now.

Mashin6 commented 1 year ago

Thanks, Yasas. Everything works well now.