Closed by blahah 9 years ago
I should know better than to ever use a constant-sized buffer for anything.
I’ll fix it soon.
--Bill
From: Richard Smith-Unna [mailto:notifications@github.com]
Sent: Friday, October 16, 2015 6:50 AM
To: amplab/snap snap@noreply.github.com
Subject: [snap] Very long FASTA defline parsed incorrectly, crashes (#60)
FASTA file contained a character that's not a valid base (or N): '1', full line '10683, 10693, 10428, 10327, 10211, 10426, 10325, 10514, 10425, 10324, 10599, 10569, 10670, 10512, 10422, 10320, 10203, 9963, 9964, -2]';
converting to 'N'. This may happen again, but there will be no more warnings.
Saving genome...1s
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
This comes from a very long FASTA header in the assembly, but there's no newline in either of the two headers that contain this string. The two headers are included below:
TR24039_c0_g1_i1 len=6045 path=[10023:0-283 10024:284-289 10025:290-307 10620:308-313 10598:314-334 10568:335-367 @10420@!:368-463 10510:464-514 10038:515-520 10419:521-737 9752:738-775 10315:776-798 10509:799-885 10418:886-889 10313:890-913 9908:914-919 10416:920-948 10508:949-1018 10312:1019-1042 10415:1043-1073 9503:1074-1097 10597:1098-1295 10659:1296-1303 10566:1304-1319 10619:1320-1327 10596:1328-1336 10564:1337-1392 10306:1393-1394 10631:1395-1397 10618:1398-1416 10595:1417-1442 10660:1443-1466 10674:1467-1480 10303:1481-1527 9663:1528-1528 9664:1529-1562 10406:1563-1696 9710:1697-1777 10562:1778-1858 9394:1859-1883 10405:1884-1987 10499:1988-2032 10594:2033-2039 10561:2040-2063 10498:2064-2064 10403:2065-2088 10298:2089-2173 9510:2174-2197 10560:2198-2306 10297:2307-2389 10593:2390-2434 10661:2435-2458 10675:2459-2471 10295:2472-2500 10494:2501-2518 10399:2519-2542 10293:2543-2593 10493:2594-2617 10398:2618-2647 10292:2648-2671 10492:2672-2704 9422:2705-2728 10397:2729-3027 10662:3028-3051 10676:3052-3061 10687:3062-3085 10697:3086-3113 10490:3114-3139 10395:3140-3150 10558:3151-3175 10488:3176-3202 10393:3203-3206 10286:3207-3213 10486:3214-3237 10391:3238-3240 10663:3241-3247 10677:3248-3278 9892:3279-3318 10554:3319-3371 10481:3372-3395 10386:3396-3419 10280:3420-3464 10552:3465-3488 10480:3489-3490 10385:3491-3514 10613:3515-3545 10587:3546-3569 10551:3570-3661 10664:3662-3668 10276:3669-3685 10645:3686-3686 10636:3687-3692 10626:3693-3693 10612:3694-3710 10586:3711-3718 10549:3719-3722 10472:3723-3746 10635:3747-3772 10611:3773-3796 10625:3797-3799 10546:3800-3823 10610:3824-3856 10469:3857-3869 10261:3870-3901 10371:3902-3937 10583:3938-3961 10467:3962-4009 10544:4010-4033 10665:4034-4045 10679:4046-4069 10689:4070-4070 10256:4071-4105 10643:4106-4139 10624:4140-4161 10608:4162-4180 10581:4181-4189 10360:4190-4213 10535:4214-4243 10248:4244-4267 10454:4268-4293 10534:4294-4315 10358:4316-4339 10623:4340-4378 10607:4379-4402 10580:4403-4498 10452:4499-4522 10356:4523-4525 10244:4526-4543 10532:4544-4546 10449:4547-4582 10353:4583-4585 10530:4586-4621 10718:4622-4639 10713:4640-4663 10707:4664-4682 10237:4683-4726 9991:4727-4727 9992:4728-4751 10444:4752-4759 10347:4760-4766 10234:4767-4801 9644:4802-4823 9645:4824-4847 9646:4848-4849 10440:4850-4918 10730:4919-4927 10343:4928-4966 10712:4967-4990 10228:4991-5044 10632:5045-5078 10603:5079-5110 10574:5111-5138 10520:5139-5167 10602:5168-5170 10573:5171-5218 10335:5219-5242 10518:5243-5274 10433:5275-5298 10601:5299-5335 10571:5336-5359 10517:5360-5398 10331:5399-5443 10600:5444-5467 10570:5468-5539 10669:5540-5542 10515:5543-5563 10683:5564-5566 10693:5567-5632 10428:5633-5645 10327:5646-5669 10211:5670-5674 10426:5675-5692 10325:5693-5716 10514:5717-5717 10425:5718-5741 10324:5742-5761 10599:5762-5800 10569:5801-5824 10670:5825-5827 10512:5828-5872 10422:5873-5908 10320:5909-5959 10203:5960-5999 9963:6000-6023 9964:6024-6044] [-1, 10023, 10024, 10025, 10620, 10598, 10568, 10420, 10510, 10038, 10419, 9752, 10315, 10509, 10418, 10313, 9908, 10416, 10508, 10312, 10415, 9503, 10597, 10659, 10566, 10619, 10596, 10564, 10306, 10631, 10618, 10595, 10660, 10674, 10303, 9663, 9664, 10406, 9710, 10562, 9394, 10405, 10499, 10594, 10561, 10498, 10403, 10298, 9510, 10560, 10297, 10593, 10661, 10675, 10295, 10494, 10399, 10293, 10493, 10398, 10292, 10492, 9422, 10397, 10662, 10676, 10687, 10697, 10490, 10395, 10558, 10488, 10393, 10286, 10486, 10391, 10663, 10677, 9892, 10554, 10481, 10386, 10280, 10552, 10480, 10385, 10613, 10587, 10551, 10664, 10276, 10645, 10636, 10626, 10612, 10586, 10549, 10472, 10635, 10611, 10625, 10546, 10610, 10469, 10261, 10371, 10583, 10467, 10544, 10665, 10679, 10689, 10256, 10643, 10624, 10608, 10581, 10360, 10535, 10248, 10454, 10534, 10358, 10623, 10607, 10580, 10452, 10356, 10244, 10532, 10449, 10353, 10530, 10718, 10713, 10707, 10237, 9991, 9992, 10444, 10347, 10234, 9644, 9645, 9646, 10440, 10730, 10343, 10712, 10228, 10632, 10603, 10574, 10520, 10602, 10573, 10335, 10518, 10433, 10601, 10571, 10517, 10331, 10600, 10570, 10669, 10515, 10683, 10693, 10428, 10327, 10211, 10426, 10325, 10514, 10425, 10324, 10599, 10569, 10670, 10512, 10422, 10320, 10203, 9963, 9964, -2]
TR24039_c0_g1_i2 len=6013 path=[9298:0-283 10421:284-307 10620:308-313 10598:314-334 10568:335-367 @10420@!:368-463 10510:464-514 10038:515-520 10419:521-737 9752:738-775 10315:776-798 10509:799-885 10418:886-889 10313:890-913 9908:914-919 10416:920-948 10508:949-1018 10312:1019-1042 10415:1043-1073 9503:1074-1097 10597:1098-1295 10659:1296-1303 10413:1304-1327 10565:1328-1336 10505:1337-1397 10618:1398-1416 10595:1417-1442 10660:1443-1466 10674:1467-1480 10303:1481-1527 9663:1528-1528 9664:1529-1562 10406:1563-1696 9710:1697-1777 10562:1778-1858 9394:1859-1883 10405:1884-1987 10499:1988-2032 10594:2033-2039 10561:2040-2063 10498:2064-2064 10403:2065-2088 10298:2089-2173 9510:2174-2197 10560:2198-2306 10297:2307-2389 10593:2390-2434 10661:2435-2458 10675:2459-2471 10295:2472-2500 10494:2501-2518 10399:2519-2542 10293:2543-2593 10493:2594-2617 10398:2618-2647 10292:2648-2671 10492:2672-2704 9422:2705-2728 10397:2729-3027 10662:3028-3051 10676:3052-3061 10687:3062-3085 10697:3086-3113 10489:3114-3139 10394:3140-3150 10287:3151-3175 10487:3176-3202 10592:3203-3240 10557:3241-3247 10485:3248-3278 10390:3279-3318 10553:3319-3371 10481:3372-3395 10386:3396-3419 10280:3420-3464 10552:3465-3488 10480:3489-3490 10385:3491-3514 10613:3515-3545 10587:3546-3569 10551:3570-3661 9796:3662-3685 10645:3686-3686 10636:3687-3692 10626:3693-3693 10612:3694-3710 10586:3711-3718 10549:3719-3722 10472:3723-3746 10635:3747-3772 10611:3773-3796 10625:3797-3799 10546:3800-3823 10610:3824-3856 10469:3857-3869 10261:3870-3901 10371:3902-3937 10583:3938-3961 10467:3962-4009 10544:4010-4033 10665:4034-4045 10679:4046-4069 10689:4070-4070 10256:4071-4105 10643:4106-4139 10624:4140-4161 10608:4162-4180 10581:4181-4189 10360:4190-4213 10535:4214-4243 10248:4244-4267 10454:4268-4293 10534:4294-4315 10358:4316-4339 10623:4340-4378 10607:4379-4402 10580:4403-4498 10452:4499-4522 10356:4523-4525 10244:4526-4543 10532:4544-4546 10449:4547-4582 10353:4583-4585 10530:4586-4621 10718:4622-4639 10713:4640-4663 10707:4664-4682 10237:4683-4726 9991:4727-4727 9992:4728-4751 10444:4752-4759 10347:4760-4766 10234:4767-4801 9644:4802-4823 9645:4824-4847 9646:4848-4849 10440:4850-4918 10730:4919-4927 10343:4928-4966 10712:4967-4990 10228:4991-5044 10632:5045-5078 10603:5079-5110 10574:5111-5138 10520:5139-5167 10602:5168-5170 10573:5171-5218 10335:5219-5242 10518:5243-5274 10433:5275-5298 10601:5299-5335 10571:5336-5359 10517:5360-5398 10331:5399-5443 10600:5444-5467 10570:5468-5539 10669:5540-5542 10515:5543-5563 10683:5564-5566 10693:5567-5632 10428:5633-5645 10327:5646-5669 10211:5670-5674 10426:5675-5692 10325:5693-5716 10514:5717-5717 10425:5718-5741 10324:5742-5761 10599:5762-5800 10569:5801-5824 10670:5825-5827 10321:5828-5872 10204:5873-5908 9522:5909-5972 9523:5973-6012] [-1, 9298, 10421, 10620, 10598, 10568, 10420, 10510, 10038, 10419, 9752, 10315, 10509, 10418, 10313, 9908, 10416, 10508, 10312, 10415, 9503, 10597, 10659, 10413, 10565, 10505, 10618, 10595, 10660, 10674, 10303, 9663, 9664, 10406, 9710, 10562, 9394, 10405, 10499, 10594, 10561, 10498, 10403, 10298, 9510, 10560, 10297, 10593, 10661, 10675, 10295, 10494, 10399, 10293, 10493, 10398, 10292, 10492, 9422, 10397, 10662, 10676, 10687, 10697, 10489, 10394, 10287, 10487, 10592, 10557, 10485, 10390, 10553, 10481, 10386, 10280, 10552, 10480, 10385, 10613, 10587, 10551, 9796, 10645, 10636, 10626, 10612, 10586, 10549, 10472, 10635, 10611, 10625, 10546, 10610, 10469, 10261, 10371, 10583, 10467, 10544, 10665, 10679, 10689, 10256, 10643, 10624, 10608, 10581, 10360, 10535, 10248, 10454, 10534, 10358, 10623, 10607, 10580, 10452, 10356, 10244, 10532, 10449, 10353, 10530, 10718, 10713, 10707, 10237, 9991, 9992, 10444, 10347, 10234, 9644, 9645, 9646, 10440, 10730, 10343, 10712, 10228, 10632, 10603, 10574, 10520, 10602, 10573, 10335, 10518, 10433, 10601, 10571, 10517, 10331, 10600, 10570, 10669, 10515, 10683, 10693, 10428, 10327, 10211, 10426, 10325, 10514, 10425, 10324, 10599, 10569, 10670, 10321, 10204, 9522, 9523, -2]
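For anyone trying to reproduce this: the warning is consistent with a reader that fills a fixed-size buffer and then treats each buffer-full as a complete line, so the tail of a long defline gets handled as sequence data. The Python sketch below only imitates that failure mode; `fgets_like`, `classify`, the 16-character buffer, and the sample record are all made up for illustration and are not taken from SNAP's source.

```python
import io

def fgets_like(stream, bufsize):
    """Yield pieces the way C's fgets(buf, bufsize, fp) would:
    at most bufsize - 1 characters per call, stopping early at a newline."""
    while True:
        piece = stream.readline(bufsize - 1)
        if not piece:
            return
        yield piece

def classify(pieces):
    """Naively treat every piece as a complete line (the suspected bug)."""
    for piece in pieces:
        text = piece.rstrip("\n")
        kind = "defline" if text.startswith(">") else "sequence"
        yield kind, text

# Hypothetical record: one defline longer than the buffer, then four bases.
defline = ">contig path=[1:0-9 2:10-19] [-1, 1, 2, -2]"
fasta = defline + "\nACGT\n"

records = list(classify(fgets_like(io.StringIO(fasta), bufsize=16)))
for kind, text in records:
    print(kind, repr(text))
```

Only the first piece starts with `>`, so the remaining pieces of the defline are classified as sequence, and characters such as `1` or `,` would then be flagged as invalid bases.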
I put in a 90% fix for this in beta.21 and dev.91. It seems to work OK, except for one case that involves reading SAM lines that have very large contig names (I used 20K bytes), which causes an error that correctly reports a bizarrely long SAM line. BAM input seems to work fine for this. I'm not inclined to fix the problem with SAM because I doubt many people will actually want to generate files that look like that, where each aligned read has 20K of contig name in it.
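As a rough illustration of the general approach (not the actual change in beta.21 or dev.91, which is not shown in this thread), a reader can keep appending chunks until it sees a newline, so the line length is bounded only by memory or by an explicit limit that is reported as an error. In this Python sketch, `read_full_lines`, `LineTooLongError`, `chunk_size`, and `max_line` are hypothetical names, not SNAP options.

```python
import io

class LineTooLongError(ValueError):
    """Raised when a line exceeds the caller's explicit limit."""

def read_full_lines(stream, chunk_size=16, max_line=None):
    """Yield complete lines, growing the buffer instead of truncating.

    chunk_size and max_line are illustrative parameters only.
    """
    parts = []
    length = 0
    while True:
        piece = stream.readline(chunk_size)
        if not piece:
            break
        parts.append(piece)
        length += len(piece)
        if max_line is not None and length > max_line:
            # Report the problem rather than silently splitting the line.
            raise LineTooLongError(f"line exceeds {max_line} characters")
        if piece.endswith("\n"):
            yield "".join(parts).rstrip("\n")
            parts, length = [], 0
    if parts:  # final line without a trailing newline
        yield "".join(parts)

# A 20K-character defline, similar in size to the contig names tried above.
long_defline = ">" + "x" * 20000
lines = list(read_full_lines(io.StringIO(long_defline + "\nACGT\n")))
print(len(lines), len(lines[0]))
```

Passing `max_line` gives the behaviour described for the SAM case: an over-long line produces a clear error instead of being parsed as several shorter lines.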
Great, thanks! However, I would caution that in the world of bioinformatics, assuming people will make any kind of sane development decision in upstream tools is dangerous. Here there be drunken, concussed dragons.