COMBINE-lab / pufferfish

An efficient index for the colored, compacted, de Bruijn graph
GNU General Public License v3.0

Cedar segfaults with SAM files #44

Open hermidalc opened 1 year ago

hermidalc commented 1 year ago

I cannot get Cedar to run without segfaulting. My input SAM file was produced by PuffAligner from a CAMI example paired-end FASTQ aligned against a Pufferfish index of 100 genomes. I'm using Cedar's --flat option. No matter what I try, it always segfaults. @fataltes do you have any idea?

$ /home/hermidalc/projects/github/hermidalc/pufferfish/bin/pufferfish/c4db524/linux-x64/cedar --flat --sam results/pufferfish/cami/cami1/CAMI_low/CAMI_low_RL_S001__insert_270_pufferfish.sam.gz --output quant.sf
[2022-10-24 05:17:37.424] [console] [info] Cedar: Construct ..
[2022-10-24 05:17:37.425] [console] [info] Cedar: Load Mapping File ..
[2022-10-24 05:17:37.425] [console] [info] Mapping Output File: results/pufferfish/cami/cami1/CAMI_low/CAMI_low_RL_S001__insert_270_pufferfish.sam.gz
[2022-10-24 05:17:37.433] [console] [info] # of targets: 6260
0 0 | RL|S1|R0 | 150 | 1 | -1 | -1 | 0
1 0 | RL|S1|R0 | 150 | 1 | -1 | -1 | 0
2 0 | RL|S1|R1 | 150 | 1 | -1 | -1 | 0
3 0 | RL|S1|R1 | 150 | 1 | -1 | -1 | 0
4 0 | RL|S1|R2 | 150 | 1 | -1 | -1 | 0
5 0 | RL|S1|R2 | 150 | 1 | -1 | -1 | 0
6 0 | RL|S1|R3 | 150 | 1 | -1 | -1 | 0
7 0 | RL|S1|R3 | 150 | 1 | -1 | -1 | 0
8 0 | RL|S1|R4 | 150 | 1 | -1 | -1 | 0
9 0 | RL|S1|R4 | 150 | 1 | -1 | -1 | 0
...
6390 0 | RL|S1|R3195 | 150 | 1 | -1 | -1 | 0
6391 0 | RL|S1|R3195 | 150 | 1 | -1 | -1 | 0
6392 0 | RL|S1|R3196 | 150 | 1 | -1 | -1 | 0
6393 0 | RL|S1|R3196 | 150 | 1 | -1 | -1 | 0
6394 0 | RL|S1|R3197 | 150 | 1 | -1 | -1 | 0
6395 0 | RL|S1|R3197 | 150 | 1 | -1 | -1 | 0
6396 0 | RL|S1|R3198 | 150 | 1 | -1 | -1 | 0
6397 0 | RL|S1|R3198 | 150 | 1 | -1 | -1 | 0
6398 0 | RL|S1|R3199 | 150 | 1 | -1 | -1 | 0
6399 0 | RL|S1|R3199 | 150 | 1 | -1 | -1 | 0
6400 1 | RL|S1|R3200 | 150 | 1 | 4479 | 4938059 | 0
[2022-10-24 05:17:37.577] [console] [info] is dataset paired end? true

Segmentation fault (core dumped)

hermidalc commented 1 year ago

I found the issue: Cedar does not work with SAM files (.sam or .sam.gz), and this is what causes the segfault, even though it is documented to work with SAM files. If I provide an equivalent PAM alignment file, it doesn't segfault and appears to run correctly. Still, Cedar should support PuffAligner SAM files. The source code looks like it does, so there is a bug somewhere.
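
For anyone trying to reproduce this, here is a rough sketch of how the SAM input itself could be sanity-checked before blaming Cedar. It is not part of Pufferfish or Cedar; `open_text` and `summarize_sam` are hypothetical helper names, and the script only assumes the standard SAM text layout (header lines starting with `@`, one `@SQ` line per target, at least 11 tab-separated fields per alignment record). The `@SQ` count can be compared against the "# of targets" line in the log.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file, detected by its magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip signature
        return gzip.open(path, "rt")
    return open(path, "rt")


def summarize_sam(path, max_records=1000):
    """Count @SQ header lines and check the first few alignment records.

    Hypothetical helper for triage only: it does not validate the full
    SAM specification, just the header/record layout.
    """
    n_targets = 0
    n_records = 0
    n_malformed = 0
    with open_text(path) as fh:
        for line in fh:
            if line.startswith("@"):
                if line.startswith("@SQ"):
                    n_targets += 1
                continue
            n_records += 1
            # A SAM alignment record has 11 mandatory tab-separated fields.
            if len(line.rstrip("\n").split("\t")) < 11:
                n_malformed += 1
            if n_records >= max_records:
                break
    return {"targets": n_targets, "records": n_records, "malformed": n_malformed}
```

Calling `summarize_sam` on the .sam.gz path from the command above would show whether the file decompresses cleanly and whether its records look well-formed. If it does, that points toward Cedar's SAM handling rather than the input, though this check alone cannot confirm where the bug is.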