Kingsford-Group / ribomap

12 stars 4 forks source link

riboprof Segmentation fault (core dumped) #4

Closed ghost closed 7 years ago

ghost commented 8 years ago

riboprof compiled with boost 1.59 gcc 5.3.0 on CentOS 6.7

./riboprof --fasta ./ref/gencode.v18.pc_transcripts_filter.fa --mrnabam alignment/GSM546921_filtered_sequence_transcript_Aligned.out.bam --ribobam alignment/GSM546920_filtered_sequence_transcript_Aligned.out.bam --min_fplen 25 --max_fplen 36 --offset ./offset.txt --sf sm_quant/quant.sf --tabd_cutoff 0 --out ./outputs/GSM546920_filtered_sequence cds_range file not provided, assume transcript fasta only contains cds sequences. getting transcript info... total number of transcripts in transcriptome: 93863 assigning ribo-seq reads... constructing profile class... number of transcripts in profile class: 61083 loading reads from bam... getting readlen mapping to P site offset... getting alignment records... total number of reads: 8087973 getting read type and p-sites... total output footprint: 8087973 total: 8087973 multi_mapped: 0 (0.00 %) assigning reads to frame 0 loci... reads used: 0 reads assigned: 0 total input count not matching total assigning count! assigning reads to frame 1 and 2 loci... reads used: 0 reads assigned: 0 total input count not matching total assigning count! assigning reads to UTR loci... reads used: 0 reads assigned: 0 total input count not matching total assigning count! assigning RNA-seq reads... number of transcripts in profile class: 61083 loading reads from bam... getting alignment records... total number of reads: 5243688 getting read type and p-sites... total output footprint: 5243688 total: 5243688 multi_mapped: 3907204 (74.51 %) assigning reads... reads used: 5243688 reads assigned: 5243688 write profile results... rank transcripts... transcripts in rank list: 0 Segmentation fault (core dumped)

ll outputs/ total 13M -rw------- 1 vitalyg training-wx-grp 0 Apr 19 13:24 GSM546920_filtered_sequence_abundant.list -rw------- 1 vitalyg training-wx-grp 0 Apr 19 13:24 GSM546920_filtered_sequence.base -rw------- 1 vitalyg training-wx-grp 0 Apr 19 13:24 GSM546920_filtered_sequence.codon -rw------- 1 vitalyg training-wx-grp 0 Apr 19 13:24 GSM546920_filtered_sequence_scarce.list -rw------- 1 vitalyg training-wx-grp 11M Apr 19 13:24 GSM546920_filtered_sequence.stats

how-wonderful commented 8 years ago

could you try including cds range file? ./riboprof --cds_range ./ref/cds_range.txt ... also is offset.txt in the same directory as riboprof?

ghost commented 8 years ago

Including --cds_range resolved the issue. Thanks.

AlexandraBomane commented 7 years ago

I have exactly the same issue.

Here my command line : riboprof --fasta /run/hg18.transcripts.fa --mrnabam S6_mRNA.transcriptome.mapping.bam --ribobam S6_align_star/Aligned.toTranscriptome.out.bam --min_fplen 25 --max_fplen 35 --offset Sh3_riboprofile_p_offsets_corrected.txt --sf S6_mRNA_quantification/quant.sf --out S6_isoform_levelestimation --useSecondary --cds_range cdsrange.txt --tabd_cutoff 0

Sh3_riboprofile_p_offsets_corrected.txt and cdsrange.txt are in the working directory and "riboprof" command is in the PATH.

I attached my offset and CDS range files :

cdsrange.txt Sh3_riboprofile_p_offsets_corrected.txt

how-wonderful commented 7 years ago

Hi AlexandraBomane, Could you try deleting the header line (length p_offset) in Sh3_riboprofile_p_offsets_corrected.txt?

AlexandraBomane commented 7 years ago

Thanks for your reply how-wonderful, I think that this trick will resolve my problem.

I have another question about these following lines :

profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41219 1170-1171 570 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41212 1988-1989 1130 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41208 877-878 657 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41204 889-890 519 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41200 1023-1024 559 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41199 1069-1070 574 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41196 1145-1146 604 profile index out of bound! readID:CGTCAGATGTCTGGGGCGCAGATCAAAA 15 41193 1199-1200 538 profile index out of bound! readID:AGTCATCGTGGTCCTGCGGAACCCGCTCATC 7 101119 582-583 555

What means exactly this warning ? Could it be a problem with my CDS range or my offsets ?

how-wonderful commented 7 years ago

Hi AlexandraBomane, Your guess for the CDS range might be correct. The numbers listed here are: read sequences (e.g. CGTCAGATGTCTGGGGCGCAGATCAAAA ) | number of mapped positions | reference ID of current mapping | mapping CDS position start | mapping CDS position end | length of the CDS of the mapped transcript. The important part is the last three numbers, take the first warning for example: The mapping position is 1170-1171, but the total CDS length is 570, so the mapping position is out side of the CDS range. side note: The mapping length itself is weird: it is 1, which is unlikely. It could be the case that the mapping length within CDS region is 1.

If you check the transcriptome fasta file, the 41220th entry, that should be the transcript with reference ID 41219. You can check to see whether the transcript length is greater than 570 (assume there are UTR regions for this transcript). If you go to the cdsrange.txt, and find the cds range for that transcript, you can verify whether the cds length is indeed 570. If not, you CDS range might be off.

Reads that causes warning are discarded to avoid core dump.

Good luck and let me know how it goes. You can provide me the transcript fasta and a subset of the problematic reads, and I can try to trouble shoot, but it might take me some time, since I just started a new job and am still adjusting.

how-wonderful commented 7 years ago

A more user-friendly and informative warning message is needed