BenoitCastandet / chloroseq

a bioinformatic pipeline to systematically analyse the chloroplast transcriptome using RNA-Seq

Problems I encountered when performing Analysis 2 #2

Open githubjiangweiling opened 6 years ago

githubjiangweiling commented 6 years ago

Hi, I have recently been trying to use the ChloroSeq pipeline and I am currently working on Analysis 2. I executed the following command:

`./chloroseq.pl -a 2 -b ./../../tophat_out664/accepted_hits.bam -e ./../TAIR10_ChrC_files/TAIR10_ChrC_exon.gff3 -i ./../TAIR10_ChrC_files/TAIR10_ChrC_introns.gff3 -g 154478 -n NC_000932.1 -s ./../TAIR10_ChrC_files/TAIR10_ChrC_splice_sites_sort.gff3`

Then the computer displays:

`NC_000932.1.bam file exists, moving to next step.`

`Starting splicing analysis (Analysis 2).`

`***** WARNING: File NC_000932.1.bam has inconsistent naming convention for record: NC_000932.1 39 139 SRR944366.13760867 50 -`

`***** WARNING: File NC_000932.1.bam has inconsistent naming convention for record: NC_000932.1 39 139 SRR944366.13760867 50 -`

`[samopen] SAM header is present: 1 sequences.`

`[sam_read1] reference 'ID:TopHat CL:/usr/bin/tophat -G ref_sequence.gff3 -p 16 -g 2 --no-novel-juncs -o ./tophat_out664 ref_index ./quality_trim/SRR944366.fastq VN:2.1.0 ' is recognized as ''.`

`[main_samview] truncated file.`

`[samopen] SAM header is present: 1 sequences.`

`[sam_read1] reference 'ID:TopHat CL:/usr/bin/tophat -G ref_sequence.gff3 -p 16 -g 2 --no-novel-juncs -o ./tophat_out664 ref_index ./quality_trim/SRR944366.fastq VN:2.1.0 ' is recognized as ''.`

`[main_samview] truncated file.`

`Illegal division by zero at ./chloroseq.pl line 416, line 1.`

I am a beginner and I have not found a discussion about this issue, so I can only turn to you. In addition, I also performed Analysis 3. I got the 9 files described on the website, but I don't know if they really are what we should get. If possible, could you give a detailed description or an example? Thank you!

BenoitCastandet commented 6 years ago

Hi

The illegal division by zero happens when no reads span the exon/exon and exon/intron junctions. This can occur when coverage is low or when the RNA-Seq data come from polyA-selected RNA. Organellar transcriptomes should not be studied using polyA-purified RNA. I haven't fixed this issue yet, but all the intermediate files are kept when it crashes, so you have access to all the information you need and can compute the splicing efficiency yourself.
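As a rough illustration of computing the splicing efficiency by hand while avoiding the crash, here is a minimal Python sketch. It assumes the efficiency is the fraction of spliced reads among all reads spanning a junction, i.e. spliced / (spliced + unspliced). That definition, the function name, and the example counts are assumptions for illustration, not code taken from chloroseq.pl.

```python
from typing import Optional


def splicing_efficiency(spliced: int, unspliced: int) -> Optional[float]:
    """Return spliced / (spliced + unspliced), or None if no read spans the junction.

    Assumed definition for illustration; check it against the protocol you follow.
    """
    total = spliced + unspliced
    if total == 0:
        # No read covers this junction: report "not available"
        # instead of dividing by zero.
        return None
    return spliced / total


# Hypothetical per-intron counts: (spliced reads, unspliced reads).
counts = {"intron_A": (30, 10), "intron_B": (0, 0)}

for name, (spliced, unspliced) in counts.items():
    eff = splicing_efficiency(spliced, unspliced)
    print(name, "NA" if eff is None else f"{eff:.2f}")
```

Junctions without any supporting read come out as NA here, so one empty junction does not stop the calculation for the others.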

For the first warning, you have to be careful to use names consistently: the chromosome name should be the same in all the annotation files and in the files you used to perform the alignment. In other words, the names in the BAM file should match the ones in the annotation files.
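One possible way to check this before running the pipeline is to compare the reference names in the BAM header with the first column of each GFF3 file. The sketch below is a hypothetical helper, not part of ChloroSeq. It assumes the header text was obtained beforehand (for example with `samtools view -H`), and the example lines are made up.

```python
def sam_header_names(header_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gff3_seqids(gff_lines):
    """Collect sequence names from column 1 of GFF3 feature lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        names.add(line.split("\t")[0])
    return names


def mismatched_names(header_lines, gff_lines):
    """Names used in the annotation that are absent from the BAM header."""
    return gff3_seqids(gff_lines) - sam_header_names(header_lines)


# Made-up example: the alignment uses NC_000932.1, the annotation uses ChrC.
header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:NC_000932.1\tLN:154478"]
gff = ["##gff-version 3", "ChrC\texample\texon\t100\t200\t.\t-\t.\tID=exon1"]

print(sorted(mismatched_names(header, gff)))
```

An empty result means every name in the annotation also appears in the BAM header. Any name listed needs to be renamed in one of the two files so that they agree.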

For a more complete guide you can read this protocol we recently published

https://www.ncbi.nlm.nih.gov/pubmed/29987730

The article can be accessed here

https://github.com/BenoitCastandet/editing_development

Hope this helps. If not, I'll have a closer look at your input and output. I'll send examples of the Analysis 3 files later on.

Best

Benoît

githubjiangweiling commented 6 years ago

Hi, thank you for your generous help! I will try to solve the problem following your suggestions. Best wishes!