FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
391 stars 102 forks source link

Better Handling of Bismark2Report Uninitialized Error due to missing nucleotide key in nucleotide.stats report #711

Open rhondene opened 1 week ago

rhondene commented 1 week ago

Hi @FelixKrueger,

There are cases where the nucleotide_stats does not generate the full list of nucleotide headers as expected by the bismark2report script. Below I post the source ode causing a unitialized error and potentially a division by error if nuc_exp=0.

For now I can implement my own error handling in the pipeline I am running bismark in.

relevant bismark2report source code

foreach my $key (sort {$a cmp $b} keys %nucs){
    foreach my $key ('A','T','C','G','AC','CA','TC','CT','CC','CG','GC','GG','AG','GA','TG','GT','TT','TA','AT','AA'){
        my $nuc_obs = $nucs{$key}->{obs}->{percent};
        my $nuc_exp = $nucs{$key}->{exp}->{percent};
        my $counts_obs = $nucs{$key}->{obs}->{counts};
        my $counts_exp = $nucs{$key}->{exp}->{counts};
        my $cov = $nucs{$key}->{obs}->{coverage};

        # calculating log2 observed/expected
        my $ratio = $nuc_obs/$nuc_exp;
        #  my $logratio = sprintf ("%.2f",log($ratio)/log(2));
        # if (abs($logratio) > $minmax){
        # $minmax = abs($logratio);
        # }

ERROR

Traceback of the Error (abbreviated)


`Using the following nucleotide coverage report: > LAMdirect-364-NTG-200-5_S202_R1_trimmed_bismark_bt2_pe.nucleotide_stats.txt <
Processing nucleotide coverage report './LAMdirect-364-NTG-200-5_S202_R1_trimmed_bismark_bt2_pe.nucleotide_stats.txt' ...
Use of uninitialized value $nuc_exp in division (/) at /home/ec2-user/anaconda3/envs/bismark_env/bin/bismark2report line 640, <NUC> line 9.
Use of uninitialized value $nuc_obs in division (/) at /home/ec2-user/anaconda3/envs/bismark_env/bin/bismark2report line 640, <NUC> line 9.
Illegal division by zero at /home/ec2-user/anaconda3/envs/bismark_env/bin/bismark2report line 640, <NUC> line 9.

Traceback (most recent call last):
 ...
  File "/home/ec2-user/SageMaker/NGS-Analysis/amp_biseq/amp_biseq/report_utils.py", line 112, in run_bismark2report

subprocess.CalledProcessError: Command 'bismark2report --alignment_report ./LAMdirect-364-NTG-200-5_S202_R1_trimmed_bismark_bt2_PE_report.txt --splitting_report ./methylation_extraction_results/LAMdirect-364-NTG-200-5_S202_R1_trimmed_bismark_bt2_pe_splitting_report.txt --mbias_report ./methylation_extraction_results/LAMdirect-364-NTG-200-5_S202_R1_trimmed_bismark_bt2_pe.M-bias.txt --nucleotide_report ./LAMdirect-364-NTG-200-5_S202_R1_trimmed_bismark_bt2_pe.nucleotide_stats.txt --dir ./MY _Amplicon_1' returned non-zero exit status 255.

Data nucleotide_stats table that triggered the error:

image
FelixKrueger commented 2 days ago

I have just tried to handle division by 0 better, can you please clone the Bismark dev branch and test it? Alternatively, would you mind sending the files you used in the example above so I can test it over here? Many thanks!