cancerit / CaVEMan

SNV expectation maximisation based mutation calling algorithm aimed at detecting somatic mutations in paired (tumour/normal) cancer samples. Supports both bam and cram format via htslib
http://cancerit.github.io/CaVEMan
GNU Affero General Public License v3.0
60 stars 13 forks source link

estep issue with version 1.13.3 . - fai_access Error #82

Closed ChristopheLegendre closed 6 years ago

ChristopheLegendre commented 6 years ago

@drjsanger Hi David,

I just installed the version 1.13.3 and rerun it with the same exact inputs as I ran the previous version 1.13.2 which had issue with the estep and zlib buffer.

All the steps up to merge step did not raise any error, but estep raised new errors.
after running the following command, for lineid in $(seq 1 22) ; do caveman estep -i {lineid} -f caveman.cfg.ini -l WXS -r WXS -w Human -g ./covs_arr -o ./probs_arr ; done,

I unfortunately got the following errors:

********** >1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 [ERROR] (src/fai_access.c: fai_access_get_count_length_all_contigs:84 errno: No such file or directory) Wrong number of entries (1) found in fasta index file line >1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 [ERROR] (src/estep.c: estep_main:519 errno: None) Error establishing contig count and name length.

I also just try caveman estep -i 22 to simplify the command, and got the exact same error.

The list of contigs I gave you earlier ( #77 ) is the right list of contigs seen in the BAM header, but based on that error message I thought the system did not recognize the contig in the reference_genome fasta file because there are annotations added after the contig name in the line starting with ">" character in the reference genome fasta file.

Here below I attached 1) the lines from the fasta file where the contigs are defined and 2) I added the fai file associated as well

As I thought the annotation could be the problem, therefore, I created a reference genome without the annotation in the contig line. My .fai file was created using samtools faidx ${REF_GENOME_FASTA} command.

Unfortunately, I got the same error for ALL the jobids that ran with the estep module.
********** >1 [ERROR] (src/fai_access.c: fai_access_get_count_length_all_contigs:84 errno: None) Wrong number of entries (1) found in fasta index file line >1 [ERROR] (src/estep.c: estep_main:519 errno: None) Error establishing contig count and name length. In order to reduce runtime, I subsetted the processing to only two contigs, 21 and 22; I got the exact same error as described right above.

Let me know if you have questions or if you need further information.

hs37d5_contigs.zip

ghost commented 6 years ago

Afternoon @ChristopheLegendre My mistake, I've passed the new function the .fa filename, not the index filename. I have made a new hotfix and marked it as prerelease 1.13.5. Let me know if this works for you.

Dave

ChristopheLegendre commented 6 years ago

Hi @drjsanger

Sorry for the delay.

  1. I tried out the pre-release 1.13.5 and I got the "Segmentation fault (core dumped)" Error message that we all fear :-(
    I used the exact same inputs and commands I used with version 1.13.2 and 1.13.3.

  2. I tried first using only contigs representing chromosome 21 and 22 as test and got the segmentation fault. Here I attach a zip file with logs and file and commands for that run. Hope it helps.

CaVEMan.run.zip

  1. As another test, I also tried it using ALL the contigs listed in the BAM (in SQ lines). I noticed that the mstep is actually looking for non-existing contigs, which did not happen in item 3 above when using only chr21 and chr22;

Example of error with mstep, which impact merge and estep:

M-stepping section 4:181586940-185010141 for mstep fetched a reference seq of length 3423202 for this section. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region X:-1365916695-144904260. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region 19:-1591198826-38028818. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region 16:-1844986504-74660215. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region ERCC-00053:-1104487891-1023. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region 16:-1844986504-50701913. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region 20:-1531084355-44680223. [ERROR] (src/fai_access.c: fai_access_get_ref_seqeuence_for_pos:108 errno: No such file or directory) Error fetching reference sequence for region X:-1365916695-128699773. Looking at section 1:28206191-29652276 for mstep M-stepping section 1:28206191-249250621 for mstep

Sorry to keep going on that issue after you closed it; I did not want to open a new one as I think it is related to the same issue.

Let me know if you have any question that may help you fixing the mstep and estep steps.

Best, Chris

ghost commented 6 years ago

Morning @ChristopheLegendre

Let's split this into two issues. The estep issue may be related to the changes, but the mstep issue shouldn't be. I've opened two new issues to track them and prevent this thread from deviating/becoming too verbose.

See #83 and #84