cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

dev pipeline on cluster breaks on 140522 #91

Closed ArtPoon closed 10 years ago

ArtPoon commented 10 years ago

Looks like another failed alignment:

2014-08-26 20:52:25.900231 - [INFO] Launching ' ./csf2counts.py /data/miseq/140522_M01841_0063_000000000-A64FB/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23.aligned.csv /data/miseq/140522_M01841_0063_000000000-A64FB/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23.nuc.csv /data/miseq/140522_M01841_0063_000000000-A64FB/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23.amino.csv /data/miseq/140522_M01841_0063_000000000-A64FB/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23.indels.csv /data/miseq/140522_M01841_0063_000000000-A64FB/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23.conseq.csv'
2014-08-26 20:52:43.166066 - [ERROR] Traceback (most recent call last):
2014-08-26 20:52:43.166415 - [ERROR] File "./csf2counts.py", line 466, in <module>
2014-08-26 20:52:43.166547 - [ERROR] main()
2014-08-26 20:52:43.166674 - [ERROR] File "./csf2counts.py", line 344, in main
2014-08-26 20:52:43.166796 - [ERROR] inserts)
2014-08-26 20:52:43.166917 - [ERROR] File "./csf2counts.py", line 187, in write
2014-08-26 20:52:43.167040 - [ERROR] query_end = max(qindex_to_refcoord.keys())
2014-08-26 20:52:43.167182 - [ERROR] ValueError: max() arg is an empty sequence
Traceback (most recent call last):
  File "sample_pipeline.py", line 319, in <module>
    main()
  File "sample_pipeline.py", line 251, in main
    count_samples(fastq_samples, worker, args)
  File "sample_pipeline.py", line 114, in count_samples
    stderr=log_path))
  File "/usr/local/share/miseq/development/fifo_scheduler.py", line 148, in run_job
    return self.run_job_unlogged(job)
  File "/usr/local/share/miseq/development/fifo_scheduler.py", line 135, in run_job_unlogged
    stderr=stderr)
  File "/usr/local/lib/python2.7/subprocess.py", line 511, in check_call
    raise CalledProcessError(retcode, cmd)
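A minimal sketch of how a failed alignment can produce this error. It assumes the query-to-reference map is built from a pair of gapped, aligned strings; the helper names are hypothetical, since the actual csf2counts.py code is not shown in this issue. If no query base lines up with a reference base, the map is empty and max() has nothing to compare. Python 3's default= argument is one way to make that case explicit instead of crashing:

```python
def map_query_to_ref(aligned_ref, aligned_query):
    """Hypothetical helper: map each query index to its reference coordinate.

    Only columns where neither sequence has a gap ('-') are recorded, so an
    alignment with no overlapping bases yields an empty dict.
    """
    qindex_to_refcoord = {}
    qindex = refcoord = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != '-' and query_char != '-':
            qindex_to_refcoord[qindex] = refcoord
        if query_char != '-':
            qindex += 1
        if ref_char != '-':
            refcoord += 1
    return qindex_to_refcoord


def find_query_end(qindex_to_refcoord):
    """Return the last aligned query index, or None if nothing aligned.

    max() on an empty dict raises ValueError, which is what the log above
    shows; default=None (Python 3.4+) avoids that.
    """
    return max(qindex_to_refcoord, default=None)


good = map_query_to_ref('ACGT', 'AC-T')            # {0: 0, 1: 1, 2: 3}
failed = map_query_to_ref('ACGT----', '----ACGT')  # {}
print(find_query_end(good))    # 2
print(find_query_end(failed))  # None
```

The pipeline itself ran under Python 2.7, where max() has no default= argument, so an explicit emptiness check would be needed there instead.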
ArtPoon commented 10 years ago

Looks like a frameshift slipped into the sample consensus for RT at the remap stage. Investigating.

ArtPoon commented 10 years ago

This sample has a single-bp deletion of RT 2861A in nearly all reads covering this interval, which causes a frameshift. This is based on the preliminary mapping, so the deletion seems to be real. Going back to the raw data, grepping for this interval with the deletion returns 14,878 reads; grepping for it without the deletion returns nothing.
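The raw-data check above amounts to counting reads whose sequence contains the interval with and without the deletion. A rough Python equivalent of that grep is sketched below. The function name, reads, and both motifs are made up for illustration (the issue does not give the actual flanking sequence), and like a plain grep it does not look for reverse-complemented reads. It only examines sequence lines, assuming the standard four-line FASTQ record layout.

```python
def count_reads_containing(fastq_lines, motif):
    """Count FASTQ records whose sequence contains motif (like grep -c).

    Assumes the standard four-line record layout, so the sequence is on
    line 2 of every record (index 1, 5, 9, ...).
    """
    count = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1 and motif in line.strip():
            count += 1
    return count


# Toy data; these reads and motifs are hypothetical, not from run 140522.
FASTQ = """@read1
TTAAAGTTTCC
+
IIIIIIIIIII
@read2
GAAAGTTTCA
+
IIIIIIIIII
@read3
CCCCCCCC
+
IIIIIIII
"""
WITH_DELETION = 'AAAGTTT'      # interval with one base removed
WITHOUT_DELETION = 'AAACGTTT'  # same interval, intact

lines = FASTQ.splitlines()
print(count_reads_containing(lines, WITH_DELETION))     # 2
print(count_reads_containing(lines, WITHOUT_DELETION))  # 0
```

With a real file, an open file object can be passed in place of the list of lines, since the function only iterates over it.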

Proposed solution: include deletions in the consensus sequence built by pileup_to_conseq() in remap.py, and cull codon deletions (three gaps in a row) before returning the sequence.
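A minimal sketch of the proposed change, assuming per-position base counts have already been tallied from the pileup. The real pileup_to_conseq() takes a pileup file and a quality cutoff, both left out here, and the function name below is hypothetical. A deletion is counted as the symbol '-', so it can win a position by majority; a run of exactly three gaps between two bases is then dropped as a whole-codon deletion. This sketch uses lookarounds rather than capturing groups so that two codon deletions sharing a flanking base (A---C---G) are both removed.

```python
import re
from collections import Counter

# Exactly three gaps with a base on each side; the lookarounds check the
# flanking bases without consuming them.
CODON_DELETION = re.compile(r'(?<=[ACGT])---(?=[ACGT])')


def counts_to_conseq(position_counts):
    """Majority-rule consensus in which a deletion ('-') can win a position.

    position_counts is a list of Counter objects, one per reference position.
    Positions with no coverage become 'N'. Whole-codon deletions are culled
    afterwards; any other gap run is left in place.
    """
    conseq = ''.join(
        counts.most_common(1)[0][0] if counts else 'N'
        for counts in position_counts)
    return CODON_DELETION.sub('', conseq)


example = [
    Counter({'A': 10}),
    Counter({'-': 9, 'C': 1}),
    Counter({'-': 9, 'G': 1}),
    Counter({'-': 9, 'T': 1}),
    Counter({'G': 10}),
    Counter({'-': 8, 'A': 2}),
    Counter({'T': 10}),
    Counter(),
]
print(counts_to_conseq(example))  # AG-TN
```

Here the raw consensus is A---G-TN: the codon deletion is culled, while the single-base deletion stays as '-'.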

ArtPoon commented 10 years ago

The regex substitution fails because the replacement template refers to groups that the pattern does not define:

2014-08-27 14:36:40.812886 - [ERROR] conseqs[refname] = pileup_to_conseq(f, consensus_q_cutoff)
2014-08-27 14:36:40.813419 - [ERROR] File "./remap.py", line 397, in pileup_to_conseq
2014-08-27 14:36:40.813650 - [ERROR] conseq = re.sub(pat, r'\g<1>\g<3>', conseq)
2014-08-27 14:36:40.813864 - [ERROR] File "/usr/local/lib/python2.7/re.py", line 151, in sub
2014-08-27 14:36:40.814067 - [ERROR] return _compile(pattern, flags).sub(repl, string, count)
2014-08-27 14:36:40.814278 - [ERROR] File "/usr/local/lib/python2.7/re.py", line 275, in filter
2014-08-27 14:36:40.814474 - [ERROR] return sre_parse.expand_template(template, match)
2014-08-27 14:36:40.814669 - [ERROR] File "/usr/local/lib/python2.7/sre_parse.py", line 789, in expand_template
2014-08-27 14:36:40.814879 - [ERROR] raise error, "invalid group reference"
2014-08-27 14:36:40.815084 - [ERROR] sre_constants.error: invalid group reference
ArtPoon commented 10 years ago

Forgot to add capturing groups for the [ACGT] bases enclosing the codon deletion. Fixed in the next commit.
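For reference, a minimal reproduction of the bug and the fix. The issue only shows the replacement template r'\g<1>\g<3>', so both patterns below are reconstructions, not copied from remap.py. Every group referenced in the template has to exist in the pattern; with no capturing groups, \g<1> is an invalid group reference.

```python
import re

TEMPLATE = r'\g<1>\g<3>'  # replacement template quoted in the traceback above

# Hypothetical reconstructions; the actual patterns in remap.py are not shown.
BROKEN_PATTERN = r'[ACGT]---[ACGT]'       # no capturing groups at all
FIXED_PATTERN = r'([ACGT])(---)([ACGT])'  # 1 = left base, 2 = gaps, 3 = right base


def cull_codon_deletions(conseq, pattern):
    """Replace each base-gap-gap-gap-base match with just the two bases."""
    return re.sub(pattern, TEMPLATE, conseq)


print(re.compile(BROKEN_PATTERN).groups)  # 0
print(re.compile(FIXED_PATTERN).groups)   # 3
print(cull_codon_deletions('AC---GT', FIXED_PATTERN))  # ACGT

try:
    cull_codon_deletions('AC---GT', BROKEN_PATTERN)
except re.error as exc:
    # \g<1> and \g<3> point at groups the pattern never defines.
    print('re.error:', exc)
```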