cancerit / cgpCaVEManWrapper

Reference implementation of CGP workflow for CaVEMan SNV analysis
http://cancerit.github.io/cgpCaVEManWrapper/
GNU Affero General Public License v3.0

Bug in flag step: species name. #24

Closed demh closed 9 years ago

demh commented 9 years ago

I think I have found a bug in the last step of CaVEMan Wrapper (flag). Everything works perfectly until then, but afterwards I get this error message:

Errors from command: /software/perl-5.16.2/bin/perl /lustre/scratch104/sanger/dmh/software/wrapper_installation/bin/cgpFlagCaVEMan.pl -i /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/10/tmpCaveman/SC_TasDevilMT5806280_vs_SIMULATED_DEVIL.muts.ids.vcf -o /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/10/tmpCaveman/SC_TasDevilMT5806280_vs_SIMULATED_DEVIL.flagged.muts.vcf -s Sarcophilus harrisii -m /lustre/scratch104/sanger/dmh/trial_caveman/bam_files/13490_3#10.bam -n /lustre/scratch104/sanger/dmh/trial_caveman/simulated_reads/alignment/aln-pe_MT_picard.bam -b /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag_bed_files -g /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag_bed_files/germline_indel.bed -umv /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/unmatched_vcf -ref /lustre/scratch104/sanger/dmh/references/7.0/Sarcophilus_harrisii.DEVIL7.0.70.dna.toplevel.fa.fai -t genomic -c /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag.vcf.config.ini -v /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag.to.vcf.convert.ini

Unknown parameter: harrisii at /lustre/scratch104/sanger/dmh/software/wrapper_installation/bin/cgpFlagCaVEMan.pl line 739.

The problem comes when specifying the name of the species. This name has to be specified in the same way as in the tumour bam file (otherwise a different error appears), which in our case is Sarcophilus harrisii. I have specified it in the wrapper call in all these ways:

-species 'Sarcophilus harrisii'

-species "Sarcophilus harrisii"

-species Sarcophilus\ harrisii

but in all cases I get an error. It seems that when the call to cgpFlagCaVEMan.pl is made, the quoting is not preserved and the species name appears with a white space in the middle, which breaks the command. I guess you did not come across this error before because you probably specify something like HUMAN, which is only one word.

sb43 commented 9 years ago

Hi, Try specifying -species Sarcophilus. If the script doesn't find a match for it in the BAM header, it should automatically default to the BAM header value [SP:Sarcophilus harrisii]. Thanks, Shriram

The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

demh commented 9 years ago

Dear Shriram,

After using -species Sarcophilus I still have the same problem:

Errors from command: /software/perl-5.16.2/bin/perl /lustre/scratch104/sanger/dmh/software/wrapper_installation/bin/cgpFlagCaVEMan.pl -i /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/10/tmpCaveman/SC_TasDevilMT5806280_vs_SIMULATED_DEVIL.muts.ids.vcf -o /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/10/tmpCaveman/SC_TasDevilMT5806280_vs_SIMULATED_DEVIL.flagged.muts.vcf -s Sarcophilus harrisii -m /lustre/scratch104/sanger/dmh/trial_caveman/bam_files/13490_3#10.bam -n /lustre/scratch104/sanger/dmh/trial_caveman/simulated_reads/alignment/aln-pe_MT_picard.bam -b /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag_bed_files -g /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag_bed_files/germline_indel.bed -umv /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/unmatched_vcf -ref /lustre/scratch104/sanger/dmh/references/7.0/Sarcophilus_harrisii.DEVIL7.0.70.dna.toplevel.fa.fai -t genomic -c /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag.vcf.config.ini -v /lustre/scratch104/sanger/dmh/trial_caveman/run_caveman_wrapper/post_parameters/flag.to.vcf.convert.ini

Unknown parameter: harrisii at /lustre/scratch104/sanger/dmh/software/wrapper_installation/bin/cgpFlagCaVEMan.pl line 739.

As I mentioned before, the issue arises because the value in the BAM file (which is now used as the default when it does not agree with the -species flag) is Sarcophilus harrisii, and this creates the problem in the call (see the error previously reported). I think there are two options here: you could change the code and fix the bug (so that the call to cgpFlagCaVEMan.pl contains -s "Sarcophilus harrisii" instead of -s Sarcophilus harrisii) or I could edit the BAM files.
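The difference between the two forms of the call can be reproduced in a plain shell. The snippet below is only a minimal sketch of the word-splitting behaviour, not the wrapper's actual code: `count_args` is a hypothetical stand-in for cgpFlagCaVEMan.pl that just reports how many arguments it received.

```shell
#!/bin/sh
# Hypothetical stand-in for the flagging script: prints its argument count.
count_args() {
    echo "$#"
}

species="Sarcophilus harrisii"

# Unquoted: the shell splits the value on whitespace, so -s receives only
# "Sarcophilus" and "harrisii" arrives as a stray extra argument.
count_args -s $species      # prints 3

# Quoted: the value is passed through as a single argument.
count_args -s "$species"    # prints 2
```

The stray third argument in the unquoted case corresponds to the "Unknown parameter: harrisii" message reported above.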

I have checked the new BAM files we will be using and they have DEVIL as species name, so we should have no problems in the future. However, it may be useful to fix this for future users.

Thanks a lot,

Daniel

sb43 commented 9 years ago

Hi Daniel,

The species name in your command [highlighted red] still looks like two words rather than just Sarcophilus. Yes, the long-term solution is to fix the code.

Cheers, Shriram

keiranmraine commented 9 years ago

Hi Shriram,

The code still picks up the value from the header of the BAM and passes it through to a command line shell. It will be a minor fix but won't get done today.

Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute

kr2@sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 7703 Office: H104

demh commented 9 years ago

Hi everyone,

Yeah, it is exactly what Keiran said. I will try to edit my BAM files while you are fixing the bug so I can go ahead.

Thank you so much for your efforts.

Daniel

ghost commented 9 years ago

Hi Daniel,

This is due to the way we are using species name as a section name in our config files and reading them with Config::IniFiles. I can 'fix' this by allowing full Genus species names and inserting an '_' character between them in place of the space when attempting to access the config file... but the config files will need to be named accordingly.
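The underscore substitution described here can be sketched as follows. This is an illustration only and is not taken from the wrapper or from the eventual fix; `species_to_section` is a hypothetical helper name, and the assumption is simply that every space in the species name becomes an underscore before it is used as a config section name.

```shell
#!/bin/sh
# Hypothetical helper: derive a config section name from a species name
# by replacing each space with an underscore.
species_to_section() {
    printf '%s\n' "$1" | tr ' ' '_'
}

species_to_section "Sarcophilus harrisii"   # prints Sarcophilus_harrisii
species_to_section "HUMAN"                  # prints HUMAN
```

Single-word names such as HUMAN pass through unchanged, so existing config sections would keep working, while a two-word name would need a matching section such as Sarcophilus_harrisii.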

demh commented 9 years ago

Hi David,

I solved this problem just by editing the species names in the BAM files if I remember correctly, so no need to change it unless you think it is useful for future users.

Regards,

Daniel

keiranmraine commented 9 years ago

Should be resolved by 59b4ef7 in next release