genome / gms

The Genome Modeling System installer
https://github.com/genome/gms/wiki
GNU Lesser General Public License v3.0
78 stars 22 forks source link

Clonality Analysis fails with R command error. #82

Closed gatoravi closed 10 years ago

gatoravi commented 10 years ago

The clonality analysis step within ClinSeq fails on Obi's box with this error,

2014-01-21 23:27:27-0600 GGM: ERROR RUNNING COMMAND. Exit code 1 from: R --vanilla --slave < /opt/gms/AUIS907/fs/AUIS907/info/model_data/9139d1d4d0d6481db98962e328c724ee/builde0f110f79090452eb2ba5d4fc937e899/TST1/clonality/clonality.R

Looks like the R script is trying to access a file that does not exist. This might be somehow related to the previous errors.

obigriffith commented 10 years ago

The problem seems to be here: 2014-01-23 18:24:25-0600 GGM: Error in read.table(file = filename, skip = skip, sep = sep) : 2014-01-23 18:24:25-0600 GGM: no lines available in input 2014-01-23 18:24:25-0600 GGM: Calls: varscan.load_snp_output -> read.table

/opt/gms/AUIS907/fs/AUIS907/info/model_data/9139d1d4d0d6481db98962e328c724ee/build3b25aebceea541348337ffc207f37e21/TST1/clonality/clonality.R tries to call varscan.load_snp_output from /opt/gms/AUIS907/sw/genome/lib/perl/Genome/Model/Tools/Validation/VarScanGraphLib.R

varscan.load_snp_output("/tmp/1188.tmpdir/gm-genome_sys-2014-01-23_18_24_21--7pXX/anonymous4",header=F)->xcopy;
varscan.load_snp_output("/tmp/1188.tmpdir/gm-genome_sys-2014-01-23_18_24_21--7pXX/anonymous4",header=F,min_tumor_depth=20,min_normal_depth=20)->xcopy100;

This is initiated by /opt/gms/AUIS907/sw/genome/lib/perl/Genome/Model/Tools/Validation/ClonalityPlot.pm which creates the R script:

varscan.load_snp_output(\"$temp_path\",header=F)->xcopy;
varscan.load_snp_output(\"$temp_path\",header=F,min_tumor_depth=$readcount_cutoff,min_normal_depth=$readcount_cutoff)->xcopy100;

$temp_path is defined as follows: $temp_path = $self->varscan_file_with_cn; $tfh = FileHandle->new($temp_path,"w"); OR ($tfh,$temp_path) = Genome::Sys->create_temp_file; The latter seems to be the case since I can't find evidence of the varscan_file_with_cn option every being specified and the R output contains what looks like a temp path.

Thus, the apparently missing temp file is created with the following block of code

my %copynumber_hash_tumor;
    if(defined($cnvhmm_file)){
       #build the copy number hashes
       %copynumber_hash_tumor=%{&build_hash($cnvhmm_file)};
    } elsif(defined($cbs_file)){
       #build the copy number hashes
        %copynumber_hash_tumor=%{&build_hash_cbs($cbs_file)};
    }

    #read in the varscan input
    my $varscan_input = new FileHandle ($varscan_file);
    while (my $line2 = <$varscan_input>) {
       chomp($line2);
       my ($chr, $pos, $ref, $var,
       $normal_ref, $normal_var, $normal_var_pct, $normal_IUB,
       $tumor_ref, $tumor_var, $tumor_var_pct, $tumor_IUB,
       $varscan_call, $germline_pvalue, $somatic_pvalue, @otherstuff) = split(/\t/, $line2);

       my $varscan_cn_tumor=&get_cn($chr,$pos,$pos,\%copynumber_hash_tumor);

       print $tfh "$line2\t$varscan_cn_tumor\n";
       if ($positions_highlight && -s $positions_highlight) {
          my $matcher = "$chr\t$pos";
          if (defined $position_highlight_hash{$matcher}) {
             $position_added--;
             my $depth = $tumor_ref + $tumor_var;
             my $varallelefreq = $tumor_var_pct;
             $varallelefreq =~ s/%//;
             print $tfh2 "$varscan_cn_tumor\t$varallelefreq\t$depth\n";
          }
       }
     }
     $tfh->close;
     $tfh2->close;

$varscan_file or $filtered_file is specified as the varscan_file option when Genome::Model::Tools::Validation::ClonalityPlot is called.It looks that is called from Command/GenerateClonalityPlots.pm. Working backwards:

my $filtered_file = $varscan_file . ".filt"
my $varscan_file = $readcounts_varscan_file;
my $readcounts_varscan_file = "$readcounts_outfile".".varscan";
my $readcounts_outfile;
   if ($self->read_counts) {
      $readcounts_outfile = $self->read_counts;
      }
      else {
      $readcounts_outfile = "$adapted_file".".readcounts";

It looks like $adapted_file is being used because the read_counts option is not being specified to Genome::Model::ClinSeq::Command::GenerateClonalityPlots when called in ClinSeq.pm Therefore the missing file is possibly: my $adapted_file ="$output_dir"."allsnvs.hq.novel.tier123.v2.bed.adapted"; where $output_dir is the $clonality_dir as specified in ClinSeq.pm

Therefore, the missing file should be: /opt/gms/AUIS907/fs/AUIS907/info/model_data/9139d1d4d0d6481db98962e328c724ee/build3b25aebceea541348337ffc207f37e21/TST1/clonality/ allsnvs.hq.novel.tier123.v2.bed.adapted.readcounts.varscan or allsnvs.hq.novel.tier123.v2.bed.adapted.readcounts.varscan.filt

Indeed, the latter is empty.

obigriffith commented 10 years ago

Data are filtered from allsnvs.hq.novel.tier123.v2.bed.adapted.readcounts.varscan to produce allsnvs.hq.novel.tier123.v2.bed.adapted.readcounts.varscan.filt as follows:

next if ($tumor_cov < $self->min_tumor_cov || $tumor_cov > $self->max_tumor_cov || $normal_vaf > $self->max_normal_vaf);

There are only 4 variants that ended up in the unfiltered file. It is quite possible that none meet these criteria (hard to tell immediately without headers in the file). Should we add some logic somewhere to prevent an attempt at clonality plot generation when this file is empty? This situation is highly unlikely to occur except when using the small demo data.

obigriffith commented 10 years ago

Perhaps we can just add some logic to only call Genome::Model::Tools::Validation::ClonalityPlot->create() when varscan_file is non-zero?

obigriffith commented 10 years ago

This seems to have worked. See commits: https://github.com/genome/gms-core/commit/9e976f8028cec3f2e04eb86abd6bd105cb71cb30 and https://github.com/genome/gms-core/commit/34b1a14c641c5da0c41a99d96612819654ee0d33