genome / gms

The Genome Modeling System installer
https://github.com/genome/gms/wiki
GNU Lesser General Public License v3.0

can't abandon/stop running build #131

Closed obigriffith closed 10 years ago

obigriffith commented 10 years ago

Currently, if I attempt to abandon or stop a running build I see the following failure:

ogriffit@GGMS /opt/gms/XFOKC32/sw/genome (gms-pub)> genome model build stop 350dfa8932f643fd966132559bb6d09d
'builds' may require verification...
Resolving parameter 'builds' from command argument '350dfa8932f643fd966132559bb6d09d'... found 1
Screening builds that you are not able to modify... Found 1 builds out of 1 possible.
Attempting to stop build: 350dfa8932f643fd966132559bb6d09d

Errors Summary:
350dfa8932f643fd966132559bb6d09d of H_NJ-HCC1395.clin_seq: 
    - Failed to stop build: .
ERROR: Please see 'genome model build stop --help' for more information.

malachig commented 10 years ago

Notes from offline discussion from @sakoht and @mkiwala-g

sGMS is failing to "genome model build stop" because it is missing Job::Iterator. Where does that come from? Is it home-grown but not in genome.git?

I had never heard of that class before. As a first attempt to find it, I tried this command from my TGI workstation:

> genome-perl -e'use Job::Iterator'
> Can't locate Job/Iterator.pm in @INC (@INC contains: /gapp/noarch/lib/perl5 /gsc/scripts/opt/genome/current/user/lib/perl/x86_64-linux-gnu-thread-multi /gsc/scripts/opt/genome/current/user/lib/perl /gsc/scripts/lib/perl /gsc/scripts/gsc/info/lib /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at -e line 1.

That made me think the “Job::Iterator” package was being defined inside the file for another package. I searched genome/gms-core on GitHub for “Job::Iterator”, which led me to a package definition within lib/perl/Genome/Sys/LSF/JobIterator.pm.
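To illustrate how a package can be defined in a file whose name does not match it, here is a minimal sketch. Only the Job::Iterator package name comes from the search above; the Demo::LSF::JobIterator file name, the temporary directory, and the tiny iterator body are made up for the example and are not the actual gms-core code.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout: the file is Demo/LSF/JobIterator.pm,
# but the package it declares is Job::Iterator.
my $lib = tempdir(CLEANUP => 1);
make_path("$lib/Demo/LSF");
open my $fh, '>', "$lib/Demo/LSF/JobIterator.pm" or die "cannot write module: $!";
print {$fh} <<'EOM';
package Job::Iterator;
use strict;
use warnings;

# Illustrative stand-in: iterate over a fixed list of job ids.
sub new {
    my ($class, @ids) = @_;
    return bless { ids => [@ids] }, $class;
}

sub next {
    my $self = shift;
    return shift @{ $self->{ids} };
}

1;
EOM
close $fh or die "cannot close module: $!";

unshift @INC, $lib;

# Perl maps Job::Iterator to Job/Iterator.pm, which does not exist in @INC.
my $direct_ok = eval { require Job::Iterator; 1 } ? 1 : 0;

# Loading the file that actually contains the package defines it.
my $via_file_ok = eval { require Demo::LSF::JobIterator; 1 } ? 1 : 0;

my $iter  = Job::Iterator->new('job-1');
my $first = $iter->next;

print "require Job::Iterator: ", ($direct_ok ? "ok" : "failed"), "\n";
print "require Demo::LSF::JobIterator: ", ($via_file_ok ? "ok" : "failed"), "\n";
print "first job: $first\n";
```

In this sketch the direct require fails in the same way as the genome-perl one-liner above, while the class becomes usable once the file that contains it has been loaded.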

malachig commented 10 years ago

It looks like the attempt to use Job::Iterator is here (Genome/Model/Build.pm):

sub _get_job {
    use Genome::Model::Command::Services::Build::Scan;
    my $self = shift;
    my $job_id = shift;

    my @jobs = ();
    my $iter = Job::Iterator->new($job_id);
    while (my $job = $iter->next) {
        push @jobs, $job;
    }

    if (@jobs > 1) {
        $self->error_message("More than 1 job found for this build? Alert apipe");
        return 0;
    }

    return shift @jobs;
}

malachig commented 10 years ago

It is not yet clear to me how this works within TGI right now, but I believe the problem is resolved in the sGMS by loading the missing module inside _get_job of Genome/Model/Build.pm, as follows:

sub _get_job {
    use Genome::Model::Command::Services::Build::Scan;
    # This file contains the Job::Iterator package definition used below.
    use Genome::Sys::LSF::JobIterator;
    my $self = shift;
    my $job_id = shift;
    my @jobs = ();
    my $iter = Job::Iterator->new($job_id);
    while (my $job = $iter->next) {
        push @jobs, $job;
    }
    if (@jobs > 1) {
        $self->error_message("More than 1 job found for this build? Alert apipe");
        return 0;
    }
    return shift @jobs;
}

I would like to better understand what is going on with the namespace/inheritance of these classes before pushing this to master...

malachig commented 10 years ago

It looks like use Genome::Sys::LSF::JobIterator; is actually already in master. Perhaps it was missed during the merge of this module, where some sGMS-specific code exists because of differences between the behavior of LSF and OpenLava.

I added the missing line to the gms-pub branch of gms-core here: https://github.com/genome/gms-core/commit/82ea26170a6da27b87519c65a6898c7a28931ef6

malachig commented 10 years ago

This issue is now resolved. Running builds are now abandoned cleanly even if they have running jobs on OpenLava. Closing.