genome / gms

The Genome Modeling System installer
https://github.com/genome/gms/wiki
GNU Lesser General Public License v3.0

Exome RefAlign builds consistently crash at reference_coverage step in workflow [AND] Reference Alignment software results are not exported correctly to the standalone gms. #15

Closed malachig closed 10 years ago

malachig commented 11 years ago

Reference alignment builds for TST1 currently cannot get past the ref-cov step because an input software result is missing. Specifically, the ref-cov step needs a feature-list result to run properly. This feature list includes a bed file for the exome target regions.

When the build fails, the following errors are generated:

ERROR: Calling get on SoftwareResult (unless getting by id) is slow and possibly incorrect.

ERROR: Can't open file (/opt/gms/GMS1/fs/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0/7BF768EF51FB11E1A0743039993C62A0.bed) to md5sum: No such file or directory at /opt/gms-1E0X346/sw/genome/lib/

I confirmed that this file (and the entire result) are missing from my GMS1 instance. I expect this result is not getting dumped correctly during the initial creation of the TST1 metadata object, or perhaps just not getting imported correctly...
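
A quick way to confirm whether the filesystem side of a feature list is present is to test for the bed file directly. The sketch below is only an illustration: the function names are hypothetical (not part of the GMS), and the `<root>/info/feature_list/<id>/<id>.bed` layout is assumed from the path in the error message above.

```shell
# Hypothetical helpers (not part of the GMS): check whether the bed file for a
# feature list exists under a data root such as /opt/gms/GMS1/fs/gc4095.
# The <root>/info/feature_list/<id>/<id>.bed layout is assumed from the
# error message quoted above.

# Print the expected bed path for a data root ($1) and feature-list id ($2).
feature_list_bed_path() {
    printf '%s/info/feature_list/%s/%s.bed\n' "$1" "$2" "$2"
}

# Report whether that file exists; returns non-zero when it is missing.
check_feature_list_bed() {
    fl_bed=$(feature_list_bed_path "$1" "$2")
    if [ -f "$fl_bed" ]; then
        echo "present: $fl_bed"
    else
        echo "missing: $fl_bed"
        return 1
    fi
}
```

For example, `check_feature_list_bed /opt/gms/GMS1/fs/gc4095 7BF768EF51FB11E1A0743039993C62A0` would report whether the file named in the error exists on a given install.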

mg

sakoht commented 11 years ago

Do we have that as an input on those models?

If so, is it a "name" instead of an ID? If so the dumper "genome model export metadata" will need some exception logic like on the target_region_set_name for refalign.

malachig commented 11 years ago

I believe there are two inputs on exome ref-align models that point to these feature list objects.

'--target-region-set-names' and '--region-of-interest-set-name'

In order to run the exome analysis you need to define both a target region set name and region of interest. This is generally done by name when you do it at the command line. Since I can't do an install because of the dependency issue, I can't confirm this right now.

It does seem like there is logic in "/lib/perl/Genome/Model/Command/Export/Metadata.pm" to dump target region software results. Maybe this was added after the metadata dump was created? Or maybe something is also needed for region of interest software results? Or maybe it just isn't working quite right?

Wish I understood the whole software results business a bit better...

sakoht commented 11 years ago

As we discussed verbally, the real issue here was that the DB data for the feature list was present, but the FS data was not.

I just copied it into the FTP staging location:

cp -r /gscmnt/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0 /gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/fs/gc4095/info/feature_list/

If you ever forget where the staging directory is for the FTP site, I just do this, because it is in a comment in the Makefile (the commented-out code does an scp via a random blade):

grep blade Makefile

We will also need to sync amazon from there. Since the tool requires Ubuntu precise, that will have to happen after doing an install via FTP locally.

malachig commented 11 years ago

Ok, sounds good. For future reference and convenience, at this time the staging dir is here:

/gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/setup/archive-files/

malachig commented 11 years ago

We are still blocked on reference-alignment at this step, but I think slightly further. Now we get the following errors:

ERROR: Calling get on SoftwareResult (unless getting by id) is slow and possibly incorrect.

ERROR: Loaded converter could not be used due to errors:

ERROR: property 'algorithm': specified algorithm not found in /opt/gms/HU9D538/sw/genome/lib/perl/Genome/Model/Build/ReferenceSequence/Converter.pm

ERROR: Could not convert to requested reference!

ERROR: Could not convert to requested reference! at /opt/gms/HU9D538/sw/genome/lib/perl/Genome/FeatureList.pm line 429.

Looking at the code, it is not immediately obvious what is going on there... Anyone know who might be familiar with this code?

malachig commented 10 years ago

From Tom:

"The "algorithm" is the name of a subroutine defined in Genome::Model::Build::ReferenceSequence::Converter that should be used to convert betwixt the reference sequences, e.g.:

[tmooney@linus284:~]$ genome model reference-sequence converter list -f destination_reference_build_id=106942997

ID SOURCE_REFERENCE_BUILD DESTINATION_REFERENCE_BUILD ALGORITHM RESOURCE

108573982 nimblegen-human-buildhg19 (108563338) GRCh37-lite-build37 (106942997) convert_chrXX_contigs_to_GL

The algorithm must be defined when the converter is created or else the converter is invalid. (See genome model reference-sequence converter create for handy documentation and a quick way to make converters.) Additionally, UR shouldn't let you save a converter without an algorithm (or with an invalid algorithm) thanks to errors."

malachig commented 10 years ago

If I do this test inside the standalone GMS

genome model reference-sequence converter list -f destination_reference_build_id=106942997

ID SOURCE_REFERENCE_BUILD DESTINATION_REFERENCE_BUILD ALGORITHM RESOURCE


108573982 nimblegen-human-buildhg19 (108563338) GRCh37-lite-build37 (106942997)

In other words, 'algorithm' is defined inside the TGI but is not defined in the standalone GMS. This is a problem with this module:

genome/lib/perl/Genome/Model/Command/Export/Metadata.pm

Metric associations with models and software results are not being obtained when grabbing the content to dump and then import into the standalone GMS for the demonstration analysis.

Refer to this code section:
    for my $ext (qw/Input Param/) {
        my $related_class = $base_class . "::$ext";
        if (UR::Object::Type->get($related_class)) {
            my $owner_method;
            my $value_method;
            my $value_method2;
            if ($obj->isa("Genome::Model")) {
                $owner_method = "model_id";
                $value_method = "value";
            }
            elsif ($obj->isa("Genome::Model::Build")) {
                $owner_method = "build_id";
                $value_method = "value";
            }  
            elsif ($obj->isa("Genome::SoftwareResult")) {
                $owner_method = "software_result_id";
                $value_method = "value_obj";
                $value_method2 = "value_id";
            }
            else {
                next;
            }
            my @assoc = $related_class->get($owner_method => $obj->id);
            for my $a (@assoc) {
                my $v = $a->$value_method;
                unless ($v) {
                    my $id = $a->$value_method2;
                    die if not defined $id;
                    $v = UR::Value::Text->get($id);
                }
                unless ($sanitize_map->{$v->id} and $sanitize_map->{$v->id} == $obj->id) {
                    $self->add_to_dump_queue($a, $queue, $exclude, $sanitize_map) unless $exclude->{$final_class};
                    $self->add_to_dump_queue($v, $queue, $exclude, $sanitize_map);
                }
            }
        }
    }
            
malachig commented 10 years ago

Fixing this might be as simple as changing:

for my $ext (qw/Input Param/) {

to

for my $ext (qw/Input Param Metric/) {

Then of course we would have to regenerate the metadata dump, update this in the FTP staging dir, redo the import in the standalone GMS and test....

malachig commented 10 years ago

To test whether fixing this issue properly will actually get this build past this point we can manually patch the database to contain the missing algorithm like so:

perl -MGenome -e 'Genome::Model::Build::ReferenceSequence::Converter->get(108573982)->algorithm("convert_chrXX_contigs_to_GL"); UR::Context->commit();'

But we should really fix the metadata exporter and not do this.

malachig commented 10 years ago

Manually adding the algorithm to the database worked, but now I am getting a bunch of errors like this:

DBD::Pg::st execute failed: ERROR: insert or update on table "instance" violates foreign key constraint "instance_peer_instance_id_fkey"

malachig commented 10 years ago

I tried adding the line of code described above to Metadata.pm:

for my $ext (qw/Input Param Metric/) {

But when I try to regenerate the metadata I now get this error:

export Genome::SoftwareResult::Param (Genome::SoftwareResult::Param): variant_type:snv

ERROR: Can't locate object method "value_obj" via package "Genome::SoftwareResult::Metric" (perhaps you forgot to load "Genome::SoftwareResult::Metric"?) at /gscuser/mgriffit/git/genome/lib/perl/Genome/Model/Command/Export/Metadata.pm line 310

sakoht commented 10 years ago

Metrics are a little asymmetrical from inputs and params because they don't have value_{class_name,id,obj}, just value. This probably needs a similar loop that just copies "value".

malachig commented 10 years ago

The latest test still has some issue with obtaining a FeatureList during the reference coverage step.

2013-11-23 08:52:12-0600 clia1: ERROR: Calling get on SoftwareResult (unless getting by id) is slow and possibly incorrect.

2013-11-23 08:52:16-0600 clia1: ERROR: Can't open file () to md5sum: No such file or directory at /opt/gms/4K8W670/sw/genome/lib/perl/Genome/FeatureList.pm line 211

I'm guessing this is still some problem with the metadata export/import process? If I try the following query inside TGI, I get a file path among other values for the feature list in question. If I try the same query in the standalone GMS, all values look the same, except 'OUTPUT_DIR' is NULL.

% genome feature-list list id=7BF768EF51FB11E1A0743039993C62A0

Why? Are any other values related to the feature list also not being defined properly?

The actual data does seem to be present in the standalone GMS as part of the TST1 data mount here: /opt/gms/GMS1/fs/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0/

Furthermore, inside the TGI:

% genome feature-list list id=7BF768EF51FB11E1A0743039993C62A0 --show disk_allocation

Gives: /gscmnt/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0

But in the standalone GMS, this is NULL

malachig commented 10 years ago

I expect this latest error has something to do with this code in Genome/FeatureList.pm

        file_path => {
            is => 'Text',
            calculate_from => 'disk_allocation',
            calculate => q(
                if($disk_allocation) {
                    my $directory = $disk_allocation->absolute_path;
                    return join('/', $directory, $self->id . '.bed');
                }
            ),
        },

Perhaps the needed disk allocation is not being imported when the database is primed. If that is the case, then the problem likely lies in:

Genome/Model/Command/Export/Metadata.pm

gatoravi commented 10 years ago

I did an export from the TGI side, stored the results in 2891454740-2013.11.25, and did a genome model import metadata 2891454740-2013.11.25 on blade16-4-16. Then I did a genome feature-list list and the output_dir is now displayed. I'm going to rerun an exome ref align build and see if this fixes the issue we have.

gatoravi commented 10 years ago

ssmith@blade16-4-16 ~> genome feature-list list

ID NAME SOURCE FORMAT CONTENT_TYPE REFERENCE OUTPUT_DIR

7BF768EF51FB11E1A0743039993C62A0 11111001 capture chip set nimblegen true-BED exome nimblegen-human-buildhg19 (108563338) /opt/gms/GMS1/fs/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0

sakoht commented 10 years ago

The core tries to call file_path instead of output_dir. Does that work?

sakoht commented 10 years ago

Run the lister and show file_path and disk_allocation. It may be that one or some of those fail.

gatoravi commented 10 years ago

The ref align model succeeded!! It looks like just doing a fresh export and import worked. Might this be due to the recent changes in table schemas?

=== Build ===
Build ID: 36aafcd4565811e39a9f33a17bdf329a
Build Status: Succeeded
Model ID: 2891377978
Model Name: tst1-normal-refalign-exome
Run by: ssmith
Processing Profile ID: 2635769
Build Scheduled: 2013-11-25 23:04:27
Build Completed: 2013-11-26 03:06:31

malachig commented 10 years ago

While it is not entirely clear why redoing the export/import worked here, for now I have copied the data dump file described above '2891454740-2013.11.25.dat' into the staging dir here: /gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/export

It will stage here: http://genome.wustl.edu/pub/software/gms/testdata/GMS1/export/

I also updated the README.md to specify use of this file. I will now update the database priming in the clia test box and try a new reference alignment there as well.

This was done by:

% make db-rebuild

% genome model import metadata 2891454740-2013.11.25.dat

malachig commented 10 years ago

After importing the new metadata:

% genome feature-list list id=7BF768EF51FB11E1A0743039993C62A0 --show output_dir,disk_allocation

OUTPUT_DIR DISK_ALLOCATION


/opt/gms/GMS1/fs/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0 /opt/gms/GMS1/fs/gc4095/info/feature_list/7BF768EF51FB11E1A0743039993C62A0

malachig commented 10 years ago

The new data dump seems to have added a disk allocation and various other things that were not in the old data dump. This seems to resolve the issue we were having above. BUT, at the same time we have lost meta-data related to the aligner indexes for bwa 0.5.9. This means that when reference alignment runs, the indexes have to be built from scratch... Kind of unfortunate since we go through the trouble of copying over the index files...

Why is the software result for the aligner index no longer being exported? To see the differences between the old and the new metadata dumps you can diff these files:

http://genome.wustl.edu/pub/software/gms/testdata/GMS1/export/2891454740-2013.11.1.dat

http://genome.wustl.edu/pub/software/gms/testdata/GMS1/export/2891454740-2013.11.25.dat

How are these things different while the test case for genome model export metadata continues to pass?

malachig commented 10 years ago

To test for existence of software results related to indexing we should be able to do this:

% ur list objects --subj Genome::SoftwareResult --show id,class

Or more specifically:

% ur list objects --subj Genome::SoftwareResult --show id,class | grep Index

Now try grepping for SoftwareResult in the metadata .dat file:

% cat 2891454740-2013.11.1.dat | grep Software | grep aligner

% cat 2891454740-2013.11.25.dat | grep Software | grep aligner

It appears that the Nov. 1 dat file had software results (Genome::SoftwareResult::Param) for 'bwa' and '0.5.9' but these were lost in the Nov. 25 dat file. Neither seems to make any mention of Bowtie-related software results, but both do have ProcessingProfile params (Genome::ProcessingProfile::Param) related to Bowtie. Not sure of the significance of this...
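
The two grep pipelines above can be wrapped so that the counts for both dumps are printed side by side. This is a minimal sketch: the function names are made up for illustration, and it assumes nothing about the .dat format beyond the `Software` and `aligner` substrings already used above.

```shell
# Hypothetical helpers: count dump lines that mention both "Software" and
# "aligner" (the same two grep patterns used above), then compare two dumps.

count_aligner_results() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
    grep Software "$1" | grep -c aligner || true
}

compare_aligner_results() {
    cmp_old=$(count_aligner_results "$1")
    cmp_new=$(count_aligner_results "$2")
    echo "old=$cmp_old new=$cmp_new"
    # Succeeds only when both dumps contain the same number of matching lines.
    [ "$cmp_old" -eq "$cmp_new" ]
}
```

Running `compare_aligner_results 2891454740-2013.11.1.dat 2891454740-2013.11.25.dat` prints both counts and returns non-zero when they differ, which would make a regression like this one easier to spot after each new export.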

If I try to regenerate the .dat file again from a fresh checkout, the 'aligner' results that were present Nov. 1 are still gone. Same thing if I run it from a stable branch.

Are these software results even in the database on the TGI side still? Yes.

% ur list objects --subj Genome::SoftwareResult --show id,class --filter id=117803766

malachig commented 10 years ago

Both this issue and issue #30 are now stuck on an apparent inability to recognize software results that are imported into the sGMS from TGI by:

wget http://genome.wustl.edu/pub/software/gms/testdata/GMS1/export/2891454740-2013.11.25.dat
genome model import metadata 2891454740-2013.11.25.dat
genome sys gateway attach GMS1 --protocol ftp --rsync

For both rna-seq and reference-alignment pipelines we import software results for aligner-specific reference genome indexes, i.e., an indexed version of the reference genome created using the aligner program and the specific version that will be used for alignment of reads. For both pipelines, these software results are not recognized and the indexes are created from scratch the first time they are needed. These new indexes are stored as new software results, and in subsequent steps these software results are recognized and short-cutting works correctly from that point onward.

malachig commented 10 years ago

It looks like the problem is related to this code block in: Genome/Model/Command/Export/Metadata.pm

        if ($obj->isa("Genome::Model::Build::ReferenceSequence")) {
            my @i = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(reference_build_id => $obj->id, test_name => undef);
            for my $i (@i) {
                my $dir = $i->output_dir;
                next if $dir and $dir =~ /gscarchive/;
                next unless $i->id eq '117803766';        # TODO: make this smarter
                $self->add_to_dump_queue($i, $queue, $exclude, $sanitize_map);
            }
            my @prev_builds = grep { $_->isa("Genome::Model::Build::ReferenceSequence") } values %{ $queue->{"Genome::Model::Build"} };
            if (@prev_builds) {
                $DB::single = 1;
                my @converters1 = map { Genome::Model::Build::ReferenceSequence::Converter->get(source_reference_build => $obj, destination_reference_build => $_) } @prev_builds;
                my @converters2 = map { Genome::Model::Build::ReferenceSequence::Converter->get(destination_reference_build => $obj, source_reference_build => $_) } @prev_builds;
                for my $converter (@converters1, @converters2) {
                    $self->add_to_dump_queue($converter, $queue, $exclude, $sanitize_map);
                }
            }
        }

Try this:

perl -e 'use Genome; $aligner_index = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(117803766); $test_name = $aligner_index->test_name; print "\n\n$aligner_index\n$test_name\n\n"'

Genome::Model::Build::ReferenceSequence::AlignerIndex=HASH(0xba50bb0)
rt 96275

It looks like the aligner index we are trying to retrieve has a test name set and that we skip such objects... Even if we fix this it seems that we still would not get the tophat aligner index.

malachig commented 10 years ago

Note from @acoffman:

When I set the test name for the bad software results for the ticket (RT 96275), I must have inadvertently grabbed all the results associated with the build (including the aligner index). There is nothing wrong with the index as far as I know. Unfortunately, I can't remove the test name at this point because then there will be two identical aligner index software results and this will break shortcutting.

It looks like the identical one that was created subsequently has the id: ec26139c40dc4d9ea24dc9fa55160b71

malachig commented 10 years ago

Obtaining information about ReferenceSequence::AlignerIndex software results:

Get all results for a particular reference sequence build and display basic results

perl -e 'use Genome; my @r = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(reference_build_id=>106942997); foreach my $r (@r){$a_id=$r->id; $a_name=$r->aligner_name; $a_version=$r->aligner_version; $a_params=$r->aligner_params; $test_name=$r->test_name; print "$a_id\t$a_name\t$a_version\t$a_params\t$test_name\n"}';

Narrow down to a specific aligner and version of that aligner:

perl -e 'use Genome; my @r = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(aligner_name=>"bwa", reference_build_id=>106942997, aligner_version=>"0.5.9"); foreach my $r (@r){$a_id=$r->id; $a_name=$r->aligner_name; $a_version=$r->aligner_version; $a_params=$r->aligner_params; $test_name=$r->test_name; print "$a_id\t$a_name\t$a_version\t$a_params\t$test_name\n"}';

Now remove any with a 'test_name' defined and display along with actual dir for: 'bwa', '0.5.9', reference_build=106942997

perl -e 'use Genome; my @r = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(aligner_name=>"bwa", reference_build_id=>106942997, aligner_version=>"0.5.9"); foreach my $r (@r){$a_id=$r->id; $a_name=$r->aligner_name; $a_version=$r->aligner_version; $a_params=$r->aligner_params; $test_name=$r->test_name; $a_dir=$r->output_dir; unless ($test_name){print "$a_id\t$a_name\t$a_version\t$a_params\t$a_dir\n"}}';

Now do the same thing but for: 'bowtie', '2.0.0-beta7', reference_build=106942997

perl -e 'use Genome; my @r = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(aligner_name=>"bowtie", reference_build_id=>106942997, aligner_version=>"2.0.0-beta7"); foreach my $r (@r){$a_id=$r->id; $a_name=$r->aligner_name; $a_version=$r->aligner_version; $a_params=$r->aligner_params; $test_name=$r->test_name; $a_dir=$r->output_dir; unless ($test_name){print "$a_id\t$a_name\t$a_version\t$a_params\t$a_dir\n"}}';
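
If the output of the first one-liner above (which prints test_name as the fifth tab-separated column) is saved to a file, rows with an empty test_name can also be picked out afterwards without querying again. A minimal sketch, assuming that five-column layout; the function name and the file name in the usage note are hypothetical:

```shell
# Keep only rows whose fifth tab-separated column (test_name) is empty.
# Assumes rows of the form: id <TAB> name <TAB> version <TAB> params <TAB> test_name
without_test_name() {
    awk -F'\t' 'NF == 5 && $5 == ""' "$@"
}
```

For example, `without_test_name aligner_indexes.tsv` would list only the index results that the exporter's `test_name => undef` filter could pick up.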

gatoravi commented 10 years ago

The index that is currently staged is here (in the staging directory):

/gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/fs/gc4095/info/model_data/ref_build_aligner_index_data/2869585698/build106942997/aligner-index-blade13-4-7.gsc.wustl.edu-wschierd-21466-117803766/bwa/0_5_9/

This index has a test_name that is not 'undef', so we have decided to remove these index files (the old indexes have been removed from the staging directory) and import a newer aligner index result (id = ec26139c40dc4d9ea24dc9fa55160b71). This will go here:

/gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/fs/gc13011/info/model_data/ref_build_aligner_index_data/2869585698/build106942997/aligner-index-blade9-2-12.gsc.wustl.edu-acoffman-16169-ec26139c40dc4d9ea24dc9fa55160b71/bwa/0_5_9

Index results for bowtie (id = 127215980) are now going to be staged as well. These will go here:

/gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/fs/gc4096/info/model_data/ref_build_aligner_index_data/2869585698/build106942997/aligner-index-blade14-4-6.gsc.wustl.edu-jwalker-4387-127215980/bowtie/2_0_0_beta7

BWA successfully shortcuts the indexing step for the ref align models.

gatoravi commented 10 years ago

In the 'first phase of testing', bwa shortcuts but the 'per-lane-tophat index #1' step does not. This step now takes 30 minutes compared to 4 hours earlier (presumably because the bowtie indexes are now present), but it still does not shortcut. It looks like another aligner index result needs to be exported (this can be seen by running Genome::Model::Build::ReferenceSequence::AlignerIndex->get(); on the standalone):

aligner_name = per-lane-tophat , aligner_version = 2.0.4, aligner_params = "-p 4 --bowtie-version=2.0.0-beta7"

perl -e 'use Genome; my @r = Genome::Model::Build::ReferenceSequence::AlignerIndex->get(aligner_name=>"per-lane-tophat", reference_build_id=>106942997, aligner_version=>"2.0.4", aligner_params=>"-p 4 --bowtie-version=2.0.0-beta7"); foreach my $r (@r){$a_id=$r->id; $a_name=$r->aligner_name; $a_version=$r->aligner_version; $a_params=$r->aligner_params; $test_name=$r->test_name; $a_dir=$r->output_dir; unless ($test_name){print "$a_id\t$a_name\t$a_version\t$a_params\t$a_dir\n"}}';

These indexes were copied to

/gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/fs/gc4096/info/model_data/ref_build_aligner_index_data/2869585698/build106942997/aligner-index-blade14-4-6.gsc.wustl.edu-jwalker-4387-127215977/per_lane_tophat/2_0_4/_p_4___bowtie_version_2_0_0_beta7

These indexes have now been staged over to the FTP site.

gatoravi commented 10 years ago

In addition to the Genome::Model::Build::ReferenceSequence::AlignerIndex, RNAseq also needs objects of kind Genome::Model::Build::ReferenceSequence::AnnotationIndex and their software results.

perl -e 'use Genome; my @r = Genome::Model::Build::ReferenceSequence::AnnotationIndex->get(annotation_build_id => 124434505, aligner_name=>"per-lane-tophat", reference_build_id=>106942997, aligner_version=>"2.0.4", aligner_params=>"-p 4 --bowtie-version=2.0.0-beta7"); foreach my $r (@r){$a_id=$r->id; $a_name=$r->aligner_name; $a_version=$r->aligner_version; $a_params=$r->aligner_params; $test_name=$r->test_name; $a_dir=$r->output_dir; unless ($test_name){print "$a_id\t$a_name\t$a_version\t$a_params\t$a_dir\n"}}';

These files have been copied to /gscmnt/sata102/info/ftp-staging/pub/software/gms/testdata/GMS1/fs/gc4095/info/model_data/annotation_build_aligner_index_data/2869585698/reference_build106942997/annotation_build124434505/annotation-index-blade14-4-6.gsc.wustl.edu-jwalker-4387-127219304/per_lane_tophat/2_0_4/_p_4___bowtie_version_2_0_0_beta7

These files have been staged. The RNAseq per-lane-tophat index step should now successfully shortcut.

sakoht commented 10 years ago

That will speed things along.

Such a dilemma, having people download something they can rebuild.

sakoht commented 10 years ago

If they are both verified to be identical, a test name could be added to the new one and removed from the old one in the same transaction. This would be "safe" from causing crashes or race conditions.

obigriffith commented 10 years ago

Just trying a fresh install and run through on the box at home. First try at running exome refalign and it does not look like I am getting a shortcut on building the reference index. Update: also still not getting shortcut on RNAseq indexing.

sakoht commented 10 years ago

Did we resolve that test_name issue?

sakoht commented 10 years ago

You can list with: ur list objects --subj Genome::SoftwareResult

obigriffith commented 10 years ago

Unfortunately, the last command above errors out.

ogriffit@GGMS ~/gms (ubuntu-12.04)> ur list objects --subj Genome::SoftwareResult

ERROR: Can't call method "display_name" on an undefined value at /opt/gms/AUIS907/sw/genome/lib/perl/Genome/SoftwareResult/Param.pm line 68.

gatoravi commented 10 years ago

Both the ref-align-exome and rna-seq steps shortcut on a completely fresh install and fresh download of data on a blade. This step not working for Obi might have something to do with a standalone install outside TGI or an rsync issue on Obi's box (I'm leaning towards the latter).

sakoht commented 10 years ago

Try --show id,class.

Not sure what the deal with display_name is. It should be display_name. Weird.

On Tuesday, January 21, 2014, Obi Griffith notifications@github.com wrote:

Unfortunately, the last command above errors out.

ogriffit@GGMS ~/gms (ubuntu-12.04)> ur list objects --subj Genome::SoftwareResult

ERROR: Can't call method "_displayname" on an undefined value at /opt/gms/AUIS907/sw/genome/lib/perl/Genome/SoftwareResult/Param.pm line 68.

obigriffith commented 10 years ago

This is now working on the external standalone. Both RNAseq and RefAlign indexing are being shortcut successfully. The problem last time through for me was not having the latest version of the metadata file. This issue seems to be resolved.