genome / gms

The Genome Modeling System installer
https://github.com/genome/gms/wiki
GNU Lesser General Public License v3.0
78 stars 22 forks source link

ERROR: insert or update on table "instance" #181

Closed gatoravi closed 9 years ago

gatoravi commented 9 years ago

The Pindel step in the somatic-variation workflow dies with this error,

    451 2015-03-16 17:29:40+0000 ip-172-31-19-8: DBD::Pg::st execute failed: ERROR:  insert or update on table "instance" violates foreign key constraint "instance_peer_instance_id_fkey"
    452 2015-03-16 17:29:40+0000 ip-172-31-19-8: DETAIL:  Key (peer_instance_id)=(cb0287f34d074a78877863ccb749dedc) is not present in table "instance". at /opt/gms/RGTBV26/sw/ur/lib/UR/DBI.§
    453 2015-03-16 17:29:40+0000 ip-172-31-19-8: 2015/03/16 17:29:40 Workflow::DataSource::InstanceSchema id(Workflow::DataSource::InstanceSchema): Workflow::DataSource::InstanceSchema: Err§
    454 2015-03-16 17:29:40+0000 ip-172-31-19-8:  INSERT INTO workflow.instance (workflow_plan_id,workflow_instance_id,peer_instance_id,parent_instance_id,parent_execution_id,parallel_index§
    455 2015-03-16 17:29:40+0000 ip-172-31-19-8: PARAMS: 'e09177aaa6784d5884ea498e4d1a6558', 'fe9cf90e320d44048342597d075a7b75', 'cb0287f34d074a78877863ccb749dedc', (undef), (undef), '14', §
    456 2015-03-16 17:29:40+0000 ip-172-31-19-8:  024c6f37ca0f44d89ea714ec3a515c65^@^@^@^Rprevious_result_id^D^C^@^@^@^B^D^Q&Genome::Model::Build::SomaticVariation^C^@^@^@^P

This is a violation of the constraint - "instance_peer_instance_id_fkey".

Note - This constraint is absent in the latest schema dump, this might just be due to TGI having moved on from workflow to flow?

gatoravi commented 9 years ago

@brummett , any idea why we might be seeing this again ? Last time you helped us switch the case of the table names to resolve this. We are currently testing using this branch.

brummett commented 9 years ago

We're still using Workflow for production work, and our Workflow tables are still in Oracle, so that's one difference.

How did you construct your Workflow schema? It's possible there's a constraint in the Oracle schema that isn't present in your Postgres schema or in Workflow's metaDB.

brummett commented 9 years ago

I noticed that your MetaDB is out of whack a bit. The constraint names still refer to the Oracle names. For example, WI_PEID_FK should be instance_peer_instance_id_fkey.

Also, UR doesn't use the owner column in the MetaDB any more. All places that refer to owner workflow and table workflow_instance should be changed to have owner by an empty string and table workflow.workflow_instance. Same for all the other places table names appear.

Have you just now updated to use a more recent UR? The table name change went out last summer.

gatoravi commented 9 years ago

Our workflow schemas are re-constructed with the help of this schema-dump from TGI. However this schema dump is not identical to the dump that would be produced at TGI today if you ran sudo -u postgres pg_dump -s genome We've been minimally adding tables over as and when things break on the SGMS side, this is quite fragile.

Yes, we just updated to a new version of UR, it is this one. Should we just create a new MetaDB ?

As you know the one we use right now is in this Is there a way to regenerate the Meta dump, I can then commit this back to that branch.

brummett commented 9 years ago

Are your workflow tables in the same database as all the other Genome tables? If so, then you'll probably want to edit Workflow's MetaDB to minimize the damage the automated update tool might do.

You can update the MetaDB with the automated tool like this: Edit the existing file lib/Workflow/DataSource/Meta.sqlite3-dump. Remove all the INSERT INTO lines, leaving the CREATE TABLE lines in place. Change your directory to to lib/Workflow. Run the command ur update classes-from-db. This will inspect the database and write a new MetaDB database and dump, and might also rewrite some of the Perl module class definitions. Look over the changes with git diff and make sure they look reasonable. It shouldn't have deleted any files, or made large, sweeping changes to any of the Perl modules. The MetaDB dump should now have INSERT statements to populate the metadata about tables, columns and foreign keys.

gatoravi commented 9 years ago

Running the ur update classes-from-db errors out,

ubuntu@ip-172-31-19-86 /opt/gms/RGTBV26/sw/workflow/lib/Workflow (gms-pub-2015.02.22)> ur update classes-from-db
Updating namespace: Workflow
Found data sources: InstanceSchema
Checking Workflow::DataSource::InstanceSchema for schema changes ...
WARNING: Dump file is newer than the db file.  Replacing db_file /opt/gms/RGTBV26/sw/workflow/lib/Workflow/DataSource/Meta.sqlite3.
WARNING: Re-creating /opt/gms/RGTBV26/sw/workflow/lib/Workflow/DataSource/Meta.sqlite3 from /opt/gms/RGTBV26/sw/workflow/lib/Workflow/DataSource/Meta.sqlite3-dump.
A  InstanceSchema instance_execution          Schema changes
A  InstanceSchema plan                        Schema changes
A  InstanceSchema execution_metric            Schema changes
A  InstanceSchema instance                    Schema changes
A  InstanceSchema service                     Schema changes
Found 5 tables with changes.
Resolving corresponding class changes...
Updating classes...
A Workflow::ExecutionMetric                class uses instanceschema table execution_metric
A Workflow::Instance                       class uses instanceschema table instance
A Workflow::InstanceExecution              class uses instanceschema table instance_execution
A Workflow::Plan                           class uses instanceschema table plan
ERROR: Class Workflow::Service already exists for table 'workflow.service' in Workflow::DataSource::InstanceSchema.  Cannot generate class for service in Workflow::DataSource::InstanceSchema. at /opt/gms/RGTBV26/sw/ur/lib/UR/Namespace/Command/Update/ClassesFromDb.pm line 814
        UR::Namespace::Command::Update::ClassesFromDb::_update_class_metadata_objects_to_match_database_metadata_changes('UR::Namespace::Command::Update::ClassesFromDb=HASH(0x3613560)', 'data_dictionary_objects', 'ARRAY(0x353e680)') called at /opt/gms/RGTBV26/sw/ur/lib/UR/Namespace/Command/Update/ClassesFromDb.pm line 259
        UR::Namespace::Command::Update::ClassesFromDb::execute('UR::Namespace::Command::Update::ClassesFromDb=HASH(0x3613560)', 'HASH(0x360cf98)') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 135
        eval {...} called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 135
        Command::V1::execute('UR::Namespace::Command::Update::ClassesFromDb=HASH(0x3613560)', 'HASH(0x360cf98)') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 283
        Command::V1::_execute_delegate_class_with_params('UR::Namespace::Command', 'UR::Namespace::Command::Update::ClassesFromDb', 'HASH(0x360cf98)', '/opt/gms/RGTBV26/sw/ur/bin/ur\x{0}update\x{0}classes-from-db') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 232
        Command::V1::_execute_with_shell_params_and_return_exit_code('UR::Namespace::Command', 'update', 'classes-from-db') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 204
        eval {...} called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 203
        Command::V1::execute_with_shell_params_and_exit('UR::Namespace::Command') called at /opt/gms/RGTBV26/sw/ur/bin/ur line 19
brummett commented 9 years ago

Then it looks like the MetaDB dump needs to be edited by hand.

One simple thing you can try is to edit lib/Workflow/DataSource/Meta.sqlite3-dump and remove all the INSERT statements again. This will force UR to ask the DB what the schema looks like when it's time to save. It should discover all the current FK constraints and save in the proper order.

If that works and you want to be bothered to do it, we can edit the MetaDB dump to fix the table and constraint names.

gatoravi commented 9 years ago

Sorry I should have mentioned, the previous error was after removing all the INSERT statements in lib/Workflow/DataSource/Meta.sqlite3-dump. Actually I get the same error whether I remove the INSERT statements or not.

ubuntu@ip-172-31-19-86 /opt/gms/RGTBV26/sw/workflow/lib/Workflow (gms-pub-2015.02.22)> ur update classes-from-db
Updating namespace: Workflow
Found data sources: InstanceSchema
Checking Workflow::DataSource::InstanceSchema for schema changes ...
A  InstanceSchema instance_execution          Schema changes
A  InstanceSchema plan                        Schema changes
A  InstanceSchema execution_metric            Schema changes
A  InstanceSchema instance                    Schema changes
A  InstanceSchema service                     Schema changes
Found 5 tables with changes.
Resolving corresponding class changes...
Updating classes...
A Workflow::ExecutionMetric                class uses instanceschema table execution_metric
A Workflow::Instance                       class uses instanceschema table instance
A Workflow::InstanceExecution              class uses instanceschema table instance_execution
A Workflow::Plan                           class uses instanceschema table plan
ERROR: Class Workflow::Service already exists for table 'workflow.service' in Workflow::DataSource::InstanceSchema.  Cannot generate class for service in Workflow::DataSource::InstanceSchema. at /opt/gms/RGTBV26/sw/ur/lib/UR/Namespace/Command/Update/ClassesFromDb.pm line 814
        UR::Namespace::Command::Update::ClassesFromDb::_update_class_metadata_objects_to_match_database_metadata_changes('UR::Namespace::Command::Update::ClassesFromDb=HASH(0x2ce64e0)', 'data_dictionary_objects', 'ARRAY(0x2c11510)') called at /opt/gms/RGTBV26/sw/ur/lib/UR/Namespace/Command/Update/ClassesFromDb.pm line 259
        UR::Namespace::Command::Update::ClassesFromDb::execute('UR::Namespace::Command::Update::ClassesFromDb=HASH(0x2ce64e0)', 'HASH(0x2cdfeb8)') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 135
        eval {...} called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 135
        Command::V1::execute('UR::Namespace::Command::Update::ClassesFromDb=HASH(0x2ce64e0)', 'HASH(0x2cdfeb8)') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 283
        Command::V1::_execute_delegate_class_with_params('UR::Namespace::Command', 'UR::Namespace::Command::Update::ClassesFromDb', 'HASH(0x2cdfeb8)', '/opt/gms/RGTBV26/sw/ur/bin/ur\x{0}update\x{0}classes-from-db') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 232
        Command::V1::_execute_with_shell_params_and_return_exit_code('UR::Namespace::Command', 'update', 'classes-from-db') called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 204
        eval {...} called at /opt/gms/RGTBV26/sw/ur/lib/Command/V1.pm line 203
        Command::V1::execute_with_shell_params_and_exit('UR::Namespace::Command') called at /opt/gms/RGTBV26/sw/ur/bin/ur line 19

Any idea what Class Workflow::Service already exists for table 'workflow.service' means here? Note - I'm doing this on the SGMS instance and not inside TGI.

brummett commented 9 years ago

Sorry, I wasn't clear... I meant to remove all the INSERT statements from the MetaDB dump, then try re-running the build (or whatever generated the original FK constraint error).

The error Class Workflow::Service already exists for table 'workflow.service' comes up because of what ur update classes-from-db is doing. Since all the MetaDB info is wiped out by deleting the INSERT statement lines, it thinks every table in the schema is brand new. For every new table, it tries to create a class based on the table name. The table workflow.service becomes the class/module Workflow::Service, which already exists and triggers the error.

gatoravi commented 9 years ago

ok, thanks, that helps a lot :) this is completely uncharted territory for me :game_die:

gatoravi commented 9 years ago

removing all the INSERT statements from lib/Workflow/DataSource/Meta.sqlite3-dump and re-running the builds still results in the same FK constraint error.

brummett commented 9 years ago

I'm out of simple fixes.

You could try reverting your UR checkout to e78f78f23d, the commit before this MetaDB change was merged in, and re-running the build. That could help verify that the problem is what I think it is. But the build could also fail if it requires any UR features added since then.

You could try editing the MetaDB dump by hand. Change all the "owner" columns to empty strings and "table_name" to include the schema, like schema.table_name. Change the "owner" property of Workflow::DataSource::InstanceSchema's class definition to look like the "owner" property of Genome::DataSource::GMSchema. Correct all the constraint names (PK and FK) in the MetaDB, too (for example, WI_PEID_FK should be instance_peer_instance_id_fkey). This needs to be done if the first thing succeeds, anyway.

You could try stepping though the dependency-resolution code in UR::DataSource::RDBMS::_sync_database() and figure out why it's inserting things in the wrong order.

gatoravi commented 9 years ago

Some helpful info from Tony to help us debug this, The short answer is that the MetaDB is used at commit time to deduce what the database schema looks like so it can construct the SQL needed to save all your changes. When UR needs to load objects from the MetaDB, it looks at the date of the sqlite3 database file and the dump file. If the dump is newer, then it repopulates the database file from the dump.

The class definition tells UR what database table to save to using the table_name attribute. When it's time to save, UR looks up a UR::DataSource::RDBMS::Table object (backed by the dd_table table in the MetaDB) that goes with that class's table. If it finds one, then it believes the MetaDB is the truth about that class's table, columns and constraints.

If the Table object is not found, then UR asks the database what that table's columns and constraints look like, then creates a UR::DataSource::RDBMS::Table object (and associated Column and Constraint objects) that exist only during the life of the program - they're not saved to the MetaDB.

UR then uses these MetaDB objects to determine what columns the table has in order to write the SQL for saving. It uses the FK constraint objects to determine the order to save to to satisfy the constraints.

What I think is going on is there's a constraint in the DB to ensure the peer_instance_id column in the workflow_instance table also appears in that same table with the same workflow_instance_id. There's a record for the workflow_instance table in the MetaDB, so UR believes the MetaDB has the complete truth about the workflow_instance table.

When it looks up the FK constraints in the MetaDB, it's not finding all the info it needs to satisfy the constraint in the DB; probably because it's looking up table_name "workflow.workflow_instance" when it's in the metaDB as "workflow_instance". You could verify this by running it with the env var UR_DBI_MONITOR_SQL=1 and look for a SELECT statement looking up something from the dd_table table filtered by table_name.

One thing that doesn't make sense is that if it could find the Table MetaDB object, why can't it also find the FKConstraint objects?

Is the dump file writeable by the builds as well?

The MetaDB is just another database, like all the other databases UR can talk to. In practice, changes to the MetaDB objects only happens with you run "ur update classes-from-db". When saving objects from other namespaces like Genome, it only loads or defines MetaDB objects, never changing them or creating them, so the MetaDB (and its dump) don't change.

gatoravi commented 9 years ago

more info: There are components to do each of these two things:

brummett commented 9 years ago

That second bullet point is backwards. If the $temp flag is true, it calls __define__() on MetaDB objects and they are not persisted at commit time.

If $temp is false, it uses create() on MetaDB objects, they are saved to the MetaDB at commit time, and it issues CREATE and/or ALTER TABLE statements to the real database.

sakoht commented 9 years ago

Hey @brummett, I was just syncing up with @malachig and @obigriffith on this. Wanted to jump in and help if possible.

Just wanted to check on a few things to make sure my memory is correct:

  1. It seemed like maybe we were going the wrong direction with the "ur update classes" initially, right? We have class definitions that match the main line, and we just need to make a db that functions.
  2. As I recall, everything in the MetaDB is a cache can get re-generated, right? The classes have a structure, and defaults, and you can override the table/column names, but if there is no data in the meta-db, it starts with the tables/columns in the class def (specified directly or inferred), and queries the db for metadata about everything.
  3. We had a metadb on the old branch of workflow that worked. The only thing that should need to change is the schema changes that have happened as a result of class changes between the two time points.

Does that all sound correct?

sakoht commented 9 years ago

All of the above aside: the error is coming from DBD:Pg. This means there is an actual constraint in the db that is being violated. Not an issue with the metadata?

The constraint was either created by the db init script for the sGMS, or made dynamically because of the class structure at some point.

We could:

  1. Find that constraint and drop it at the db level, and see if the build finishes.
  2. See if that constraint is explicitly in the db init script.
  3. Ensure that the class def doesn't have any sort of required relationship that could cause the cosntraint to be made by some dynamic DML logic.

Thoughts?

gatoravi commented 9 years ago

Hi Scott, the constraint is in the schema dump here - https://github.com/genome/gms/blob/ubuntu-12.04/setup/schema.psql#L3207 I'm not sure I understand your point number 3, could you elaborate? Also, is there a way to see if this constraint is still being used inside TGI or if its SGMS specific?

sakoht commented 9 years ago

To test, connect to the database and run "drop constraint $name", and try to re-run the build.

If that works, delete the constraint from that file so it isn't re-created.

3 you can ignore if all of this works. If the constraint re-appears, the property tied to the constraint should be set to 'is_optional => 1'.

On Mar 29, 2015, at 10:03 PM, Avinash Ramu notifications@github.com wrote:

Hi Scott, the constraint is in the schema dump here - https://github.com/genome/gms/blob/ubuntu-12.04/setup/schema.psql#L3207 I'm not sure I understand your point number 3, could you elaborate? Also, is there a way to see if this constraint is still being used inside TGI or if its SGMS specific?

— Reply to this email directly or view it on GitHub.

brummett commented 9 years ago

On Sun, Mar 29, 2015 at 8:19 PM, Scott notifications@github.com wrote:

Hey @brummett https://github.com/brummett, I was just syncing up with @malachig https://github.com/malachig and @obigriffith https://github.com/obigriffith on this. Wanted to jump in and help if possible.

Just wanted to check on a few things to make sure my memory is correct:

1.

It seemed like maybe we were going the wrong direction with the "ur update classes" initially, right? We have class definitions that match the main line, and we just need to make a db that functions.

I still think there's a constraint in the DB that's not getting found in the MetaDB. Maybe because the constraint name is different, maybe because UR is looking up MetaDB entities that contain table_name like 'schema.table_name' (since UR commit b6543cfd). When it's saving, since the constraint in the MetaDB is missing, UR is doing table inserts in the wrong order (there's a 50% chance), inserting into the FK table first instead of the PK table.

1.

As I recall, everything in the MetaDB is a cache can get re-generated, right? The classes have a structure, and defaults, and you can override the table/column names, but if there is no data in the meta-db, it starts with the tables/columns in the class def (specified directly or inferred), and queries the db for metadata about everything.

When doing selects, UR infers the table structure from the class definition. When saving (insert, update, delete), it uses the MetaDB info, either finding a UR::DataSource::RDBMS::Table object matching the class's table_name (and then trusting everything in the MetaDB about that table), or introspecting the database and defineing MetaDB entities just for the life of the program.

1.

We had a metadb on the old branch of workflow that worked. The only thing that should need to change is the schema changes that have happened as a result of class changes between the two time points.

The only relevant UR change between then and now is the branch merged in commit b6543cfd where UR no longer uses the 'owner' column in the MetaDB any more. If I remember right, if the data source definition has a value for 'owner' (like Workflow::DataSource::InstanceSchema), then UR expects to find table names in the MetaDB that are just plain table names. If the data source's 'owner' is undef (like Genome::DataSource::GMSchema), then UR expects MetaDB table_names to be of the form "schema.table_name", not just for MetaDB Table objects, but for all types of MetaDB entities.

I'm out until Wednesday. If you don't have it figured out by then, I can try stepping through UR's sync_database and see what's going on.

-- Tony

gatoravi commented 9 years ago

I tried two things,

  1. As per Tony's suggestion I updated a table name in the metadata dump - 'workflow_service' to 'service' and changed the constraint 'WI_PEID_FK' to 'instance_peer_instance_id_fkey'
  2. A pindel sub-step was crashing due to a lock being created on /, this is related to https://github.com/genome/gms/issues/162 An attempt to fix this is on https://github.com/genome/genome/pull/602 Both these changes seem to get us past the step that was crashing, testing is underway to see if a new meta-data dump can be generated from the database instead. Once this works the commit in the PR needs to be cherry-picked into master.
sakoht commented 9 years ago

Nice.

On Mar 30, 2015, at 4:56 PM, Avinash Ramu notifications@github.com wrote:

I tried two things,

  1. As per Tony's suggestion I updated a table name in the metadata dump - 'workflow_service' to 'service' and changed the constraint 'WI_PEID_FK' to 'instance_peer_instance_id_fkey'
  2. A pindel sub-step was crashing due to a lock being created on /, an attempt to fix this is on genome/genome#602 Both these changes seem to get us past the step that was crashing, testing is underway to see if a new meta-data dump can be generated from the database instead.

— Reply to this email directly or view it on GitHub.

gatoravi commented 9 years ago

This might be related to https://github.com/genome/gms/issues/189 but closing this for now since the build succeeds. The PR was merged in here - https://github.com/genome/genome/pull/602 Master seems to handle this differently with separate environment variables for each lock. We could choose to go that way in the future.