CRI-iAtlas / iatlas-data

MOVED TO GITLAB -> https://gitlab.com/cri-iatlas/iatlas-data.git

Errors in first time building the db #93

Closed: heimannch closed this issue 4 years ago

heimannch commented 4 years ago

I was able to build the database on my machine after three rounds of calling iatlas.data::build_all() and resume(), in the same session and with no edits to settings or code. Each call, in each round, produced a different error message.

Below I share only the pipeline step where each error occurred.
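For context, each round was just the two calls below, with no arguments and no changes in between (a minimal sketch of the workflow as I ran it, not of the package internals):

iatlas.data::build_all()  # runs the full 19-step pipeline from the feather files
resume()                  # after a failure, resumes from the last failed step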

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

> iatlas.data::build_all()
--------------------------------------------------------------------------------
START: build_tags_tables (pipeline step 3/19)
Importing feather files for tags.
Imported feather files for tags.
Ensuring tags have all the correct columns and no dupes.
Resolving partial-duplicates (0 records)...
  finding partial-duplicates
: 0.177 sec elapsed
  found 0 duplicate records
  0 resulting records
Resolved partial-duplicates (0 records): 0.177 sec elapsed
Ensured tags have all the correct columns and no dupes.
Building tags table.
DONE:  dbExecute: 
      CREATE TABLE tags (
        id SERIAL,
        "name" VARCHAR NOT NULL,
        characteristics VARCHAR,
        display VARCHAR,
        color VARCHAR,
        PRIMARY KEY (id)
      );: 0.025 sec elapsed
DONE:  dbWriteTable: tags (0 rows): 0.013 sec elapsed
DONE:  dbExecute: CREATE UNIQUE INDEX tag_name_index ON tags ("name");: 0.013 sec elapsed
Built tags table. ( 0 rows )
Importing feather files for tags_to_tags.
build_tags_tables failed, but don't fret, you can resume from here:
OPTION: To resume from the last failure automatically: resume()
NOTEs:
  * If you change code, you can run source('./.RProfile') and then use one of the resume-options above.
  * The error's stack trace is available at: pipeline_stack_trace
Error: `by` can't contain join column `tag` which is missing from LHS
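For what it's worth, this is the dplyr message you get when the left-hand table of a join is missing the column named in by. A minimal reproduction (the column names are illustrative, not the pipeline's actual ones):

library(dplyr)
lhs <- tibble::tibble(name = c("a", "b"))            # no 'tag' column
rhs <- tibble::tibble(tag = c("a", "b"), id = 1:2)
left_join(lhs, rhs, by = "tag")
# Error: `by` can't contain join column `tag` which is missing from LHS

So on this first pass, the tags data reaching that join apparently had no tag column yet.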

In a second attempt, in the same session, running iatlas.data::build_all() and resume() resulted in new error messages:

> iatlas.data::build_all()
--------------------------------------------------------------------------------
START: build_tags_tables (pipeline step 3/19)
Importing feather files for tags.
Imported feather files for tags.
Ensuring tags have all the correct columns and no dupes.
Resolving partial-duplicates (162 records)...
  finding partial-duplicates
: 0.207 sec elapsed
  found 10 duplicate records
  flattening partial-duplicates
: 0.003 sec elapsed
  5 de-duplicated records
  removing old partial-duplicates: 0.001 sec elapsed
  152 original records where not duplicated
  157 resulting records
Resolved partial-duplicates (162 records): 0.212 sec elapsed
Ensured tags have all the correct columns and no dupes.
Building tags table.
DONE:  dbExecute: DROP TABLE tags: 0.025 sec elapsed
DONE:  dbExecute: 
      CREATE TABLE tags (
        id SERIAL,
        "name" VARCHAR NOT NULL,
        characteristics VARCHAR,
        display VARCHAR,
        color VARCHAR,
        PRIMARY KEY (id)
      );: 0.039 sec elapsed
DONE:  dbWriteTable: tags (157 rows): 0.037 sec elapsed
DONE:  dbExecute: CREATE UNIQUE INDEX tag_name_index ON tags ("name");: 0.013 sec elapsed
Built tags table. ( 157 rows )
Importing feather files for tags_to_tags.
Imported feather files for tags_to_tags.
Building tags_to_tags table.
DONE:  dbExecute: 
      CREATE TABLE tags_to_tags (
        tag_id INTEGER NOT NULL,
        related_tag_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, related_tag_id)
      );: 0.013 sec elapsed
error: Error: COPY returned error: ERROR:  null value in column "tag_id" violates not-null constraint
DETAIL:  Failing row contains (null, null).
CONTEXT:  COPY tags_to_tags, line 1: "\N    \N"
in: dbWriteTable: tags_to_tags (237 rows)
build_tags_tables failed, but don't fret, you can resume from here:
OPTION: To resume from the last failure automatically: resume()
NOTEs:
  * If you change code, you can run source('./.RProfile') and then use one of the resume-options above.
  * The error's stack trace is available at: pipeline_stack_trace

 Error: COPY returned error: ERROR:  null value in column "tag_id" violates not-null constraint
DETAIL:  Failing row contains (null, null).
CONTEXT:  COPY tags_to_tags, line 1: "\N    \N" 
DONE:  dbWriteTable: tags_to_tags (237 rows): 0.044 sec elapsed
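Judging from the DETAIL line, the rows being COPYed into tags_to_tags had NULL in both id columns, presumably because a name-based lookup against the freshly rebuilt tags table came back as NA. A pre-write check along these lines would surface it (the tag_id/related_tag_id names come from the CREATE TABLE above; that the pipeline builds the tibble this way is my assumption):

# Rows whose id lookups failed: any NA becomes NULL in the COPY
# and violates the NOT NULL constraint.
dplyr::filter(tags_to_tags, is.na(tag_id) | is.na(related_tag_id))

Calling resume() after that picked up at step 15: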
> resume()

SKIPPING: 'create_db_en_env' (as requested by resume_at options)
SKIPPING: 'build_features_tables' (as requested by resume_at options)
SKIPPING: 'build_tags_tables' (as requested by resume_at options)
SKIPPING: 'build_genes_tables' (as requested by resume_at options)
SKIPPING: 'build_gene_types_table' (as requested by resume_at options)
SKIPPING: 'build_genes_to_types_table' (as requested by resume_at options)
SKIPPING: 'build_mutation_codes_table' (as requested by resume_at options)
SKIPPING: 'build_mutation_types_table' (as requested by resume_at options)
SKIPPING: 'build_mutations_table' (as requested by resume_at options)
SKIPPING: 'build_patients_table' (as requested by resume_at options)
SKIPPING: 'build_slides_table' (as requested by resume_at options)
SKIPPING: 'build_samples_table' (as requested by resume_at options)
SKIPPING: 'build_samples_to_mutations_table' (as requested by resume_at options)
SKIPPING: 'build_samples_to_tags_table' (as requested by resume_at options)
--------------------------------------------------------------------------------
START: build_features_to_samples_table (pipeline step 15/19)
Importing feather files for features_to_samples.
READ: feather_files/relationships/features_to_samples/features_to_samples_01.feather (12.6 megabytes): 0.044 sec elapsed
READ: feather_files/relationships/features_to_samples/features_to_samples_02.feather (12.6 megabytes): 0.041 sec elapsed
READ: feather_files/relationships/features_to_samples/features_to_samples_03.feather (12.6 megabytes): 0.04 sec elapsed
READ: feather_files/relationships/features_to_samples/pcawg_features_to_samples.feather (1.2 megabytes): 0.007 sec elapsed
Imported feather files for features_to_samples.
Ensuring features_to_samples have all the correct columns and no dupes.
Resolving partial-duplicates (917314 records)...
  finding partial-duplicates
: 1.301 sec elapsed
  found 0 duplicate records
  917314 resulting records
Resolved partial-duplicates (917314 records): 1.321 sec elapsed
Ensured features_to_samples have all the correct columns and no dupes.
Building features_to_samples data.
build_features_to_samples_table failed, but don't fret, you can resume from here:
OPTION: To resume from the last failure automatically: resume()
NOTEs:
  * If you change code, you can run source('./.RProfile') and then use one of the resume-options above.
  * The error's stack trace is available at: pipeline_stack_trace
Error: Failed to prepare query: ERROR:  relation "features" does not exist
LINE 1: SELECT * FROM  "features"
                       ^
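Here resume() skipped build_features_tables (step 2), but the features table evidently never made it into the database, so step 15's lookup had nothing to read. A one-line sanity check before resuming would have shown it (con stands for the pipeline's DBI connection; how to get hold of it is an assumption on my part):

DBI::dbExistsTable(con, "features")
# FALSE here matches: relation "features" does not exist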

In the third (and final!) round of calls to iatlas.data::build_all() and resume():

> iatlas.data::build_all()
*** Building 'iatlas_dev' database ***
Database config loaded: dev 

--------------------------------------------------------------------------------
START: create_db_en_env (pipeline step 1/19)
Env == dev
Reset == true
Current dir - /Users/heimann/Documents/iatlas-data
11.5: Pulling from library/postgres
Digest: sha256:b3770d9c4ef11eba1ff5893e28049e98e2b70083e519e0b2bce0a20e7aa832fe
Status: Image is up to date for postgres:11.5
docker.io/library/postgres:11.5
Postgres: starting - please be patient
Postgres: up - building database and tables
Postgres: creating tables and indexes...
 pg_terminate_backend 
----------------------
 t
(1 row)

Postgres: created tables and indexes
SUCCESS: create_db_en_env

--------------------------------------------------------------------------------
START: build_features_tables (pipeline step 2/19)
Importing feather files for features.
Imported feather files for features.
Ensuring features have all the correct columns and no dupes.
Resolving partial-duplicates (172 records)...
  finding partial-duplicates
: 0.23 sec elapsed
  found 102 duplicate records
  flattening partial-duplicates
: 0.004 sec elapsed
  51 de-duplicated records
  removing old partial-duplicates: 0.001 sec elapsed
  70 original records where not duplicated
  121 resulting records
Resolved partial-duplicates (172 records): 0.236 sec elapsed
Ensured features have all the correct columns and no dupes.
Building classes data.
Built classes data.
Building classes table.
build_features_tables failed, but don't fret, you can resume from here:
OPTION: To resume from the last failure automatically: resume()
NOTEs:
  * If you change code, you can run source('./.RProfile') and then use one of the resume-options above.
  * The error's stack trace is available at: pipeline_stack_trace

 Error: Failed to prepare query: FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request. 
> resume()
SKIPPING: 'create_db_en_env' (as requested by resume_at options)

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
START: build_nodes_tables (pipeline step 19/19)
Importing feather files for nodes.
Imported feather files for nodes.
Ensuring nodes have all the correct columns and no dupes.
Resolving partial-duplicates (6860 records)...
  finding partial-duplicates
: 0.207 sec elapsed
  found 0 duplicate records
  6860 resulting records
Resolved partial-duplicates (6860 records): 0.209 sec elapsed
Ensured nodes have all the correct columns and no dupes.
Building the nodes data.
Built the nodes data.
Building the nodes table.
DONE:  dbExecute: 
      CREATE TABLE nodes (
        id SERIAL,
        feature_id INTEGER,
        gene_id INTEGER,
        label VARCHAR,
        score NUMERIC,
        x NUMERIC,
        y NUMERIC,
        PRIMARY KEY (id)
      );: 0.025 sec elapsed
DONE:  dbWriteTable: nodes (6860 rows): 0.109 sec elapsed
DONE:  dbExecute: CREATE INDEX node_feature_id_index ON nodes (feature_id);: 0.031 sec elapsed
DONE:  dbExecute: CREATE INDEX node_gene_id_index ON nodes (gene_id);: 0.025 sec elapsed
DONE:  dbExecute: ALTER TABLE nodes ADD FOREIGN KEY (feature_id) REFERENCES features;: 0.018 sec elapsed
DONE:  dbExecute: ALTER TABLE nodes ADD FOREIGN KEY (gene_id) REFERENCES genes;: 0.033 sec elapsed
Built the nodes table. ( 6860 rows )
Building the nodes_to_tags data.
Built the nodes_to_tags data.
Building the nodes_to_tags table.
    (There are 10192 rows to write, this may take a little while.)
DONE:  dbExecute: 
      CREATE TABLE nodes_to_tags (
        node_id INTEGER,
        tag_id INTEGER,
        PRIMARY KEY (node_id, tag_id)
      );: 0.014 sec elapsed
DONE:  dbWriteTable: nodes_to_tags (10192 rows): 0.101 sec elapsed
DONE:  dbExecute: CREATE INDEX nodes_to_tag_tag_id_index ON nodes_to_tags (tag_id);: 0.036 sec elapsed
DONE:  dbExecute: ALTER TABLE nodes_to_tags ADD FOREIGN KEY (node_id) REFERENCES nodes;: 0.017 sec elapsed
DONE:  dbExecute: ALTER TABLE nodes_to_tags ADD FOREIGN KEY (tag_id) REFERENCES tags;: 0.018 sec elapsed
Built the nodes_to_tags table. ( 10192 rows )
Importing feather files for edges.
READ: feather_files/edges/cellimage_edges.feather (48.4 megabytes): 0.476 sec elapsed
READ: feather_files/edges/tcga_cytokine_edges.feather (44.3 megabytes): 0.464 sec elapsed
Imported feather files for edges.
Ensuring edges have all the correct columns and no dupes.
Resolving partial-duplicates (972300 records)...
  finding partial-duplicates
: 1.646 sec elapsed
  found 13780 duplicate records
  flattening partial-duplicates
: 0.156 sec elapsed
  6890 de-duplicated records
  removing old partial-duplicates: 1.511 sec elapsed
  958520 original records where not duplicated
  965410 resulting records
Resolved partial-duplicates (972300 records): 3.394 sec elapsed
Ensured edges have all the correct columns and no dupes.
Building the edges data.
Built the edges data.
Building the edges table.
    (There are 10600 rows to write, this may take a little while.)
DONE:  dbExecute: 
      CREATE TABLE edges (
        id SERIAL,
        label VARCHAR,
        node_1_id INTEGER NOT NULL,
        node_2_id INTEGER NOT NULL,
        score NUMERIC,
        PRIMARY KEY (id)
      );: 0.027 sec elapsed
DONE:  dbWriteTable: edges (10600 rows): 0.993 sec elapsed
DONE:  dbExecute: CREATE INDEX edge_node_2_id_index ON edges (node_2_id);: 0.039 sec elapsed
DONE:  dbExecute: CREATE INDEX edge_nodes_id_index ON edges (node_1_id, node_2_id);: 0.032 sec elapsed
DONE:  dbExecute: ALTER TABLE edges ADD FOREIGN KEY (node_1_id) REFERENCES nodes;: 0.02 sec elapsed
DONE:  dbExecute: ALTER TABLE edges ADD FOREIGN KEY (node_2_id) REFERENCES nodes;: 0.017 sec elapsed
Built the edges table. ( 10600 rows )
SUCCESS: build_nodes_tables

================================================================================
SUCCESS!
Time taken to run pipeline: 111.866 sec elapsed
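With the build finally green, a spot check like this against the running container would confirm the tables are populated (a minimal sketch; the connection details are what I'd expect for the local docker postgres, so treat port/user as assumptions):

con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = "iatlas_dev",  # name from the build banner above
  host   = "localhost",
  port   = 5432,          # assumed default mapping for the container
  user   = "postgres"     # assumed default for the postgres:11.5 image
)
DBI::dbGetQuery(con, "SELECT COUNT(*) FROM tags")   # 157 rows expected
DBI::dbGetQuery(con, "SELECT COUNT(*) FROM nodes")  # 6860 rows expected
DBI::dbDisconnect(con)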
andrewelamb commented 4 years ago

@heimannch I'm going to close this, since once we have the API going we'll be able to hit that while developing locally.