Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

fix(iatlas): unable to run iatlas-data:serve-detach #2561

Closed: tschaffter closed this 3 months ago

tschaffter commented 3 months ago

Closes #2560

Changelog

Future Work

Get access to a sample of the iAtlas data (or mock data) that can be loaded quickly into the database, for a better developer experience (DX)

Preview

The iatlas-data container is now limited to creating the tables:

$ docker logs -f iatlas-data
[13/Mar/2024 16:23:39] INFO [root._drop_all_tables:32] Dropping all tables
[13/Mar/2024 16:23:40] INFO [root._drop_all_tables:34] Dropped all tables
[13/Mar/2024 16:23:40] INFO [root._get_database_schema:42] Getting database schema
[13/Mar/2024 16:24:05] INFO [root._get_database_schema:44] Got database schema
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:53] Building database
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: patients
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: mutation_types
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: genes
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: features
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: datasets
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: tags
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: publications
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: samples
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: mutations
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: gene_sets
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: nodes
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: snps
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: cells
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: tags_to_tags
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: tags_to_publications
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: slides
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: single_cell_pseudobulk_features
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: single_cell_pseudobulk
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: samples_to_tags
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: samples_to_mutations
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: rare_variant_pathway_associations
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: publications_to_genes_to_gene_sets
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: neoantigens
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: heritability_results
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: genes_to_samples
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: genes_to_gene_sets
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: features_to_samples
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: edges
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: germline_gwas_results
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: driver_results
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: datasets_to_tags
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: datasets_to_samples
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: copy_number_results
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: colocalizations
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_tags
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_samples
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_mutations
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_genes
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_features
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cells_to_samples
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:55] Adding table to database schema: cells_to_genes
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:55] Adding table to database schema: cells_to_features
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:55] Adding table to database schema: cell_stats
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:57] Database built
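In the log above, standalone tables such as patients, tags and datasets are added before join tables such as tags_to_tags, samples_to_tags and datasets_to_samples. Assuming this ordering reflects foreign-key dependencies (an inference from the table names, not something the log states), the minimal sketch below shows how such an order can be derived with Python's standard-library graphlib. The DEPENDENCIES map and the creation_order/drop_order helpers are hypothetical illustrations, not code from build_database.py or schematic_db.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table -> the tables it references via
# foreign keys. The real iAtlas schema may differ; these edges are illustrative.
DEPENDENCIES = {
    "patients": set(),
    "tags": set(),
    "datasets": set(),
    "samples": {"patients"},
    "tags_to_tags": {"tags"},
    "samples_to_tags": {"samples", "tags"},
    "datasets_to_samples": {"datasets", "samples"},
}


def creation_order(deps):
    """Return table names so that every table comes after the tables it references."""
    return list(TopologicalSorter(deps).static_order())


def drop_order(deps):
    """Drop child tables first (reverse creation order) to avoid foreign-key violations."""
    return list(reversed(creation_order(deps)))


if __name__ == "__main__":
    print("create:", creation_order(DEPENDENCIES))
    print("drop:  ", drop_order(DEPENDENCIES))
```

TopologicalSorter raises a CycleError if the dependency map contains a cycle, which is a cheap way to catch a schema that cannot be created table by table.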
tschaffter commented 3 months ago

Error after updating the script and its dependencies

When using the main schema URL:

$ docker logs iatlas-data
Traceback (most recent call last):
  File "/src/build_database.py", line 1863, in <module>
    schema = Schema(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema/schema.py", line 146, in __init__
    self.schema_graph = SchemaGraph(config.schema_url, display_label_type)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema_graph/schema_graph.py", line 23, in __init__
    self.schema_graph = self.create_schema_graph()
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema_graph/schema_graph.py", line 31, in create_schema_graph
    subgraph = get_graph_by_edge_type(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/api_utils/api_utils.py", line 202, in get_graph_by_edge_type
    response = create_schematic_api_response("schemas/get/graph_by_edge_type", params)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/api_utils/api_utils.py", line 122, in create_schematic_api_response
    raise SchematicAPIError(
schematic_db.api_utils.api_utils.SchematicAPIError: Error accessing Schematic endpoint; URL: https://schematic-staging.api.sagebionetworks.org/v1/schemas/get/graph_by_edge_type; Code: 500; Reason: INTERNAL SERVER ERROR; Time (PST): 2024-03-11 15:02:27.054946-07:00; Parameters: {'schema_url': 'https://raw.githubusercontent.com/CRI-iAtlas/iAtlasSchema/main/iatlas_schema.jsonld', 'relationship': 'requiresComponent', 'data_model_labels': 'display_label'}
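To check whether the 500 comes from the Schematic staging API itself rather than from the container, the failing request can be rebuilt from the endpoint and parameters printed in the error. The sketch below assumes the endpoint is called with a GET request and query-string parameters (the parameter dict in the error suggests this, but the log does not confirm it). build_url and probe are hypothetical helpers, not part of schematic_db.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names copied from the SchematicAPIError message.
BASE_URL = "https://schematic-staging.api.sagebionetworks.org/v1"
ENDPOINT = "schemas/get/graph_by_edge_type"


def build_url(schema_url, relationship="requiresComponent", data_model_labels="display_label"):
    """Build the request URL with the same query parameters as in the error."""
    params = {
        "schema_url": schema_url,
        "relationship": relationship,
        "data_model_labels": data_model_labels,
    }
    return f"{BASE_URL}/{ENDPOINT}?{urlencode(params)}"


def probe(schema_url, timeout=120):
    """Send the request (assumed GET) and return the HTTP status code."""
    try:
        with urlopen(build_url(schema_url), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; 500 reproduces the failure.
        return err.code


if __name__ == "__main__":
    main_schema = "https://raw.githubusercontent.com/CRI-iAtlas/iAtlasSchema/main/iatlas_schema.jsonld"
    print(build_url(main_schema))
```

If probe returns 500 for the main schema URL, the problem likely lies with the API or the schema rather than the container; passing the develop schema URL instead lets you compare the two without rebuilding the image.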

When using the develop schema URL, as suggested by @andrewelamb, the container processes data for a few minutes before throwing an error:

[11/Mar/2024 21:28:54] INFO [root._download_manifest:189] Downloading manifest; table name: nodes; manifest id: synXXX
[WARNING] /usr/local/lib/python3.10/site-packages/schematic_db/synapse/synapse.py:71: DtypeWarning: Columns (3,6,10) have mixed types. Specify dtype option on import or set low_memory=False.
  return pandas.read_csv(entity.path, keep_default_na=False, na_values="")

[11/Mar/2024 21:28:58] WARNING [py.warnings._showwarnmsg:109] /usr/local/lib/python3.10/site-packages/schematic_db/synapse/synapse.py:71: DtypeWarning: Columns (3,6,10) have mixed types. Specify dtype option on import or set low_memory=False.
  return pandas.read_csv(entity.path, keep_default_na=False, na_values="")

[11/Mar/2024 21:28:58] INFO [root._download_manifest:195] Finished downloading manifest
[11/Mar/2024 21:28:58] INFO [root._update_table_with_manifest:238] Updating manifest; table name: nodes; manifest id: synXXX
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb/sql_alchemy_database.py", line 229, in insert_table_rows
    conn.execute(statement)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1408, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 513, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1630, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1839, in _execute_context
    return self._exec_single_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1979, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2335, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

[SQL: INSERT INTO nodes (dataset_id, id, label, name, network, node_feature_id, node_gene_id, score, tag_1_id, tag_2_id, x, y) TOO MUCH TO INCLUDE
(Background on this error at: https://sqlalche.me/e/20/e3q8)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 245, in _update_table_with_manifest
    self.rdb.insert_table_rows(table_name, table)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb/sql_alchemy_database.py", line 231, in insert_table_rows
    raise InsertDatabaseError(table_name) from exception
schematic_db.rdb.rdb.InsertDatabaseError: Error inserting table; Table Name: nodes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/src/build_database.py", line 1884, in <module>
    updater.update_database(method="insert")
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 119, in update_database
    self.update_table(name, method)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 144, in update_table
    self._update_table_with_manifest_id(table_name, manifest_id, method)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 175, in _update_table_with_manifest_id
    self._update_table_with_manifest(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 251, in _update_table_with_manifest
    raise UpdateError(table_name, manifest_id) from exc
schematic_db.rdb_updater.rdb_updater.UpdateError: Error updating table; Table Name: nodes; Dataset ID: synXXX
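The traceback shows psycopg2 losing the connection while schematic_db executes what appears to be a single INSERT containing the whole nodes manifest. The log does not say why the PostgreSQL server closed the connection; a server process running out of memory on a very large statement is one possible cause, but that is an assumption here. If it turns out to be the cause, inserting the rows in smaller batches is one mitigation. The sketch below shows the batching pattern with the standard-library sqlite3 module so it runs anywhere. insert_in_chunks, batched and the default chunk size are hypothetical, not part of schematic_db; with psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3
from itertools import islice

# Column names taken from the failing INSERT INTO nodes statement.
NODE_COLUMNS = [
    "dataset_id", "id", "label", "name", "network", "node_feature_id",
    "node_gene_id", "score", "tag_1_id", "tag_2_id", "x", "y",
]


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def insert_in_chunks(conn, table, columns, rows, chunk_size=10_000):
    """Insert rows in fixed-size batches, committing after each batch.

    `table` and `columns` are interpolated into the SQL, so they must be trusted
    identifiers. Returns the number of batches executed.
    """
    placeholders = ", ".join("?" for _ in columns)
    statement = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    batches = 0
    for batch in batched(rows, chunk_size):
        conn.executemany(statement, batch)
        conn.commit()
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE nodes ({', '.join(NODE_COLUMNS)})")
    rows = [(1, i, "l", f"n{i}", "net", None, None, 0.5, None, None, 0.0, 0.0) for i in range(25)]
    print(insert_in_chunks(conn, "nodes", NODE_COLUMNS, rows, chunk_size=10), "batches")
```

Committing after each batch keeps every transaction small, at the cost of leaving a partially loaded table if a later batch fails; running the loop inside a single transaction would restore all-or-nothing behavior while still splitting the statement.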