Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Bug] Unable to run `iatlas-data:serve-detach` #2560

Closed tschaffter closed 3 months ago

tschaffter commented 3 months ago

What product(s) are you seeing the problem on?

iAtlas

Current behavior

The project iatlas-data was added in #2411. At the time, I was able to run the containerized application successfully with the command shown below, which is also listed in the Preview section of the PR.

nx serve-detach iatlas-data

I'm trying to run the same command today and it fails with the following error:

$ docker logs iatlas-data
Traceback (most recent call last):
  File "/src/build_database.py", line 1598, in <module>
    schema = Schema(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema/schema.py", line 146, in __init__
    self.schema_graph = SchemaGraph(config.schema_url)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema_graph/schema_graph.py", line 18, in __init__
    self.schema_graph = self.create_schema_graph()
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema_graph/schema_graph.py", line 26, in create_schema_graph
    subgraph = get_graph_by_edge_type(self.schema_url, "requiresComponent")
  File "/usr/local/lib/python3.10/site-packages/schematic_db/api_utils/api_utils.py", line 185, in get_graph_by_edge_type
    response = create_schematic_api_response("schemas/get/graph_by_edge_type", params)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/api_utils/api_utils.py", line 121, in create_schematic_api_response
    raise SchematicAPIError(
schematic_db.api_utils.api_utils.SchematicAPIError: Error accessing Schematic endpoint; URL: https://schematic-staging.api.sagebionetworks.org/v1/schemas/get/graph_by_edge_type; Code: 500; Reason: INTERNAL SERVER ERROR; Time (PST): 2024-03-11 10:40:49.355164-07:00; Parameters: {'schema_url': 'https://raw.githubusercontent.com/CRI-iAtlas/iAtlasSchema/main/iatlas_schema.jsonld', 'relationship': 'requiresComponent'}
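
To check whether the 500 comes from the Schematic API itself rather than from the container, the failing request can be rebuilt outside of `schematic_db`. This is a minimal sketch: the endpoint and parameters are copied from the error message above, while the helper names and the assumption that the endpoint accepts a plain GET with query parameters are mine.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Values copied from the SchematicAPIError message above.
ENDPOINT = (
    "https://schematic-staging.api.sagebionetworks.org"
    "/v1/schemas/get/graph_by_edge_type"
)
PARAMS = {
    "schema_url": (
        "https://raw.githubusercontent.com/CRI-iAtlas/iAtlasSchema"
        "/main/iatlas_schema.jsonld"
    ),
    "relationship": "requiresComponent",
}


def build_request_url(endpoint: str, params: dict) -> str:
    """Return the GET URL with URL-encoded query parameters (hypothetical helper)."""
    return f"{endpoint}?{urlencode(params)}"


def check_endpoint(url: str, timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status code (needs network access).

    Assumption: the endpoint answers an unauthenticated GET request.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the code is still informative.
        return error.code
```

Calling `check_endpoint(build_request_url(ENDPOINT, PARAMS))` from the host would show whether the endpoint still answers with 500 independently of the iatlas-data container.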

Expected behavior

The command should complete successfully.

Anything else?

No response

Commit ID

No response

tschaffter commented 3 months ago

It was a memory issue with iatlas-postgres

The cause was the memory limit set on the DB container (500MB), not on the iatlas-data container.
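
For anyone who wants to confirm which limit a running container actually has, here is a small sketch. It assumes the `docker` CLI is available on the host and reads the `HostConfig.Memory` field from `docker inspect`, which reports the limit in bytes (0 means no limit). The function names are hypothetical.

```python
import subprocess


def read_memory_limit_bytes(container: str) -> int:
    """Ask Docker for the container's memory limit in bytes (0 means no limit)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.HostConfig.Memory}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def describe_limit(limit_bytes: int) -> str:
    """Render a byte count as a human-readable limit in MiB."""
    if limit_bytes == 0:
        return "no limit"
    return f"{limit_bytes / 1024**2:.0f} MiB"
```

For example, `describe_limit(read_memory_limit_bytes("iatlas-postgres"))` would print the limit currently applied to the DB container.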

Here are the container memory limits used successfully when I migrated the iatlas-data project:

The iAtlas data have been updated recently. The whole DB is expected to be 40-50 GB according to @andrewelamb.

Populating the DB now takes more than 1 hour, which is much longer than the last time I did it (20-30 min?). I had to interrupt the process because the machine became unresponsive.

Here is a glimpse of the memory usage, which is much larger than before. This is only a snapshot I observed, not a guarantee of the maximum memory usage.

CONTAINER ID   NAME              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
ecee7c11e472   iatlas-data       0.00%     15.18GiB / 31.06GiB   48.89%    3.19GB / 3.15GB   55.4MB / 0B       9
83deafe2a5b1   iatlas-postgres   11.65%    7.787GiB / 31.06GiB   25.08%    3.13GB / 13.3MB   9.45GB / 55.8GB   8
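
To put a single number on the snapshot, the two MEM USAGE values can be added up. The sketch below is only an illustration: the values are copied from the table above, and the helper `to_gib` is a hypothetical name that handles the unit suffixes shown by `docker stats`.

```python
import re

# Conversion factors from docker stats memory units to GiB.
UNITS_IN_GIB = {
    "B": 1 / 1024**3,
    "KiB": 1 / 1024**2,
    "MiB": 1 / 1024,
    "GiB": 1.0,
}


def to_gib(value: str) -> float:
    """Convert a docker stats memory string such as '7.787GiB' to GiB."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB)", value.strip())
    if match is None:
        raise ValueError(f"Unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return float(number) * UNITS_IN_GIB[unit]


# MEM USAGE values copied from the snapshot above.
usage = {"iatlas-data": "15.18GiB", "iatlas-postgres": "7.787GiB"}
total_gib = sum(to_gib(value) for value in usage.values())
print(f"{total_gib:.2f} GiB")  # prints "22.97 GiB"
```

That is roughly 23 GiB across the two containers at the moment of the snapshot, out of the 31.06 GiB available on the machine.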

tschaffter commented 3 months ago

The loading of the data took more than one hour and up to 22 GB of memory at some point (see previous comment).

@andrewelamb Would it be possible to create a sample of data, real or mock data, that developers could use when testing the iAtlas stack locally?

andrewelamb commented 3 months ago

Definitely possible! It would just take some effort to create the mock data, or to change schematic db so that it can take in a subset of the manifests.
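
As a rough illustration of the "subset of the manifests" idea, here is a sketch that samples rows from a manifest exported as CSV. It is not part of schematic db: the function name and the CSV assumption are hypothetical, and it does not try to keep references between manifests consistent, which a real sample dataset would need.

```python
import csv
import io
import random


def sample_manifest(csv_text: str, n_rows: int, seed: int = 0) -> str:
    """Return a CSV with the original header and at most n_rows sampled rows.

    Rows are picked reproducibly (fixed seed) and kept in their original order.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), min(n_rows, len(rows))))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[index] for index in keep)
    return out.getvalue()
```

Sampling each manifest independently like this could leave rows that point at records missing from another table, so a usable sample would probably have to start from one table and follow its references.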