Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Task] Load the transformed EDAM data into the OC Challenge Service DB #2548

Open tschaffter opened 4 months ago

tschaffter commented 4 months ago

What product(s) is this story for?

OpenChallenges

As a user, I want

No response

Description

Depends on #2547

Load the transformed EDAM data generated in #2547 into the OC Challenge Service DB.
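A minimal sketch of the load step. It assumes the transformed output from #2547 is a CSV with the hypothetical columns `class_id` and `preferred_label`, and that the target is a hypothetical `edam_concept` table; the real schema may differ. The function accepts any DB-API 2.0 connection that uses `?` placeholders. The `mariadb` connector and `sqlite3` both do, so the demo can run without a MariaDB server.

```python
import csv
import io
import sqlite3
from typing import Iterable, Mapping


def load_edam_concepts(conn, rows: Iterable[Mapping[str, str]]) -> int:
    """Insert transformed EDAM rows into a hypothetical `edam_concept` table.

    `conn` is any DB-API 2.0 connection using the "qmark" paramstyle
    (both `mariadb` and `sqlite3` accept "?" placeholders).
    Column and table names here are assumptions, not the project's schema.
    """
    records = [(row["class_id"], row["preferred_label"]) for row in rows]
    cur = conn.cursor()
    # One round trip for all rows instead of one INSERT per row.
    cur.executemany(
        "INSERT INTO edam_concept (class_id, preferred_label) VALUES (?, ?)",
        records,
    )
    conn.commit()
    cur.close()
    return len(records)


if __name__ == "__main__":
    # Demo against an in-memory SQLite DB; the ETL would pass a MariaDB connection.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edam_concept (class_id TEXT PRIMARY KEY, preferred_label TEXT)"
    )
    # Illustrative input shaped like the assumed transformed CSV.
    sample = io.StringIO(
        "class_id,preferred_label\n"
        "http://edamontology.org/topic_0091,Bioinformatics\n"
    )
    print(load_edam_concepts(conn, csv.DictReader(sample)))
```

Keeping the loader independent of the driver means the same function can be exercised in unit tests with SQLite and pointed at MariaDB in the container.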

Acceptance criteria

Running the following commands downloads, transforms, and loads the EDAM data into the OC Challenge Service DB:

Tasks

Anything else?

See #2524 and its PR to get familiar with the environment of the project openchallenges-edam-etl.

Have you linked this story to a GitHub Project?

tschaffter commented 3 months ago

Added to Sprint 24.4

tschaffter commented 1 month ago

Moved to Backlog

mdsage1 commented 1 month ago

Update: 05/13/2024
Challenges: N/A
Remaining tasks: Implement the load step of the ETL process so that the generated dataset is available in MariaDB.

mdsage1 commented 1 month ago

Update: 05/15/2024
Challenges:

@tschaffter I have written Python code to connect to MariaDB. The OC_DB_URL value in the .env file, jdbc:mysql://openchallenges-mariadb:3306/edam_etl, doesn't match any connection format in the documentation I found (Resource1 and Resource2 on connecting to MariaDB).

I received this error when using jdbc:mysql://openchallenges-mariadb:3306/edam_etl as the host:

Error connecting to MariaDB Platform: Plugin jdbc:mysql could not be loaded: /usr/lib/x86_64-linux-gnu/libmariadb3/plugin/jdbc:mysql.so: cannot open shared object file: No such file or directory
Warning: command "poetry run python src/main.py" exited with non-zero status code
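The error suggests the connector did not treat the JDBC string as a host name at all. The Python `mariadb.connect()` function takes `host`, `port`, and `database` as separate keyword arguments rather than a URL. If you still wanted to derive them from OC_DB_URL, a minimal sketch could strip the `jdbc:` prefix and parse the remainder with the standard library. `DbTarget` and `parse_jdbc_url` are hypothetical names, and 3306 is assumed as the fallback because it is MariaDB's default port.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class DbTarget(NamedTuple):
    host: str
    port: int
    database: str


def parse_jdbc_url(jdbc_url: str) -> DbTarget:
    """Split a URL like jdbc:mysql://host:3306/db into host, port and database."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    # What remains (mysql://host:port/db) is an ordinary URL for urlparse.
    parsed = urlparse(jdbc_url[len(prefix):])
    if not parsed.hostname:
        raise ValueError(f"no host in {jdbc_url!r}")
    return DbTarget(
        host=parsed.hostname,
        port=parsed.port or 3306,  # assumed default: MariaDB's standard port
        database=parsed.path.lstrip("/"),
    )


# The parts would then be passed individually, e.g.
# mariadb.connect(host=t.host, port=t.port, database=t.database, user=..., password=...)
```

For the URL above this yields host `openchallenges-mariadb`, port 3306, and database `edam_etl`.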

I get this error when I change the host to openchallenges-mariadb and don't use the OC_DB_URL in the .env file:

Error connecting to MariaDB Platform: Can't connect to local server through socket '/run/mysqld/mysqld.sock' (2)
Warning: command "poetry run python src/main.py" exited with non-zero status code

I'm wondering if the variables are being assigned incorrectly.

tschaffter commented 1 month ago

I believe we have solved the issue since your last message. Feel free to remove the config variable OC_DB_URL. We use it in the OC microservices because the DB client we use in Java accepts this URL as a parameter, which may not be the case for the Python DB client you are using.
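Once OC_DB_URL is dropped, one option is to read discrete settings and fail loudly when any are missing. This is a minimal sketch and the `DB_*` variable names are hypothetical; adjust them to whatever the edam-etl .env defines. One plausible cause of the earlier "Can't connect to local server through socket" error is that `host` ended up unset. The MariaDB client then assumes localhost and tries the local Unix socket instead of TCP.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; not taken from the project's .env.
REQUIRED = ("DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_DATABASE")


def connection_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for mariadb.connect() from discrete settings.

    Raises instead of silently passing None, so a missing host cannot
    fall back to the local socket unnoticed.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing DB settings: {', '.join(missing)}")
    return {
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "database": env["DB_DATABASE"],
    }


# Usage inside the ETL (requires the mariadb package):
# import mariadb
# conn = mariadb.connect(**connection_kwargs())
```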

mdsage1 commented 1 month ago

@tschaffter For PR #2680, the CI/pr (pull_request) check appears to fail because it cannot find an installation of MariaDB Connector/C, which the mariadb package requires, and mariadb doesn't seem to support PEP 517 builds. I was able to work around this in the dev container by running the indicated command in the terminal, but I guess that doesn't carry over to the PR. The error says Connector/C must be preinstalled, but I'm unsure how that works within the microservices. Should I create a script that performs this step within the app folder?

Using virtualenv: /workspaces/sage-monorepo/apps/openchallenges/edam-etl/.venv
Installing dependencies from lock file

Package operations: 1 install, 0 updates, 0 removals

  • Installing mariadb (1.1.10)

    ChefBuildError

    Backend subprocess exited when trying to invoke get_requires_for_build_wheel

    /bin/sh: 1: mariadb_config: not found
    Traceback (most recent call last):
      File "/etc/poetry/venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
        main()
      File "/etc/poetry/venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/etc/poetry/venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
        return hook(config_settings)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
        self.run_setup()
      File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 487, in run_setup
        super().run_setup(setup_script=setup_script)
      File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
        exec(code, locals())
      File "<string>", line 27, in <module>
      File "/tmp/tmp783xeam4/mariadb-1.1.10/mariadb_posix.py", line 62, in get_config
        cc_version = mariadb_config(config_prg, "cc_version")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/tmp/tmp783xeam4/mariadb-1.1.10/mariadb_posix.py", line 28, in mariadb_config
        raise EnvironmentError(
    OSError: mariadb_config not found.

    This error typically indicates that MariaDB Connector/C, a dependency which must be preinstalled, is not found. If MariaDB Connector/C is not installed, see installation instructions. If MariaDB Connector/C is installed, either set the environment variable MARIADB_CONFIG or edit the configuration file 'site.cfg' to set the 'mariadb_config' option to the file location of the mariadb_config utility.

    at /etc/poetry/venv/lib/python3.10/site-packages/poetry/installation/chef.py:164 in _prepare
        160│
        161│             error = ChefBuildError("\n\n".join(message_parts))
        162│
        163│         if error is not None:
      → 164│             raise error from None
        165│
        166│         return path
        167│
        168│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with mariadb (1.1.10) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "mariadb (==1.1.10)"'.
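Whatever install step ends up fixing CI, a small preflight check could make this failure easier to read than the build traceback. This is a minimal sketch that approximates the two lookups named in the error: the MARIADB_CONFIG variable, otherwise `mariadb_config` on PATH (the 'site.cfg' option is not handled). `find_mariadb_config` is a hypothetical helper, not part of the mariadb package. On Debian-based images, Connector/C and `mariadb_config` typically come from a development package such as libmariadb-dev, but verify that against the image CI actually uses.

```python
import os
import shutil
from typing import Mapping, Optional


def find_mariadb_config(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the mariadb_config executable a `mariadb` source build would likely use.

    Checks MARIADB_CONFIG first, then searches PATH. Returns None when
    neither points at an executable file.
    """
    env = os.environ if env is None else env
    configured = env.get("MARIADB_CONFIG")
    if configured:
        # An explicit setting wins; do not silently fall back to PATH.
        if os.path.isfile(configured) and os.access(configured, os.X_OK):
            return configured
        return None
    # shutil.which returns None if PATH is empty or the tool is absent.
    return shutil.which("mariadb_config", path=env.get("PATH"))
```

A CI step could call this before `poetry install` and stop with a message like "install MariaDB Connector/C first" when it returns None.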

tschaffter commented 1 month ago

@mdsage1 Hint: Look at the files in the EDAM ETL project folder, in particular project.json. There is a perfect place somewhere where the pip command could be added.