ITISFoundation / osparc-simcore

🐼 osparc-simcore simulation framework
https://osparc.io
MIT License
43 stars 27 forks source link

♻️Ensure parent project/node is well structured in the DB 🗃️ #5874

Closed sanderegg closed 4 months ago

sanderegg commented 4 months ago

What do these changes do?

This is the first part toward structuring node parenting.

Context

In Sim4life.io currently the API client PATCHES the created computational jobs with some free form JSON metadata. Among these is a node_id field which contains the parent Node (e.g. the node where s4l is running). The DV-2 uses that information to define the parent project/node by looking for the presence of that field. This is currently not defined anywhere and could break very easily.

Goal

Bring structure to this parenting, and allow other products to use this as easy as possible.

Idea

The oSparc API client shall automagically find out when it is running from an oSparc node by checking for some pre-defined ENV variables. For other free form API client, the usage relies on the respective authors responsibility (e.g. sim4life).

Constraint

Keep backward compatibility

Changes

The next steps will be re-arranging the DV-2 to directly get the information regarding parent project/node from the DB instead of extracting the information from the metadata, and taking care of the oSparc client (@bisgaard-itis )

Related issue/s

How to test

Dev-ops checklist

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 94.42060% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 87.7%. Comparing base (cafbf96) to head (048f90b). Report is 242 commits behind head on master.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/graphs/tree.svg?width=650&height=150&src=pr&token=h1rOE8q7ic&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation)](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) ```diff @@ Coverage Diff @@ ## master #5874 +/- ## ========================================= + Coverage 84.5% 87.7% +3.2% ========================================= Files 10 1398 +1388 Lines 214 57580 +57366 Branches 25 1330 +1305 ========================================= + Hits 181 50554 +50373 - Misses 23 6743 +6720 - Partials 10 283 +273 ``` | [Flag](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | Coverage Δ | | |---|---|---| | [integrationtests](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | `64.7% <48.1%> (?)` | | | [unittests](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | `85.8% <94.4%> (+1.2%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | Coverage Δ | | |---|---|---| | [...core\_postgres\_database/models/projects\_metadata.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fpostgres-database%2Fsrc%2Fsimcore_postgres_database%2Fmodels%2Fprojects_metadata.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvcG9zdGdyZXMtZGF0YWJhc2Uvc3JjL3NpbWNvcmVfcG9zdGdyZXNfZGF0YWJhc2UvbW9kZWxzL3Byb2plY3RzX21ldGFkYXRhLnB5) | `100.0% <100.0%> (ø)` | | | [...mcore\_postgres\_database/utils\_projects\_metadata.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fpostgres-database%2Fsrc%2Fsimcore_postgres_database%2Futils_projects_metadata.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvcG9zdGdyZXMtZGF0YWJhc2Uvc3JjL3NpbWNvcmVfcG9zdGdyZXNfZGF0YWJhc2UvdXRpbHNfcHJvamVjdHNfbWV0YWRhdGEucHk=) | `100.0% <100.0%> (ø)` | | | [.../simcore\_postgres\_database/utils\_projects\_nodes.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fpostgres-database%2Fsrc%2Fsimcore_postgres_database%2Futils_projects_nodes.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvcG9zdGdyZXMtZGF0YWJhc2Uvc3JjL3NpbWNvcmVfcG9zdGdyZXNfZGF0YWJhc2UvdXRpbHNfcHJvamVjdHNfbm9kZXMucHk=) | `90.0% <100.0%> (ø)` | | | [...rary/src/servicelib/aiohttp/requests\_validation.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fservice-library%2Fsrc%2Fservicelib%2Faiohttp%2Frequests_validation.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvc2VydmljZS1saWJyYXJ5L3NyYy9zZXJ2aWNlbGliL2Fpb2h0dHAvcmVxdWVzdHNfdmFsaWRhdGlvbi5weQ==) | `87.5% <100.0%> (ø)` | | | [...s/service-library/src/servicelib/common\_headers.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fservice-library%2Fsrc%2Fservicelib%2Fcommon_headers.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvc2VydmljZS1saWJyYXJ5L3NyYy9zZXJ2aWNlbGliL2NvbW1vbl9oZWFkZXJzLnB5) | `100.0% <100.0%> (ø)` | | | [...ore\_service\_webserver/projects/\_crud\_api\_create.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_crud_api_create.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fY3J1ZF9hcGlfY3JlYXRlLnB5) | `98.3% <100.0%> (ø)` | | | [...mcore\_service\_webserver/projects/\_crud\_handlers.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_crud_handlers.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fY3J1ZF9oYW5kbGVycy5weQ==) | `91.3% <100.0%> (ø)` | | | [...ervice\_webserver/projects/\_crud\_handlers\_models.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_crud_handlers_models.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fY3J1ZF9oYW5kbGVyc19tb2RlbHMucHk=) | `98.0% <100.0%> (ø)` | | | [...imcore\_service\_webserver/projects/\_metadata\_api.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_metadata_api.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fbWV0YWRhdGFfYXBpLnB5) | `100.0% <100.0%> (ø)` | | | [...c/simcore\_service\_webserver/projects/exceptions.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2Fexceptions.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9leGNlcHRpb25zLnB5) | `93.4% <100.0%> (ø)` | | | ... and [5 more](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | | ... and [1353 files with indirect coverage changes](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation)
matusdrobuliak66 commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

bisgaard-itis commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

matusdrobuliak66 commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

bisgaard-itis commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

matusdrobuliak66 commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: image which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.

I would propose for example having it as additional column in the projects_metadata table: Project ID Parent Project ID Root Parent Project ID
A NULL NULL
B A A
C B A

We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

sanderegg commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: image which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.

I would propose for example having it as additional column in the projects_metadata table:

Project ID Parent Project ID Root Parent Project ID A NULL NULL B A A C B A We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

for completeness of the discussion, I agreed to add the root project ID/root project Node ID

bisgaard-itis commented 4 months ago

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: image which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing. I would propose for example having it as additional column in the projects_metadata table: Project ID Parent Project ID Root Parent Project ID A NULL NULL B A A C B A We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

for completeness of the discussion, I agreed to add the root project ID/root project Node ID

Sounds good to me. Thanks

sonarcloud[bot] commented 4 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud