♻️Ensure parent project/node is well structured in the DB 🗃️

What do these changes do?

This is the first part toward structuring node parenting.

Context

In Sim4life.io currently the API client PATCHES the created computational jobs with some free form JSON metadata. Among these is a node_id field which contains the parent Node (e.g. the node where s4l is running). The DV-2 uses that information to define the parent project/node by looking for the presence of that field. This is currently not defined anywhere and could break very easily.

Goal

Bring structure to this parenting, and allow other products to use this as easy as possible.

Idea

The oSparc API client shall automagically find out when it is running from an oSparc node by checking for some pre-defined ENV variables. For other free form API client, the usage relies on the respective authors responsibility (e.g. sim4life).

Constraint

Keep backward compatibility

Changes

Project creation POST /projects now accept a couple of headers:
- X-Simcore-Parent-Project-Uuid which optionally contains the parent project UUID
- X-Simcore-Parent-Node-Id which optionally contains the parent node ID
- if both are given then the parenting structure will be setup in the projects_metadata table and used as before
- if none are given or are both null then nothing happens
- if they are invalid an error will be returned (404 - NotFound)
- if only one of them is filled then an error will be returned (422 - Unprocessable)
to keep backward compatibility, that means the the project was created without parents,
and PATCH /projects/{uuid}/metadata is called with custom_metadata containing a _nodeid field:
- This will set the parenting in case the passed _nodeid is found,
- and that there is not already parenting set beforehand

The next steps will be re-arranging the DV-2 to directly get the information regarding parent project/node from the DB instead of extracting the information from the metadata, and taking care of the oSparc client (@bisgaard-itis )

Related issue/s

How to test

Dev-ops checklist

[x] No ENV changes or I properly updated ENV (read the instruction)

Codecov Report

Attention: Patch coverage is 94.42060% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 87.7%. Comparing base (cafbf96) to head (048f90b). Report is 242 commits behind head on master.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/graphs/tree.svg?width=650&height=150&src=pr&token=h1rOE8q7ic&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation)](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) ```diff @@ Coverage Diff @@ ## master #5874 +/- ## ========================================= + Coverage 84.5% 87.7% +3.2% ========================================= Files 10 1398 +1388 Lines 214 57580 +57366 Branches 25 1330 +1305 ========================================= + Hits 181 50554 +50373 - Misses 23 6743 +6720 - Partials 10 283 +273 ``` | [Flag](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | Coverage Δ | | |---|---|---| | [integrationtests](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | `64.7% <48.1%> (?)` | | | [unittests](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | `85.8% <94.4%> (+1.2%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | Coverage Δ | | |---|---|---| | [...core\_postgres\_database/models/projects\_metadata.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fpostgres-database%2Fsrc%2Fsimcore_postgres_database%2Fmodels%2Fprojects_metadata.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvcG9zdGdyZXMtZGF0YWJhc2Uvc3JjL3NpbWNvcmVfcG9zdGdyZXNfZGF0YWJhc2UvbW9kZWxzL3Byb2plY3RzX21ldGFkYXRhLnB5) | `100.0% <100.0%> (ø)` | | | [...mcore\_postgres\_database/utils\_projects\_metadata.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fpostgres-database%2Fsrc%2Fsimcore_postgres_database%2Futils_projects_metadata.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvcG9zdGdyZXMtZGF0YWJhc2Uvc3JjL3NpbWNvcmVfcG9zdGdyZXNfZGF0YWJhc2UvdXRpbHNfcHJvamVjdHNfbWV0YWRhdGEucHk=) | `100.0% <100.0%> (ø)` | | | [.../simcore\_postgres\_database/utils\_projects\_nodes.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fpostgres-database%2Fsrc%2Fsimcore_postgres_database%2Futils_projects_nodes.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvcG9zdGdyZXMtZGF0YWJhc2Uvc3JjL3NpbWNvcmVfcG9zdGdyZXNfZGF0YWJhc2UvdXRpbHNfcHJvamVjdHNfbm9kZXMucHk=) | `90.0% <100.0%> (ø)` | | | [...rary/src/servicelib/aiohttp/requests\_validation.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fservice-library%2Fsrc%2Fservicelib%2Faiohttp%2Frequests_validation.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvc2VydmljZS1saWJyYXJ5L3NyYy9zZXJ2aWNlbGliL2Fpb2h0dHAvcmVxdWVzdHNfdmFsaWRhdGlvbi5weQ==) | `87.5% <100.0%> (ø)` | | | [...s/service-library/src/servicelib/common\_headers.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=packages%2Fservice-library%2Fsrc%2Fservicelib%2Fcommon_headers.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-cGFja2FnZXMvc2VydmljZS1saWJyYXJ5L3NyYy9zZXJ2aWNlbGliL2NvbW1vbl9oZWFkZXJzLnB5) | `100.0% <100.0%> (ø)` | | | [...ore\_service\_webserver/projects/\_crud\_api\_create.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_crud_api_create.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fY3J1ZF9hcGlfY3JlYXRlLnB5) | `98.3% <100.0%> (ø)` | | | [...mcore\_service\_webserver/projects/\_crud\_handlers.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_crud_handlers.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fY3J1ZF9oYW5kbGVycy5weQ==) | `91.3% <100.0%> (ø)` | | | [...ervice\_webserver/projects/\_crud\_handlers\_models.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_crud_handlers_models.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fY3J1ZF9oYW5kbGVyc19tb2RlbHMucHk=) | `98.0% <100.0%> (ø)` | | | [...imcore\_service\_webserver/projects/\_metadata\_api.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2F_metadata_api.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9fbWV0YWRhdGFfYXBpLnB5) | `100.0% <100.0%> (ø)` | | | [...c/simcore\_service\_webserver/projects/exceptions.py](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree&filepath=services%2Fweb%2Fserver%2Fsrc%2Fsimcore_service_webserver%2Fprojects%2Fexceptions.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation#diff-c2VydmljZXMvd2ViL3NlcnZlci9zcmMvc2ltY29yZV9zZXJ2aWNlX3dlYnNlcnZlci9wcm9qZWN0cy9leGNlcHRpb25zLnB5) | `93.4% <100.0%> (ø)` | | | ... and [5 more](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation) | | ... and [1353 files with indirect coverage changes](https://app.codecov.io/gh/ITISFoundation/osparc-simcore/pull/5874/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ITISFoundation)

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.

I would propose for example having it as additional column in the `projects_metadata` table: Project ID	Parent Project ID	Root Parent Project ID
A	NULL	NULL
B	A	A
C	B	A

We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.

I would propose for example having it as additional column in the projects_metadata table:

Project ID Parent Project ID Root Parent Project ID A NULL NULL B A A C B A We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

for completeness of the discussion, I agreed to add the root project ID/root project Node ID

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing. I would propose for example having it as additional column in the projects_metadata table: Project ID Parent Project ID Root Parent Project ID A NULL NULL B A A C B A We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

for completeness of the discussion, I agreed to add the root project ID/root project Node ID

Sounds good to me. Thanks

ITISFoundation / osparc-simcore