Closed sanderegg closed 4 months ago
Attention: Patch coverage is 94.42060% with 13 lines in your changes missing coverage. Please review.
Project coverage is 87.7%. Comparing base (cafbf96) to head (048f90b). Report is 242 commits behind head on master.
🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?
Couldn't this be resolved by following the `parent_id`s and adding up the costs along the way? Or is this operation too costly?
Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like: you would need some kind of loop with a lot of joins, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.
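For illustration only, the "loop with a lot of joins" can be written as a recursive CTE. The sketch below runs against SQLite and assumes a hypothetical `projects` table with `project_id`, `parent_id` and `cost` columns; none of these names come from the actual schema. It still re-walks the whole tree on every request, which is the cost concern raised above.

```python
import sqlite3


def total_cost_per_root(conn):
    """Sum the cost of every project under its top-most ancestor."""
    query = """
    WITH RECURSIVE tree(root_id, project_id) AS (
        -- anchor: projects without a parent are their own root
        SELECT project_id, project_id FROM projects WHERE parent_id IS NULL
        UNION ALL
        -- recursive step: attach each child to the root of its parent
        SELECT tree.root_id, p.project_id
        FROM projects AS p
        JOIN tree ON p.parent_id = tree.project_id
    )
    SELECT tree.root_id, SUM(p.cost)
    FROM tree
    JOIN projects AS p ON p.project_id = tree.project_id
    GROUP BY tree.root_id
    ORDER BY tree.root_id
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: A <- B <- C form one tree, D stands alone.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (project_id TEXT PRIMARY KEY, parent_id TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("A", None, 1.0), ("B", "A", 2.0), ("C", "B", 3.0), ("D", None, 5.0)],
)
```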
What I imagine is something like the following: you first extract the "root projects" (meaning projects which don't have parents, I guess). Then, in a loop, you look for their children, grandchildren, great-grandchildren, etc. So I guess the question is whether that loop would run on the server or in the client. If the loop is on the client side, then the individual queries to the DB (getting all children of a given `project_id`) would be quite fast, I think 🤔. But of course, if the loop is on the client side, then resolving the entire thing might not be super fast.
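A rough sketch of that client-side loop, assuming a hypothetical `get_children` callable that stands in for the "all children of a given `project_id`" query. Each call is one round trip, so the total cost grows with the number of projects in the tree.

```python
from collections import deque


def collect_descendants(root_id, get_children):
    """Breadth-first walk: one get_children call (one DB query) per visited project."""
    found = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in get_children(current):
            found.append(child)
            queue.append(child)
    return found


# Hypothetical in-memory stand-in for the children query.
CHILDREN = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
descendants = collect_descendants("A", lambda pid: CHILDREN.get(pid, []))
```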
Not sure whether we are on the same page. I am talking about this view, which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, and mainly for pagination, these requirements should ideally be satisfied with a single query to the database. Storing the root parent ID somewhere solves the problem. Do not forget that you are not doing this operation for one specific project, but across all the projects you are listing.
I would propose, for example, adding it as an additional column in the `projects_metadata` table:
Project ID | Parent Project ID | Root Parent Project ID |
---|---|---|
A | NULL | NULL |
B | A | A |
C | B | A |
When inserting a new row into this table, we just look up the root parent project ID of its parent project, and that's it. Let's discuss this @sanderegg. Thanks.
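A minimal sketch of that insert-time lookup, using SQLite. The column names `project_id`, `parent_project_id` and `root_parent_project_id` are assumptions derived from the table headers above, not the real schema. A child of a root project takes the parent itself as its root; deeper descendants copy the parent's stored root.

```python
import sqlite3


def insert_project(conn, project_id, parent_id=None):
    """Insert a row and derive its root from the parent's row (one lookup, no tree walk)."""
    root_id = None
    if parent_id is not None:
        row = conn.execute(
            "SELECT root_parent_project_id FROM projects_metadata WHERE project_id = ?",
            (parent_id,),
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown parent project {parent_id!r}")
        # A root project stores NULL, so its children point at the parent itself.
        root_id = row[0] if row[0] is not None else parent_id
    conn.execute(
        "INSERT INTO projects_metadata VALUES (?, ?, ?)",
        (project_id, parent_id, root_id),
    )
    return root_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects_metadata ("
    "project_id TEXT PRIMARY KEY, parent_project_id TEXT, root_parent_project_id TEXT)"
)
for pid, parent in [("A", None), ("B", "A"), ("C", "B")]:
    insert_project(conn, pid, parent)

rows = conn.execute("SELECT * FROM projects_metadata ORDER BY project_id").fetchall()

# Grouping then needs a single query: roots fall back to their own ID.
grouped = conn.execute(
    "SELECT COALESCE(root_parent_project_id, project_id) AS root, COUNT(*) "
    "FROM projects_metadata GROUP BY root ORDER BY root"
).fetchall()
```

With the three example rows, `rows` reproduces the table above, and the grouping query places A, B and C under the single root A.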
For completeness of the discussion: I agreed to add the root project ID/root project node ID.
Sounds good to me. Thanks
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code
What do these changes do?
This is the first part toward structuring node parenting.
Context
In Sim4life.io, the API client currently PATCHes the created computational jobs with some free-form JSON metadata. Among these is a `node_id` field which contains the parent node (e.g. the node where s4l is running). The DV-2 uses that information to define the parent project/node by looking for the presence of that field. This is currently not defined anywhere and could break very easily.
Goal
Bring structure to this parenting, and allow other products to use it as easily as possible.
Idea
The oSparc API client shall automagically detect when it is running inside an oSparc node by checking for some pre-defined ENV variables. For other free-form API clients (e.g. sim4life), correct usage is the responsibility of their respective authors.
Constraint
Keep backward compatibility
Changes
- `POST /projects` now accepts a couple of headers:
- `X-Simcore-Parent-Project-Uuid`, which optionally contains the parent project UUID
- `X-Simcore-Parent-Node-Id`, which optionally contains the parent node ID
- `projects_metadata` table and used as before
- `null` then nothing happens
- `PATCH /projects/{uuid}/metadata` is called with custom_metadata containing a `node_id` field:

The next steps will be re-arranging the DV-2 to get the parent project/node information directly from the DB instead of extracting it from the metadata, and taking care of the oSparc client (@bisgaard-itis).
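As a rough idea of what the client side could look like, the sketch below builds the two headers from the environment. The variable names `OSPARC_STUDY_ID` and `OSPARC_NODE_ID` are hypothetical placeholders: this PR does not define which ENV variables the client will check. The headers are only emitted when both values are present, so a client running outside a node sends nothing extra.

```python
import os

# Hypothetical ENV variable names, for illustration only.
PARENT_PROJECT_ENV = "OSPARC_STUDY_ID"
PARENT_NODE_ENV = "OSPARC_NODE_ID"


def parent_headers(environ=None):
    """Return the parent headers for POST /projects, or {} when not running in a node."""
    env = os.environ if environ is None else environ
    project_uuid = env.get(PARENT_PROJECT_ENV)
    node_id = env.get(PARENT_NODE_ENV)
    if not project_uuid or not node_id:
        # Outside a node (or partially configured): behave exactly as before.
        return {}
    return {
        "X-Simcore-Parent-Project-Uuid": project_uuid,
        "X-Simcore-Parent-Node-Id": node_id,
    }
```

Passing a plain dict as `environ` makes the function easy to exercise without touching the real process environment.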
Related issue/s
How to test
Dev-ops checklist