BSC-ES / autosubmit-api

Autosubmit API is a package that consumes the information generated by Autosubmit and serves it as an API.
GNU General Public License v3.0

Workflow-specific performance metrics #75

Open LuiggiTenorioK opened 6 months ago

LuiggiTenorioK commented 6 months ago

In GitLab by @mcastril on May 6, 2024, 10:55

There are some performance metrics that cannot be easily calculated from the parameters that Autosubmit currently stores in the database in a homogeneous way across different workflows. Examples are the Coupling Cost and the Complexity. In the past, we agreed to enable a mechanism by which the workflow would be responsible for generating this data, which Autosubmit/Autosubmit-API would then provide, in a form we still have to decide, through an endpoint in the API.

For instance, the workflow can provide a YAML file written in a predetermined path (we could make it fixed or have a parameter in the configuration to allow the users to change this path) that the Autosubmit API would check when the endpoint is reached.
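A minimal sketch of how such a lookup could work, assuming a hypothetical default file name (`workflow_metrics.yml`) inside the experiment directory and an optional override coming from the configuration. To stay dependency-free, the sketch only parses flat `key: value` lines; a real implementation would presumably use a proper YAML library. None of these names come from Autosubmit or the Autosubmit API.

```python
from pathlib import Path
from typing import Optional

# Hypothetical default file name, relative to the experiment directory.
DEFAULT_METRICS_FILE = "workflow_metrics.yml"


def parse_flat_yaml(text: str) -> dict:
    """Parse flat `key: value` lines; a stand-in for a real YAML parser."""
    metrics = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            metrics[key.strip()] = value  # keep non-numeric values as text
    return metrics


def load_workflow_metrics(exp_dir: Path, metrics_file: Optional[str] = None) -> dict:
    """Return the workflow-provided metrics, or {} if the file is absent."""
    path = exp_dir / (metrics_file or DEFAULT_METRICS_FILE)
    if not path.is_file():
        return {}
    return parse_flat_yaml(path.read_text(encoding="utf-8"))
```

An endpoint handler could call `load_workflow_metrics` on each request, so a missing file simply yields an empty result instead of an error.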

As we are moving to a different database backend and trying to make Autosubmit less dependent on shared files in the filesystem, I understand this can be an issue. We could decide if it's worth having a field in the database where Autosubmit stores these additional metrics in bulk (in that case Autosubmit, and not the API, would be the one that consumes the workflow file).

This development has not moved forward in the last few years due to the lack of a real necessity, but it's a requirement from DestinE in Phase 2 (August).

Performance metrics: https://docs.google.com/document/d/12yWDwXsohf4G4MPeP6e3Eil4ZL-YeIN71dBcoWRliEg/edit

Previous decisions about how to implement this:

https://earth.bsc.es/gitlab/es/autosubmit/-/issues/674#note_160254
https://earth.bsc.es/gitlab/es/autosubmit/-/issues/524#note_90901

CC @kinow @dbeltrankyl

LuiggiTenorioK commented 6 months ago

In GitLab by @kinow on May 6, 2024, 11:00

@mcastril just to check on this part

> This development has not moved forward in the last few years due to the lack of a real necessity, but it's a requirement from DestinE in Phase 2 (August).

August 2024, right?

Thanks for the details in the issue description!

LuiggiTenorioK commented 6 months ago

> In the past, we agreed to enable a mechanism by which the workflow would be responsible for generating this data, which Autosubmit/Autosubmit-API would then provide, in a form we still have to decide, through an endpoint in the API.

> For instance, the workflow can provide a YAML file written in a predetermined path (we could make it fixed or have a parameter in the configuration to allow the users to change this path) that the Autosubmit API would check when the endpoint is reached.

+1. I think this feature could be more generalized. @kinow once mentioned something about having user-defined metadata fields in the workflows that can be shown in the API/GUI as well. This metadata could include fields that are values or references (paths) to the content of another file.
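As a rough illustration of fields that are either values or references, here is a sketch that assumes a hypothetical convention: a field given as a mapping with a `path` key is a reference to another file, and anything else is a literal value. The convention and the function name are assumptions for discussion, not an existing Autosubmit feature.

```python
from pathlib import Path


def resolve_metadata(metadata: dict, base_dir: Path) -> dict:
    """Replace path references with the content of the referenced file."""
    resolved = {}
    for name, field in metadata.items():
        if isinstance(field, dict) and "path" in field:
            # Reference: read the file relative to the base directory.
            target = base_dir / field["path"]
            if target.is_file():
                resolved[name] = target.read_text(encoding="utf-8")
            else:
                resolved[name] = None  # referenced file does not exist
        else:
            # Literal value: pass it through unchanged.
            resolved[name] = field
    return resolved
```

With this convention, the API/GUI could display literal fields directly and only touch the filesystem for the fields that are references.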

> We could decide if it's worth having a field in the database where Autosubmit stores these additional metrics in bulk (in that case Autosubmit, and not the API, would be the one that consumes the workflow file).

I think the metrics, the metadata, or the source data needed to calculate them must be stored in the database. This will allow us to keep a historical trace of the metrics per run.
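A minimal sketch of what storing the metrics per run could look like. It uses SQLite only because it ships with Python; the table name, the columns, and the idea of keeping the metrics as a JSON blob are all assumptions for illustration and do not reflect the actual Autosubmit schema or the new backend.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one metrics blob per run."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS workflow_metrics (
            expid   TEXT    NOT NULL,
            run_id  INTEGER NOT NULL,
            metrics TEXT    NOT NULL,  -- JSON-encoded mapping
            PRIMARY KEY (expid, run_id)
        )
        """
    )


def store_run_metrics(conn: sqlite3.Connection, expid: str, run_id: int, metrics: dict) -> None:
    """Store (or overwrite) the metrics reported for a single run."""
    conn.execute(
        "INSERT OR REPLACE INTO workflow_metrics (expid, run_id, metrics) VALUES (?, ?, ?)",
        (expid, run_id, json.dumps(metrics)),
    )
    conn.commit()


def metrics_history(conn: sqlite3.Connection, expid: str) -> list:
    """Return [(run_id, metrics), ...] for an experiment, oldest run first."""
    rows = conn.execute(
        "SELECT run_id, metrics FROM workflow_metrics WHERE expid = ? ORDER BY run_id",
        (expid,),
    ).fetchall()
    return [(run_id, json.loads(blob)) for run_id, blob in rows]
```

Keeping one row per run is what gives the historical trace: an endpoint could return the latest row by default and the full `metrics_history` on request.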

LuiggiTenorioK commented 6 months ago

Issue about the metadata feature: https://earth.bsc.es/gitlab/es/autosubmit-gui/-/issues/99

LuiggiTenorioK commented 6 months ago

In GitLab by @mcastril on May 6, 2024, 11:31

Thank you for the positive feedback, Luiggi.

> This metadata could include fields that are values or references (paths) to the content of another file.

Concerning this, if the fields are references to actual data or metadata, maybe the field itself should be treated as data (an ordinary parameter) instead of metadata.