dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Document data schema for WMCore RESTful services #11315

Open amaltaro opened 2 years ago

amaltaro commented 2 years ago

Impact of the new feature: WMCore in general

Is your feature request related to a problem? Please describe.
In order to evaluate possible improvements in how data gets stored and served by WMCore services, we should first understand their data schema, data format and data structure, and possibly identify data that is not relevant (this applies especially to WMStats / T0WMstats).

Describe the solution you'd like:
The outcome of this GH issue is meant to be a document describing the data schema provided by the RESTful APIs of the following services:

If a service caches data, as WMStats does, the cache should be documented as well.
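One low-effort way to start such a schema document is to take a sample JSON response from an API and record which type appears at each field path. Fields whose type varies, or that are always null, are natural starting points for the "not relevant" discussion. This is a sketch that assumes the response has already been parsed with `json.loads`; the `$`/`[]` path notation and the sample document are made up for illustration and are not real WMStats output.

```python
import json


def infer_schema(doc, path="$"):
    """Map every field path in a parsed JSON document to the set of type names seen there.

    Dict keys extend the path with ".key"; all list elements share one "[]" path,
    so a list holding mixed types yields several type names for that path.
    """
    schema = {path: {type(doc).__name__}}
    if isinstance(doc, dict):
        children = [(f"{path}.{key}", value) for key, value in doc.items()]
    elif isinstance(doc, list):
        children = [(f"{path}[]", item) for item in doc]
    else:
        children = []
    for child_path, child in children:
        # Merge the child's paths into ours, accumulating type names per path
        for sub_path, types in infer_schema(child, child_path).items():
            schema.setdefault(sub_path, set()).update(types)
    return schema


# Made-up sample response, for illustration only
sample = json.loads(
    '{"name": "wf_1", "status": "running", "sites": ["T1_A", "T2_B"],'
    ' "progress": {"done": 10, "total": null}}'
)
for field_path, types in sorted(infer_schema(sample).items()):
    print(field_path, sorted(types))
```

Running it over many responses, or over a snapshot of a cache, and merging the per-path type sets would give a broader picture than a single sample.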

Describe alternatives you've considered: None

Additional context: None

amaltaro commented 2 years ago

@vkuznet from our chat last week, I understood that you will be working on this project during the DMWM Hackathon. Please feel free to update the issue description as well, in case I missed anything.

vkuznet commented 2 years ago

@amaltaro, I'm not sure how I can start working on an issue where I have little knowledge and need input from an expert (you). I don't really know which pieces of data are used in different places, nor how the data is originally created and propagated. That said, it is good that you created this issue, but my assignment makes little sense until an expert provides the relevant input. Therefore, in order to move forward with this issue, I need the following information from the expert:

As a starting point, I think the following "diagram" would be useful:

WMCore (Spec?) creates a dict, e.g. {... some-attributes...} -> it is passed to the ms-unmerged service, which does the following (assign ABC, etc.) -> passes it to ms-XXX -> ...
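A chain like this could also be written down in a machine-readable form, where each hop lists the fields it reads and the fields it adds, so that gaps in the flow show up automatically. This is a minimal sketch under that assumption: `Stage`, `check_flow` and the `attr_*` field names are hypothetical, and what ms-unmerged reads or writes here is a placeholder, not a description of the real service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stage:
    """One hop of the data flow (hypothetical model): fields a service reads and adds."""
    service: str
    reads: set[str] = field(default_factory=set)
    adds: set[str] = field(default_factory=set)


def check_flow(initial_fields, stages):
    """Walk the stages in order and report fields a stage reads before any earlier step provides them."""
    available = set(initial_fields)
    problems = {}
    for stage in stages:
        missing = stage.reads - available
        if missing:
            problems[stage.service] = missing
        # Whatever this stage adds becomes available to the following stages
        available |= stage.adds
    return problems


# Placeholder flow mirroring the diagram: spec dict -> ms-unmerged -> ms-XXX
flow = [
    Stage("ms-unmerged", reads={"attr_a"}, adds={"attr_b"}),
    Stage("ms-XXX", reads={"attr_b", "attr_c"}),
]
print(check_flow({"attr_a"}, flow))  # prints {'ms-XXX': {'attr_c'}}
```

Different workflow types (e.g. MC vs ReReco) would each get their own list of stages.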

Once an expert outlines the data flow and provides the initial data origin, we can start working on this issue. Moreover, if different workflow types follow different processes, e.g. MC vs ReReco, then we need to outline each schema and each of the steps mentioned above.

Finally, we should target services such as WMStats, which store such unstructured data, to see the whole picture of how this data is used across the system and to verify why we need such structures.
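To verify which stored structures are actually needed, one could inventory the fields each service stores and the fields each known consumer reads, then look for fields nobody reads. This is a minimal sketch assuming such an inventory exists; the consumer names and fields are hypothetical. A field without readers is only a candidate for removal, since a consumer missing from the inventory might still use it.

```python
def readers_by_field(stored, consumers):
    """For each stored field, list the consumers (by name) that read it."""
    return {
        name: sorted(consumer for consumer, reads in consumers.items() if name in reads)
        for name in sorted(stored)
    }


def unused_fields(stored, consumers):
    """Stored fields that no consumer in the inventory reads."""
    return [name for name, readers in readers_by_field(stored, consumers).items() if not readers]


# Hypothetical inventory for one service
stored = {"attr_a", "attr_b", "attr_c"}
consumers = {"consumer_x": {"attr_a"}, "consumer_y": {"attr_a", "attr_b"}}
print(readers_by_field(stored, consumers))
print(unused_fields(stored, consumers))  # prints ['attr_c']
```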

vkuznet commented 1 year ago

@amaltaro I asked several questions and have not received any answers so far. Please review my previous comment and provide the necessary information (pending since Oct 3rd).

amaltaro commented 1 year ago

@vkuznet Valentin, I think a data flow diagram can be created to highlight the interactions between services and the path a workflow takes through the system. However, specifying attributes and data schema for WMCore as a whole isn't tangible IMO. If you want to dive into the data schema, we need to work on it per service (and maybe even per API).

While I try to allocate time for creating this data flow diagram, I invite you to look into the workflow types (spec types) supported in ReqMgr2 and how they are constructed (their inheritance and their mandatory/optional attributes, which by the way might be missing the GPU parameters): https://github.com/dmwm/WMCore/wiki/Workflow-creation-and-assignment-definition

Note that StoreResults is no longer supported, even though it's listed in the wiki and the code hasn't been removed from WMCore.
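When documenting the spec types, their inheritance and mandatory/optional split could be captured in a small model, so that the full attribute set of each type is derived rather than copied by hand. This is a toy sketch: `SpecType`, `BaseSpec`, `ChildSpec` and the `attr_*` names are hypothetical, not the actual ReqMgr2 spec classes or request arguments, and it assumes a child type may promote an inherited optional attribute to mandatory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecType:
    """Hypothetical spec-type description: own attributes plus an optional parent type."""
    name: str
    parent: Optional[SpecType] = None
    mandatory: set[str] = field(default_factory=set)
    optional: set[str] = field(default_factory=set)

    def all_mandatory(self) -> set[str]:
        """Mandatory attributes declared here or anywhere up the inheritance chain."""
        inherited = self.parent.all_mandatory() if self.parent is not None else set()
        return inherited | self.mandatory

    def all_optional(self) -> set[str]:
        """Optional attributes from the whole chain, minus any that became mandatory."""
        inherited = self.parent.all_optional() if self.parent is not None else set()
        return (inherited | self.optional) - self.all_mandatory()


def missing_mandatory(spec, request):
    """Mandatory attributes of `spec` that a request dict does not provide."""
    return spec.all_mandatory() - set(request)


base = SpecType("BaseSpec", mandatory={"attr_a"}, optional={"attr_b", "attr_c"})
child = SpecType("ChildSpec", parent=base, mandatory={"attr_c"}, optional={"attr_d"})
print(sorted(child.all_mandatory()))  # prints ['attr_a', 'attr_c']
print(sorted(child.all_optional()))   # prints ['attr_b', 'attr_d']
print(missing_mandatory(child, {"attr_a": 1}))  # prints {'attr_c'}
```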