Best way to handle components selection/removal in a project

fred-r commented 2 years ago

Requirement we want to address

As an end-user, I want the tool to ease the project composition and maintenance, so that I do not waste lots of time reviewing my components list.

Separation of concerns I believe that:

PDSC is here to express the requirements of a component, but not how these dependencies should be resolved by the tool (human interaction or automatic)
csolution is here to parse the csolution/cproject.yml file and resolve conditions, it can offer a service to add/remove elements in csolution/cproject
UI is here to reflect a "flow" for the end-user to handle the csolution/cproject.yml

Trivial Resolution We need to define the concept of "trivial resolution". In the PDSC we can only say that component A requires or accepts something to sustain its needs. So, depending on the packs installed on the user's machine, there can be 1 or several options to resolve a condition. Hence, "trivial resolution" concerns only conditions where only 1 component can match the conditions. [Cvendor::] Cclass [&Cbundle] :Cgroup [:Csub] [&Cvariant] [@[~ | >=]Cversion] must be defined in a way that 1 and only 1 element can match the condition.

NOTE: in the PDSC we can write a very precise condition using all these fields. But, I guess the spirit of the PDSC is to at least not specify the exact version. If so, today I guess only the latest revision is proposed but we may imagine that a user may want to select explicitly a previous revision of a component. To be discussed.

Automatic Selection/Removal These concepts mean that the tool resolving the conditions can make automatic decisions on behalf of the end-user when it comes to "trivial resolution":

if 1 and only 1 component matches a condition, then it can be added automatically to the project
if a component listed in the project:
- is not required by any active condition in the project
- has not been added explicitly by the end-user but automatically due to a "trivial resolution" then this component may be automatically removed from the project.

This automatic selection/removal:

has impact on the UI : what do we present to the end-user or not ?
has impact on the tool managing the YML files: what do we reflect or not in the csolution/cproject.yml files ?

We can talk about: automatic management of a component

Possibilities

A first approach could be to say that csolution/cproject.yml files contain only "non-trivially resolved components" so components explicitly added to the project by the end-user:

either by the user's "free choice"
or by the user's choice during a "non trivial resolution" of condition In this case, all "automatically selected" components would go in the lock file only.

Consequences:

only the lock file reflects the actual project to be built
csolution/cproject.yml contains only components explicitly selected by the end-user

Another approach could be to add an "auto" attribute to the components listed in csolution/cproject.yml files. This field would have only 2 possible values:

true: component added automatically by the "tool" (csolution or condition resolver if different)
false: component added explicitly by the end-user

Project grooming When it comes to project maintenance:

explicitly selected components can be removed from the project by the end-user only
automatically selected components can be removed by the "tool" without any user interaction (notifiation may be done but I do not think it is required to specify it)

Scope I think it is clear that the concept of "trivial resolution" (and therefore of automatically managed component) cannot be determined at PDSC level as we do not have any information on other packs available when dealing with the project. Therefore, this concept is to be defined at tool level:

concepts must be defined (automatically managed component)
csolution behavior can be specified
UI behavior may be left to the tools' provider choice

fred-r commented 2 years ago

@jkrech @ReinhardKeil @slhultgren @mdortel-stm @tcsunhao @davidjurajdanxp @edriouk @brondani

jkrech commented 2 years ago

@fred-r thanks a lot for raising this issue under devtools and the excellent summary. I just wanted to add that a dependency condition can be specified such that the pack vendor has 'full' control over the resolution once a "fully specified" componentID is used <require Cvendor="ARM" Cclass="..." ... />

fred-r commented 2 years ago

@fred-r thanks a lot for raising this issue under devtools and the excellent summary. I just wanted to add that a dependency condition can be specified such that the pack vendor has 'full' control over the resolution once a "fully specified" componentID is used <require Cvendor="ARM" Cclass="..." ... />

Agree, I will clarify it in the summary.

jkrech commented 2 years ago

What if we start to separate 3 types of selected components:

Primary components
- user selected
- required by one or more source modules <- #include ".h"
- interface: specifies component interface as provides
Secondary components
- user selected (e.g. to pick a certain version, or choosing one of multiple components resolving a dependency)
- are required by "primary components" but not directly by the source modules -> no #includes
Automatic components
- components selected by a resolution algorithm

DavidJurajdaNXP commented 2 years ago

@fred-r I agree with concept of 'trivial resolution' and also with need of differentiation between components added as result of user action and automatically added components by resolver.

@jkrech I am not sure about difference between "Primary components" and "Secondary components". Why we need such differentiation? Is not it still just "User selected component". From my perspective it does not meter whether used make selection on its own or with assistance of tool.

jkrech commented 2 years ago

One challenge I see today is that in a project we have files (header and modules without meta information) and components (with meta information e.g. about dependencies). Once a module explicitly includes a header file from a component, there is a dependency that we do not track today. Identifying components that are no longer needed, in my view requires this type information.

madchutney commented 2 years ago

Thanks @fred-r your description, it highlights some of the challenges we are facing in building an IDE to help with component management. How to select components, display and add dependencies is something that we have a good handle on, but there doesn't seem to be a good way to automatically manage the removal of components.

Part of this has already been mentioned, but so far I have identified three areas where we are missing information in the current format:

A distinction between "required" components and "dependency" components.
A link between any included component and its dependencies.
A link between any included component and the included pack.

If these three things were available, when a "required" component is removed, it will be possible to safely clean up the component list, removing all unused dependencies and packs.

This has been briefly discussed while considering the format of a lock file and I suggested a possible schema to address these issues https://github.com/Open-CMSIS-Pack/devtools/issues/119#issuecomment-1180261244. However, the actual data model used is not important, but rather that the relationships are captured and maintained automatically.

One last point, it is essential that any component management tool does not need to look into the project code to determine which components should or should not be part of the project.

fred-r commented 2 years ago

Seeing the comments, I think that we all agree there is a need to address this project composition user experience.

My feeling is that at the moment the pdsc focuses on build dependencies only, not functional dependencies. So, I assume we should stick to this dimension for the time being.

With respect to this, I understand there are additional points from @madchutney:

clean-up the unnecessary components but also the packs
do so without analyzing the project code

Point #2 is an objective to me, but prevents to cover the use-case from @jkrech : "out of component code" added in the project. Here, we enter a tricky area and I see only two options (but there may be more):

We clearly state that at composition stage we deal only with components (so we have metadata)
Or we want to address uncomponentized code added as "files" (or am I missing something about the module concept?).

If we want to address (2), then I guess there are only 2 options: 2A. Parsing the code to extend the metadata with required infomation : but when to parse ? As soon as such a file is modified ? That sounds tough... 2B. Ask the user to create a condition when in his code he introduces a dependency on a component

I am more in favor of 2B because basically we do not force the customer to create components he does not need, but:

he is responsible for describing the dependency
it can be an "introduction" to the component concept, leading him to consider creating components :-)

This would mean that in the project definition we would have:

conditions coming from components definition (PDSC or GPDSC)
"floating conditions" not coming from a component but from "uncomponentized code" (but as we need to memorize their source, I guess we need to create shadow components...gpdsc ?)

That's extra complexity...

Regarding the clean-up of the packs : to me this does not really impact project composition but more packs management. It is a way to optimize disk space if we want to "pack" a standalone project, so it can be interesting indeed.

madchutney commented 2 years ago

Regarding the clean-up of the packs : to me this does not really impact project composition but more packs management. It is a way to optimize disk space if we want to "pack" a standalone project, so it can be interesting indeed.

I think generally this is the case. However, there is a nuance/edge case where this could impact project composition. When resolving component references, the packs listed in the solution can be used to filter the available resolution options. The more packs in the solution the increased chance that the resolution would match multiple components.

jkrech commented 2 years ago

I disagree. We have agreed that csolution would need to ask for user input, if more than one component resolves a dependency - effectively the component ID specified in the yml file needs to be refined to become unique. Once the dependencies are resolved and components got selected the "used" pack versions are known and the remaining packs could be removed. However it is required to take all project contexts of the solution into consideration. What am I missing?

madchutney commented 1 year ago

@jkrech I understood it was possible for a component ID to resolve to components in different packs, even when fully specified as it does not include any reference to a pack (ComponentType)?

If this is not the case then please disregard.

If I have understood correctly, then removing a pack (from the solution definition) could mean that a component could be resolved, while it couldn't if the pack which also contained the component was left in the list. I don't think the clean up can be done automatically in this case.

jkrech commented 1 year ago

@madchutney while it is technically possible even for a fully specified component (Cvendor is set) to be resolved to components in different packs, it is the vendor's responsibility to prevent this from happening. If there is strong reason for doing so, it is the vendor's responsibility to ensure that the description and content of these components are identical such that it would become irrelevant from which pack the component is actually taken. I think the worst case scenario would be that the tool would keep both packs in the list. Therefore I believe it is safe to disregard this scenario.

edriouk commented 1 year ago

I think it is hard to distinguish user-assisted and automated component selection for dependency resolution. An automatically selected component can still be changed by user, e.g. another variant or version.

My proposal is instead of marking components as auto, allow the users to define primary components they believe are essential for the project/solution/layer. The corresponding property should be stored in csolution/cproject/clayer files.

Then we can introduce a “Clean Dependencies” command that will remove redundant components, but only those that are not marked as primary.

jkrech commented 1 year ago

During our dedicated session for this topic on Friday 2022-10-28 we concluded that there is three types of selected components we see selected by: a) user b) trivial dependency c) user choice due to dependency

In order to support implementations without "automatic dependency resolution" it remains mandatory to explicitly list all three types of components in .cproject/.clayer.yml. Implementations supporting "automatic dependency resolution" may suggest to the user to remove components of type b) and c) in case the dependency check does not list them as "required", due to other changes in the solution.

components:

components-choice:

components-auto:

Regarding automatically updating the list of packs used/required by the solution, we concluded that we are not missing any meta information which would link components to their origin packs.

fred-r commented 1 year ago

Agree with this refinement of the initial statement. Please note that (b) exists only in the context of a given set of packs. As soon as packs are added/removed/updated, then trivial dependencies might become non-trivial and require a user-choice.

To me, this means that as soon as the packs perimeter evolves, then all conditions need to be recomputed. I guess this was already clear, but just wanted to insist on it :-)

fred-r commented 1 year ago

Regarding the "memorization" of (a), (b) or (c), I think it is worthwhile to maintain a single and "complete" source of truth.

So I am not in favor of solutions like:

components (a) and (c) in cproject.yml
components (b) only in lock file

Rationale: one of the reason for choosing YAML is the human-readability. Therefore, we should not hinder the possibility for a developer to find his complete set of components easily.

I am more in favor of adding a kind of "injection-type" or "origin" attribute to the component element in the YML file. So a new attribute in: https://github.com/Open-CMSIS-Pack/devtools/blob/main/tools/projmgr/docs/Manual/YML-Input-Format.md#components

Rationale:

it is human-readable (same spirit as YML choice)
it allows easily debugging the condition reevaluation bugs (because I doubt we will implement it right from day 1... :-) )
this can be extended to files because at the end of the day, adding files can follow the same reasoning
- file added because the end-user really wanted this file
- file added to resolve a component dependency but without relying on components : here it can be trivial or not

edriouk commented 1 year ago

I think all components need to be listed in cproject.yml. In most cases a component need to be configured, i.e. config files to be copied into project and edited by the user. Thus the user need to know the originating component regardless if it automatically selected or not. Therefore I would suggest using only one flag: "user selected" with the meaning that such component should stay selected even if resolves no dependency.

mdortel-stm commented 1 year ago

To me, this means that as soon as the packs perimeter evolves, then all conditions need to be recomputed. I guess this was already clear, but just wanted to insist on it :-)

From a user point of view, I'm afraid it will lead to some questions he'll possibly have to answer each time he opens a project after having installed some new packs. In other words, it looks like a project becomes "user environment dependent", what I'm not fond of. In the end, to serve the original need, what we need to know is "does a component resolve a condition". Nevertheless, don't forget that there will always be a kind of component which will never resolve a condition: an application. I know it has never been covered by CMSIS, but ST uses a lot that kind of component.

fred-r commented 1 year ago

@mdortel-stm : agree with you, we should probably not fall in this pitfall :-)

But, here I see it more an implementation issue : provide the way to "lock" the project environment.

Otherwise, as an end-user, I may be interested in:

upgrading my packs
reopen my project to do a kind of "sanity check" : see if some new way to resolve conditions arise with maybe different components ?
maybe I know exactly what I want, maybe not
maybe I also just want to "update" my project automatically (and revalidate it)

Also, I may have resolved a dependency with a specific set of files (dev mode). Then I created a pack for delivering this artefact. Now, I want my projects to use the pack instead of the "orphan" files.

So, as usual, let's avoid over-engineering, I agree. But, we should not hinder the possibility to guide the user by providing "help" to maintain the projects.

jkrech commented 7 months ago

The one thing that has changed in the meanwhile is the introduction of the lock file: <solution-name>.cbuild-pack.yml. This way the loaded pack versions do not change after installing a new pack version in $CMSIS_PACK_ROOT.

Summary:

csolution should be able to identify any pack listed in solution which is not referenced anywhere in any <context>.cbuild.yml and report it as unused.

We agree on the definition of "Trivial dependency resolution" and "User selected dependency resolution". We agree on listing components that were added as a result of "User selected dependency resolution" under a separate node: e.g. usdr_components: csolution can find out whether any of these components are no longer required to resolve a dependency.

We have not agreed whether components selected by the trivial dependency resolution should be

added in cproject/clayer
added in a separate file
always be computed at runtime but not stored in Input YAML -> but visible in <context>.cbuild.yml

slhultgren commented 7 months ago

In terms of backwards compatibility, I would propose these answers:

Added in cproject/clayer => Yes
Added in separate file => No, unless that file is a clayer
Always be computed at runtime but not stored in input YAML => No, even if it would be nice, it would break backwards compatibility at this point

Also, implementation in cproject/clayer can be done in two primary ways:

Variant1: extra property on component

project:
components:
- component: Vendor::Cclass:Cgroup1
  reason: top # If not specified, reason is "top" by default
- component: Vendor::Cclass:Cgroup2
  reason: choice # Due to some top level component, I've been forced to make a choice to solve ambiguity
- component: Vendor::Cclass:Cgroup3
  reason: auto # Trivially added dependency

Variant2: different component groups

project:
components:
- component: Vendor::Cclass:Cgroup1
components-choice
- component: Vendor::Cclass:Cgroup2
components-auto:
- component: Vendor::Cclass:Cgroup3

The benefit of Variant1 in this case is that it should be fully backwards compatible, since the reason property could just be ignored by existing/older tools. The drawback of Variant1 is that it is a bit noisier, so cleaning components manually or with a tool might look riskier since it is modifying in the same list as "top level"

Still, variant1 is the less breaking of the two.

ReinhardKeil commented 7 months ago

I believe this should be supported somewhat by the compiler. Compilers identify already when functions are unused. My proposal is to park this request for now and revisit it once we explore static code analysis.

jkrech commented 7 months ago

Sorry, I did not mean to reopen the discussion here but wanted to get confirmation on having captured the latest state. I will close this issue and have opened https://github.com/Open-CMSIS-Pack/devtools/issues/1359 instead for future review.

Open-CMSIS-Pack / devtools

Best way to handle components selection/removal in a project #466