Custom analysis API and InAndOut Implementation

eclipse-tracecompass-incubator / org.eclipse.tracecompass.incubator

Eclipse Public License 2.0

5 stars 14 forks source link

Custom analysis API and InAndOut Implementation #88

Closed bhufmann closed 3 weeks ago

bhufmann commented 2 months ago

Update (Oct 28, 0204): All the PRs below as well as relevant Trace Compass mainline are merged. This PR only has the changes of the InAndOut implementation left.

This PR includes the TSP updates and concrete implementation for customizable analysis. The InAndOutAnalysis has been updated for that. This PR combines multiple commits, where some of them are available as separate PRs. It requires the commits of the following Trace Comapss mainline PR:

https://github.com/eclipse-tracecompass/org.eclipse.tracecompass/pull/164

This PR combines the following commits/PRs:

66
- swagger: Document schema in TmfConfigurationSourceType
- server: Serialize TmfConfigurationSourceType schema json object
70
71
80
InAndOutAnalysis with data provider factory and json string

Additional related changes:

Signed-off-by: Bernd Hufmann bernd.hufmann@ericsson.com

abhinava-ericsson commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Or, if we were to support configuration at the experiment level, it would again be a config type, which could be applied to all analysis/ data providers supported by that experiment.

bhufmann commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

Or, if we were to support configuration at the experiment level, it would again be a config type, which could be applied to all analysis/ data providers supported by that experiment.

That was my initial design proposal, to allow configuration on the experiment level which can spawn custom data providers based on an input configuration. See here (https://github.com/eclipse-cdt-cloud/theia-trace-extension/blob/e5453d1d1439262657e87a4563cc55582fd95aca/doc/adr/0011-tsp-analysis-api.md#configuration-service-per-experiment). It would support the InAndOutAnalysis use case as well as the per data provider design. Both ways are valid and have pros and cons. I think with the per data provider solution make it harder to implement sharing of configurations, but that was not a requirement anymore at this moment.

I went with the per data provider implementation for InAndOut to have an open-source reference example that downstream projects base their implementation on. It also verifies all the added APIs.

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

abhinava-ericsson commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

Club multiple analyses together
Configures each of those analyses

bhufmann commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

Club multiple analyses together

Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

bhufmann commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

Club multiple analyses together

Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:

ITmfDataProviderSource.remove(ITmfTrace, String configID)

The endpoint could be as followed (or similar)

/experiments/{expUUID}/outputs/{outputId}/configs/{configId}

bhufmann commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

Club multiple analyses together

Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:
ITmfDataProviderSource.remove(ITmfTrace, String configID)
The endpoint could be as followed (or similar)
/experiments/{expUUID}/outputs/{outputId}/configs/{configId}

@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children.

With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier.

For the InAndOutAnalysis the hierarchy will look like the following:

--- InAndOut Configurator
    |--- Derived InAndOut (A)
        |--- InAndOut (A) - Flame Chart
        |--- InAndOut (A) - Latency Statistics 
        |--- InAndOut (A) - Latency Table
        |--- InAndOut (A) - Latency vs Time
    |--- Derived InAndOut (B)
        |--- InAndOut (B) - Flame Chart
        |--- InAndOut (B) - Latency Statistics 
        |--- InAndOut (B) - Latency Table
        |--- InAndOut (B) - Latency vs Time

The end-user will use the InAndOut Configurator to create a new InAndOut analysis (e.g.InAndOut (A)) and used Derived InAndOut (A) to remove it. As a nice side effect there will be automatic grouping and which can be visualized in the FE clients.

The API in ITmfDataProviderSource will look like this:

IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor);

WDYT?

abhinava-ericsson commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

Club multiple analyses together

Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:
ITmfDataProviderSource.remove(ITmfTrace, String configID)
The endpoint could be as followed (or similar)
/experiments/{expUUID}/outputs/{outputId}/configs/{configId}
@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children.

With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier.

For the InAndOutAnalysis the hierarchy will look like the following:
--- InAndOut Configurator
    |--- Derived InAndOut (A)
        |--- InAndOut (A) - Flame Chart
        |--- InAndOut (A) - Latency Statistics 
        |--- InAndOut (A) - Latency Table
        |--- InAndOut (A) - Latency vs Time
    |--- Derived InAndOut (B)
        |--- InAndOut (B) - Flame Chart
        |--- InAndOut (B) - Latency Statistics 
        |--- InAndOut (B) - Latency Table
        |--- InAndOut (B) - Latency vs Time
The end-user will use the InAndOut Configurator to create a new InAndOut analysis (e.g.InAndOut (A)) and used Derived InAndOut (A) to remove it. As a nice side effect there will be automatic grouping and which can be visualized in the FE clients.

The API in ITmfDataProviderSource will look like this:
IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor);
WDYT?

Yes. This definitely sounds like a nicer solution.

abhinava-ericsson commented 1 month ago

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

Club multiple analyses together

Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:
ITmfDataProviderSource.remove(ITmfTrace, String configID)
The endpoint could be as followed (or similar)
/experiments/{expUUID}/outputs/{outputId}/configs/{configId}
@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children. With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier. For the InAndOutAnalysis the hierarchy will look like the following:
--- InAndOut Configurator
    |--- Derived InAndOut (A)
        |--- InAndOut (A) - Flame Chart
        |--- InAndOut (A) - Latency Statistics 
        |--- InAndOut (A) - Latency Table
        |--- InAndOut (A) - Latency vs Time
    |--- Derived InAndOut (B)
        |--- InAndOut (B) - Flame Chart
        |--- InAndOut (B) - Latency Statistics 
        |--- InAndOut (B) - Latency Table
        |--- InAndOut (B) - Latency vs Time
The end-user will use the InAndOut Configurator to create a new InAndOut analysis (e.g.InAndOut (A)) and used Derived InAndOut (A) to remove it. As a nice side effect there will be automatic grouping and which can be visualized in the FE clients. The API in ITmfDataProviderSource will look like this:
IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor);
WDYT?
Yes. This definitely sounds like a nicer solution.

One thing I would say here is that in effect, we are introducing a notion of sub-dataproviders, which is slightly different than derived data providers. We may need to handle it accordingly.

eclipse-tracecompass-incubator / org.eclipse.tracecompass.incubator

Custom analysis API and InAndOut Implementation #88

66

70

71

80

97

98

100

103

110