Avaiga / taipy

Turns Data and AI algorithms into production-ready web applications in no time.
https://www.taipy.io
Apache License 2.0
11.23k stars 793 forks

Possibility to duplicate scenarios #397

Open FlorianJacta opened 1 year ago

FlorianJacta commented 1 year ago

What would that feature address
When new scenarios are created, we might also want to skip tasks that work on 'Scenario'-scoped data if none of their input data nodes have changed.

Description of the ideal solution
When creating and executing multiple scenarios of the same problem, we could duplicate an existing scenario so that caching still applies to the parameters that do not change.

trgiangdo commented 7 months ago

There is a problem that needs to be clarified for this feature.

If the original scenario has data nodes with scope <= SCENARIO, we are going to need to duplicate their data.

Please let me know what you think @FlorianJacta

jrobinAV commented 2 weeks ago

I propose to reformulate the description of this ticket. Please let me know what you think.

What would that feature address
When new scenarios are created, we could duplicate some data nodes' data to initialize the new scenario. The motivation is to avoid executing tasks when it is not necessary.

Example: Let's assume the data node B is scenario-scoped. We have one preprocessing task, T1, that is time-consuming. It reads A and writes B. We have a task T2 that reads B and C and writes D. We want to duplicate the scenario, keeping the data of A and B so we don't need to re-compute them. We just want to have a new scenario to vary the C data and recompute an alternative of D.

Scenario 1:

    A --> T1 --> B ----> T2 --> D
                 C --/

Scenario 2 as a duplication:

    A' --> T1 --> B' ----> T2 --> D'
                  C' --/

When scenario 2 is created, we want the following to hold:

A.read() == A'.read()
B.read() == B'.read()

Description of the ideal solution
Expose a new API to duplicate a scenario, providing the list of data nodes whose data should be copied.
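To make the proposal concrete, here is a minimal sketch of what such a duplication could look like. It uses a toy in-memory model, not Taipy's actual classes: `DataNode`, `Scenario`, `duplicate_scenario`, and the `data_to_copy` parameter are all hypothetical names chosen for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class DataNode:
    """Toy stand-in for a scenario-scoped data node (not Taipy's class)."""
    name: str
    _data: Optional[Any] = None

    def read(self) -> Any:
        return self._data

    def write(self, data: Any) -> None:
        self._data = data


@dataclass
class Scenario:
    """Toy scenario: a name plus its own data nodes, keyed by name."""
    name: str
    data_nodes: Dict[str, DataNode] = field(default_factory=dict)


def duplicate_scenario(
    source: Scenario, new_name: str, data_to_copy: Iterable[str] = ()
) -> Scenario:
    """Hypothetical API: create a new scenario with fresh data nodes,
    copying the data only for the nodes listed in `data_to_copy`."""
    to_copy = set(data_to_copy)
    unknown = to_copy.difference(source.data_nodes)
    if unknown:
        raise KeyError(f"Unknown data nodes: {sorted(unknown)}")

    new_scenario = Scenario(new_name)
    for name, node in source.data_nodes.items():
        new_node = DataNode(name)
        if name in to_copy:
            # Deep copy so later writes in one scenario do not leak into the other.
            new_node.write(copy.deepcopy(node.read()))
        new_scenario.data_nodes[name] = new_node
    return new_scenario


# Scenario 1 has already been executed: A, B, C and D all hold data.
s1 = Scenario("scenario_1", {n: DataNode(n) for n in "ABCD"})
s1.data_nodes["A"].write([1, 2, 3])
s1.data_nodes["B"].write([2, 4, 6])
s1.data_nodes["C"].write(10)
s1.data_nodes["D"].write([12, 14, 16])

# Scenario 2 keeps the data of A and B; C' and D' start empty.
s2 = duplicate_scenario(s1, "scenario_2", data_to_copy=["A", "B"])
```

In this sketch the caller decides explicitly which data to carry over, which matches the example above: A' and B' start with the same content as A and B, while C' is left for the user to fill in before recomputing D'.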

FlorianJacta commented 2 weeks ago

The objective of this issue is to implement both a technical and functional feature.

Functional: From the user's perspective, duplicating a scenario is a logical and valuable action. After conducting an extensive analysis and modifying parameters X, Y, and Z, I run my code to observe the outcomes. If I want to see the impact of altering part of Y, I should be able to resubmit without redoing all my previous work. This process mirrors a common and intuitive workflow, akin to a "Save As" function that allows you to save a current scenario or results and then proceed with further analysis.

Technical: We aim to support this workflow while maintaining performance and user-friendliness. With a "Save As" option, it's important that the results aren't lost in the new scenario, and there should be a system to skip redundant operations since this is essentially duplicating a run that's already been completed.
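As an illustration of the "skip redundant operations" idea, the following sketch uses a toy model rather than Taipy's real orchestration. The `submit` function, the task tuples, and the `changed` parameter are assumptions made for this example. A task is skipped when its output already holds data and none of its inputs has changed.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# A task is (name, function, input names, output name). Hypothetical shape.
Task = Tuple[str, Callable[..., Any], List[str], str]


def submit(tasks: List[Task], data: Dict[str, Any], changed: Iterable[str]) -> List[str]:
    """Run tasks in order, skipping those whose output is already available
    and whose inputs are all unchanged. Returns the names of executed tasks."""
    dirty = set(changed)
    executed = []
    for name, fn, inputs, output in tasks:
        if output in data and not dirty.intersection(inputs):
            continue  # Result carried over from the original scenario.
        data[output] = fn(*(data[i] for i in inputs))
        dirty.add(output)  # Downstream tasks must now be recomputed.
        executed.append(name)
    return executed


tasks: List[Task] = [
    ("T1", lambda a: [2 * x for x in a], ["A"], "B"),
    ("T2", lambda b, c: [x + c for x in b], ["B", "C"], "D"),
]

# State of a duplicated scenario: A and B were copied, C was just modified.
data = {"A": [1, 2, 3], "B": [2, 4, 6], "C": 100}
executed = submit(tasks, data, changed={"C"})
```

Here only T2 runs: T1 is skipped because B was carried over and A is unchanged, which is the behaviour a "Save As" style duplication would need in order to avoid redoing the time-consuming steps.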

I recognize the potential challenges with SQL read/write operations in this context. I don't have a definitive solution at this moment. Perhaps allowing users to select which data nodes to copy could help, but that might complicate the natural workflow I initially envisioned. Ultimately, this feature is more akin to a "Save As" function than anything else.

jrobinAV commented 2 weeks ago

@FlorianJacta Thanks, that is clearer.