Avaiga / taipy

Turns Data and AI algorithms into production-ready web applications in no time.
https://www.taipy.io
Apache License 2.0
10.94k stars, 775 forks

Define input, output, intermediate data nodes #1363

Open FlorianJacta opened 3 months ago

FlorianJacta commented 3 months ago

Description

The goal of this issue is to discuss what input, output, and intermediate data nodes mean.

Solution Proposed

To my mind, the concepts of input, output, and intermediate data nodes are relative to the DAG.

In my opinion, <data node>.is_input, for example, has no meaning by itself.

This concept should be attached to the objects representing a DAG:

In other terms:

I think the inputs/outputs of interest for the Data Node Selector are the ones relative to the whole Config.

Impact of Solution

No response

Additional Context

No response

trgiangdo commented 2 months ago

For the global Config:

For <Scenario>.inputs, <Scenario>.outputs, <Sequence>.inputs, and <Sequence>.outputs, we already have similar APIs in the Submittable class. We can expose those if needed.

FlorianJacta commented 2 months ago

This seems right to me!

jrobinAV commented 2 months ago

The APIs you mentioned @trgiangdo are contextual, meaning they are Config or Submittable APIs. So, we can interpret the API as follows:

The question is slightly different, though. It concerns the default context when there is no explicit one. How can we answer the question, "Is this data node an input?" independently from any context? @FlorianJacta proposes using Config as the default context, but I am not sure it is intuitive enough. Moreover, the question has been raised in the data node selector filter, which is exposed to the end user. There is a high probability the end user does not know anything about the config DAGs.

Let's take a complex example.

from datetime import datetime
from taipy import Config, Core, Frequency, Scope, create_scenario

def identity(value):
    return value

d1 = Config.configure_data_node("d1", scope=Scope.GLOBAL)
d2 = Config.configure_data_node("d2", scope=Scope.CYCLE)
d3 = Config.configure_data_node("d3", scope=Scope.SCENARIO)
d4 = Config.configure_data_node("d4", scope=Scope.SCENARIO)

t1 = Config.configure_task("t1", function=identity, input=[d1], output=[d2])
t2 = Config.configure_task("t2", function=identity, input=[d2], output=[d3])

t3 = Config.configure_task("t3", function=identity, input=[d1, d2, d3], output=[d4])

s1 = Config.configure_scenario("s1", task_configs=[t1, t2],
                               sequences={"seq1": [t1], "seq2": [t2]},
                               frequency=Frequency.DAILY)
s2 = Config.configure_scenario("s2", task_configs=[t3], frequency=Frequency.DAILY)

Core().run()
scenario_1 = create_scenario(s1, datetime(2021, 1, 1))
scenario_2 = create_scenario(s1, datetime(2021, 1, 2))
scenario_3 = create_scenario(s2, datetime(2021, 1, 1))
scenario_4 = create_scenario(s2, datetime(2021, 1, 2))

This piece of code instantiates the following data nodes:

One global-scoped data node: d1
Two cycle-scoped data nodes: scenario_1.d2, scenario_2.d2
Six scenario-scoped data nodes: scenario_1.d3, scenario_2.d3, scenario_3.d3, scenario_4.d3, scenario_3.d4, scenario_4.d4

What are the inputs, the outputs, and the intermediate data nodes? As an end-user, I really don't know what I am expecting as an answer.

FlorianJacta commented 2 months ago

I need clarification on what is confusing about this. Why is the definition above not the expected definition?

jrobinAV commented 2 months ago

As an end user, listing all input data nodes is not self-explanatory. I need to thoroughly understand the whole config, with all the scenario configs, all the sequences, etc., to understand what I am going to get.

Let's imagine I have a role that only allows me to view scenarios from the second scenario config, s2. I would then expect to get [d1, scenario_1.d2, scenario_2.d2, scenario_3.d3, scenario_4.d3] as a result when asking for inputs. Your proposal will only return [d1].

trgiangdo commented 2 months ago

Do we have a role system that can explicitly set the access role of a user to some specific scenarios? I did not know that.

Anyway, from the example that you declared:

Config.inputs = [d1]
Config.outputs = [d4]

When we call Config..., the list will be a list of data node configurations.

For the scenario entities:

scenario_1.inputs = [scenario_1.d1]
scenario_1.outputs = [scenario_1.d3]
scenario_1.seq_1.inputs = [scenario_1.d1]
scenario_1.seq_1.outputs = [scenario_1.d2]
scenario_1.seq_2.inputs = [scenario_1.d2]
scenario_1.seq_2.outputs = [scenario_1.d3]

scenario_2 is the same as scenario_1.

scenario_3.inputs = [scenario_3.d1, scenario_3.d2, scenario_3.d3]
scenario_3.outputs = [scenario_3.d4]

scenario_4 is the same as scenario_3.

I think the scope of the data node doesn't affect the outcome of these APIs.
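To make the context dependence concrete, here is a minimal sketch in plain Python that reproduces the lists above from the example's task definitions. It does not use Taipy; `classify`, `role_of`, and the `contexts` dictionary are hypothetical names introduced only for illustration, and tasks are reduced to pairs of input and output names.

```python
def classify(tasks):
    """Return (inputs, intermediates, outputs) for a DAG given as (inputs, outputs) task pairs."""
    consumed, produced = set(), set()
    for task_inputs, task_outputs in tasks:
        consumed.update(task_inputs)
        produced.update(task_outputs)
    # Inputs are never produced, outputs are never consumed, intermediates are both.
    return consumed - produced, consumed & produced, produced - consumed


def role_of(node, tasks):
    """Role of one data node name relative to the DAG described by `tasks`."""
    inputs, intermediates, outputs = classify(tasks)
    if node in inputs:
        return "input"
    if node in intermediates:
        return "intermediate"
    if node in outputs:
        return "output"
    return "absent"


# Tasks from the example, written as (input names, output names).
t1 = (["d1"], ["d2"])
t2 = (["d2"], ["d3"])
t3 = (["d1", "d2", "d3"], ["d4"])

# Each context is one DAG: the whole config, a scenario config, or a sequence.
contexts = {
    "config": [t1, t2, t3],
    "s1": [t1, t2],
    "s1.seq1": [t1],
    "s1.seq2": [t2],
    "s2": [t3],
}

config_inputs, _, config_outputs = classify(contexts["config"])
d2_roles = {name: role_of("d2", tasks) for name, tasks in contexts.items()}
print(config_inputs, config_outputs)
print(d2_roles)
```

In this sketch d2 is an intermediate node for the whole config and for s1, an output of seq1, and an input of seq2 and s2, so the same data node gets a different role in each context.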

jrobinAV commented 2 months ago

@trgiangdo I was not specifically talking about Taipy enterprise roles. My example was confusing. Let me rephrase the sentence: 'Let's imagine I have a user interface on which I only view scenarios from the second scenario config, s2.'

What would be the result of tp.get_inputs(), without any explicit context? Or, in other words, what would be the result of scenario_1.d2.is_input()? In such a use case, I am expecting the following answers:

tp.get_inputs() == [d1, scenario_1.d2, scenario_2.d2, scenario_3.d3, scenario_4.d3]
scenario_1.d2.is_input() == True

Both will be false with Florian's proposal.
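The expected list can be reproduced with a small sketch that takes "the scenarios created from s2" as the context and resolves each input to a data node entity according to its scope. This is plain Python, not Taipy code: `entity_key` and `visible_inputs` are hypothetical helpers, the per-config input lists are copied from the thread, and the sharing rule (one entity per cycle for CYCLE scope, one per scenario for SCENARIO scope) is an assumption based on the example.

```python
from datetime import date

# Scope of each data node config, as declared in the example.
SCOPES = {"d1": "GLOBAL", "d2": "CYCLE", "d3": "SCENARIO", "d4": "SCENARIO"}

# Scenario entities: name -> (scenario config, creation date).
# With a DAILY frequency, each distinct date is assumed to be one cycle.
SCENARIOS = {
    "scenario_1": ("s1", date(2021, 1, 1)),
    "scenario_2": ("s1", date(2021, 1, 2)),
    "scenario_3": ("s2", date(2021, 1, 1)),
    "scenario_4": ("s2", date(2021, 1, 2)),
}

# Inputs of each scenario config, as listed in the thread.
CONFIG_INPUTS = {"s1": ["d1"], "s2": ["d1", "d2", "d3"]}


def entity_key(node, scenario_name):
    """Identify the data node entity a scenario uses, depending on the node's scope."""
    scope = SCOPES[node]
    if scope == "GLOBAL":
        return (node,)  # one entity shared by every scenario
    if scope == "CYCLE":
        return (node, SCENARIOS[scenario_name][1])  # one entity per cycle
    return (node, scenario_name)  # one entity per scenario


def visible_inputs(scenario_config):
    """Union of input entities over all scenarios created from one scenario config."""
    keys = set()
    for name, (config, _) in SCENARIOS.items():
        if config == scenario_config:
            keys.update(entity_key(node, name) for node in CONFIG_INPUTS[config])
    return keys


s2_inputs = visible_inputs("s2")
print(sorted(s2_inputs, key=str))
```

Under these assumptions the result has five entities: d1, one d2 per cycle (the entities the thread calls scenario_1.d2 and scenario_2.d2), and the d3 of scenario_3 and scenario_4. The answer depends entirely on which scenarios are chosen as the context.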

FlorianJacta commented 2 months ago

In my opinion, <data node>.is_input doesn't have a meaning for example by itself.

This is what I wrote in the issue.

tp.get_inputs() doesn't mean anything to me

A Data Node is input/output depending on the context.

trgiangdo commented 2 months ago

I don't think tp.get_inputs() or <DataNode>.is_input() are possible at all.

I think Florian and I agree on the six APIs: Config.inputs, Config.outputs, <Scenario>.inputs, <Scenario>.outputs, <Sequence>.inputs, and <Sequence>.outputs.

As for scenario_1.d2.is_input() == True, it is correct, right? Since we are looking at the data node in the scenario context. But I don't see how we can implement it, because the data node would also need to know which scenario is calling it. So .is_input() is not possible and makes no sense.

jrobinAV commented 2 months ago

Are you saying I should read your description more carefully? 🤣 If so, I believe you are right...

I misunderstood your proposal. Sorry.

trgiangdo commented 2 months ago

So do we agree on the requirements now?

jrobinAV commented 2 months ago

After a better reading, I now understand the proposal. I am okay with the concepts exposed in the Taipy core package. But I believe it does not answer the issue, in particular on the sentence from the description that is, in the end, the root motivation of the issue:

"I think the inputs/outputs of interest for the Data Node Selector are relative to the whole Config."

I strongly believe we don't want to expose the config inputs and outputs in the data node selector. The config is a developer concept, not an end-user concept. The end user will not easily understand the input and output data nodes. What is needed in the data node selector is another concept that sometimes (mostly in demos) overlaps with the developer input-output data node concept. My understanding is that the end user wants to access two kinds of data nodes quickly:

jrobinAV commented 1 week ago

A tradeoff has been proposed. The idea is to display in the data node selector the data nodes with a scenario scope in topological order. With this proposal, the scenario is the context used to set the data node rank.
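Since the formal proposal is not written yet, the following is only one possible reading of "topological order" and "rank", sketched in plain Python with the standard-library `graphlib` module (Python 3.9+). The function `data_node_ranks` is hypothetical and not part of Taipy; it assumes a data node's rank is the length of the longest chain of tasks leading to it within one scenario.

```python
from graphlib import TopologicalSorter


def data_node_ranks(tasks):
    """Map each data node name to its rank in the scenario DAG.

    `tasks` is a list of (input names, output names) pairs.
    Nodes that no task produces get rank 0.
    """
    predecessors = {}
    for task_inputs, task_outputs in tasks:
        for node in task_inputs:
            predecessors.setdefault(node, set())
        for node in task_outputs:
            # Every output of a task depends on all of that task's inputs.
            predecessors.setdefault(node, set()).update(task_inputs)

    ranks = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        ranks[node] = 1 + max((ranks[p] for p in predecessors[node]), default=-1)
    return ranks


# Scenario configs from the example earlier in the thread.
s1_ranks = data_node_ranks([(["d1"], ["d2"]), (["d2"], ["d3"])])
s2_ranks = data_node_ranks([(["d1", "d2", "d3"], ["d4"])])
print(s1_ranks)
print(s2_ranks)
```

With this reading, d2 has rank 1 in s1 but rank 0 in s2, which illustrates the point that the scenario is the context that fixes a data node's rank.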

A more formal proposal should come soon.