API interactivity - Githubissues

LuiggiTenorioK commented 10 months ago

Summarizing the discussion about the goal of this year, we are aiming to add more interactive endpoints to the API that will allow users to trigger actions that can modify the state of the experiments. To accomplish this there are some issues we have to solve:

Define the scope

We need to list requirements to better structure the changes we want to make. In this particular case, we can list the actions (run experiment, update description, change status, etc) we desire to include inside the API. Then, we can make a formal endpoint definition in OpenAPI with the route, expected request, and response.

Also, this will help us to link the effort we have to do in other tasks (DDBB sync, security, communication with Autosubmit, ...)

[UPDATE] Mapped actions until now:

Start Experiment POST /v4/experiments/<expid>?action=run -> start
Stop Experiment POST /v4/experiments/<expid>?action=stop -> stop
Set Status to Job PATCH /v4/experiments/<expid>/jobs/<jobid>?status=<newstatus> -> setstatus
Create experiment POST /v4/experiments -> expid
Generate the experiment POST /v4/experiments/<expid>?action=generate -> create
Restart the experiment POST /v4/experiments/<expid>?action=restart -> recovery

Set some infrastructure cases

There are multiple scenarios in which the API will be installed like in ES, Climate DT, and EDITO. Is important that we formally define those to better understand the bounds (security, network, dependencies) we are going to have for each one.

Define the action procedures

As discussed, there are some options to process the actions we want to include inside the API:

Modify the artifacts (DDBB and files) directly: This option is great for actions that update the metadata but will require a better understanding of the data sources as discussed in #53 to unify definitions and responsibilities. Might be hard to ensure retro compatibility as different versions of Autosubmit might use different data sources.
Call the Autosubmit commands: This can be used for actions that need to execute extensive procedures through a subprocess. Also, it will not add a dependency directly to the API. This allows the possibility to run the commands even if the API and Autosubmit are in different nodes but we will need to figure out how to ensure a safe communication between them. There is some discussion about it in autosubmitreact#21.
Use Autosubmit as a Python package: In this case, Autosubmit can be called directly through the code and can be more easily handled. Still, it adds a hard requirement that Autosubmit should be a dependency of the API. In both, the previous and this option, retrocompatibility is delegated to Autosubmit.

@mcastril @kinow

LuiggiTenorioK commented 10 months ago

changed due date to June 30, 2024

LuiggiTenorioK commented 10 months ago

In GitLab by @mcastril on Jan 5, 2024, 19:42

Thank you for the documentation Luiggi. Regarding the infrastructure cases and action procedures, we have to keep the portability and interoperability of Autosubmit and its API. Anyway, ES environment, EDITO Infra or Climate DT one are different enough to provide some general enough specifications for a system that must be compliant with the three environments.

LuiggiTenorioK commented 10 months ago

In GitLab by @kinow on Jan 18, 2024, 10:23

From today's meeting:

Check with Surf GUI developers what are the requirements for their model builder GUI
Based on those requirements, we can try to craft the minimal viable product & reqs for AS API Interactivity for EDITO
After that we can revisit this issue and think if we need to improve/how/when/etc
We might use this to also check in the docs or with the Surf GUI devs if there are any other reqs for Autosubmit GUI

LuiggiTenorioK commented 10 months ago

In GitLab by @mcastril on Jan 18, 2024, 17:17

I agree with the plan for EDITO. In a broad way, these are the summarized requirements.

Core actions to trigger with the endpoints
- run, setstatus, stop (there is an issue in Autosubmit about the implementation of a new command)
Other secondary actions
- create, expid, recovery

setstatus can be triggered with a file modification: https://autosubmit.readthedocs.io/en/master/userguide/manage/index.html#how-to-change-the-job-status-without-stopping-autosubmit

For the stop and the recovery, we could use the same approach if we implement the same behavior in Autosubmit (by using files).

For run, create, or expid it's more tricky as Autosubmit is not running to look for and consume the file.

One alternative is to deploy a daemon looking for these files and spawning an Autosubmit process but we could end up in the same issue that the API has to start a process under the user's identity.

LuiggiTenorioK commented 10 months ago

In GitLab by @kinow on Jan 22, 2024, 10:37

mentioned in issue autosubmitreact#90

LuiggiTenorioK commented 10 months ago

Going back to this. I opened an issue (#58) with a design that could handle the run and stop operations without deploying a daemon by having a higher-level API that maps the nodes executing those processes.

But, in the design, I assume that the current API will call the Autosubmit command autosubmit run using its latest version and opening an independent process. This is something that wasn't done before as Autosubmit wasn't necessarily installed in the same node of the API and they were connected just by the file system.

@kinow @dbeltrankyl I wanted to ask if this is a feasible strategy or if I'm missing a potential issue by calling Autosubmit CLI commands from the API environment.

LuiggiTenorioK commented 10 months ago

In GitLab by @kinow on Jan 23, 2024, 17:47

I am not sure if that would work well. There are potential issues with the Autosubmit version. e.g. we changed the pickle or configuration parsing, and know that an experiment needs adjusting before it can be used with the latest version. Now we need to know what is the version of Autosubmit to use to launch it. I think we will really need a few sessions on the whiteboard to discuss possible scenarios, like user deleted/left the company, experiment was archived (maybe we still want to show in the UI and unarchive?), how/if it will handle restarting experiments of others, etc.

After using the whiteboard it should be more clear (at least for me) what are the limitations, and how this should work.

LuiggiTenorioK commented 9 months ago

In GitLab by @mcastril on Jan 31, 2024, 18:56

You are right that there are many aspects to consider.

Maybe we can separate the problem in two parts: the "interactive" endpoints and synchronizing remote environments. The second is interesting for many reasons, not only this one but also our medium-term goal to synchronize workflows running in independent environments and set dependencies between their tasks.

The daemon issue for me is independent of the higher-level API. That was a way to allow interaction with AS by just writing files. If the API can call Autosubmit commands then the daemon is not needed anyway, but this is independent from the synchronization IMO.

Regarding the Autosubmit version, at least we store this value in the DDBB and in the config, and Autosubmit alerts the user when they intend to run an experiment with a different version. We can port the feature to the GUI/API and then directly run the experiment with -v in case the user approves the version change.

LuiggiTenorioK commented 9 months ago

Right! There are different problems to solve. Adding the interactive endpoints is a must for sure. On the other hand, we have to find a way to handle experiments of different versions in different environments.

For the different versions issue, I think Miguel is right that we can use the -v flag to solve it.

For the different environment issues, there are the daemon synchronization and the higher-level API solutions. IMO the synchronization solution is way more complex, especially considering the different versions of the experiments. Then, having a higher-level API will work better for EDITO as it will only need one service from where the requests will be made (SURF).

This higher-level API solution is inspired by another project used in the most popular workflow manager in bioinformatics (https://galaxyproject.org/) which uses a similar lower-level API called Pulsar to solve the same problem we have.

LuiggiTenorioK commented 9 months ago

Our problem is more or less stated here: https://pulsar.readthedocs.io/en/latest/containers.html

(I remember stating similar issues in my Master's thesis)

LuiggiTenorioK commented 9 months ago

From today's meeting:

20240201_150049

LuiggiTenorioK commented 9 months ago

In GitLab by @kinow on Feb 2, 2024, 10:38

Thank you for attaching it here, @LuiggiTenorioK !

@mcastril , should one of us get in touch with EDITO/SURF to schedule a meeting to discuss about this? If so, maybe it would be Quentin/Renaud and Francesco from CMCC via email explaining what we want to discuss and asking for best time/day for the meeting? Thank you

LuiggiTenorioK commented 9 months ago

In GitLab by @mcastril on Feb 7, 2024, 12:33

Yes Bruno, thanks for volunteering. Please refer to Renaud, Quentin and Francesco together, with us in copy.

LuiggiTenorioK commented 9 months ago

In GitLab by @kinow on Feb 13, 2024, 11:57

Meeting doodle poll sent.

Summarizing what we discussed in the meeting.

Are we going to have a single API instance, shared by all the EDITO-Infra users, or are we going to have one API per user? Or both?
If we have instances/containers for the API shared by users, which user could we use to run the API? Would it have access to S3 and to an SSH key to connect to HPC or other EDITO-Infra instances?
How are the API and GUI instances going to be started? By a user action, or Kubernetes will keep a minimum number of pod(s) running?
We need to define how/if we will use a shared file-system for Autosubmit experiments. For the demo we used S3, but that's not really an option for Autosubmit (we use NFS at ClimateDT and BSC). Maybe we could allocate a permanent volume to be bound to each Autosubmit container (with enough storage for the experiments? GB's, TB's?)
Do we have a list of requirements, or how SURF GUI will interact with the API? This would be useful to validate the endpoints we will have to implement.

LuiggiTenorioK commented 9 months ago

In GitLab by @kinow on Feb 29, 2024, 11:26

Do we have a list of requirements, or how SURF GUI will interact with the API? This would be useful to validate the endpoints we will have to implement.

Use the API to request the status of the experiment that is running. We can build a list of experiments the user is running, and show how long the experiment is taking, resources used...

When the user defines what they want to submit, at that point they submit the list of tasks and the run of the job starts.

Users must be able to restart from a certain point in the workflow (setstatus).

Q: Can we get the list of N last experiments?

Yes, but the deployment option could define how it works.

Possible endpoints required:

list experiments per user
- get experiment details (if not returned in the previous call)
create experiments (from scratch or from a template)
launch an experiment
restart an experiment
stop experiment

LuiggiTenorioK commented 9 months ago

In GitLab by @kinow on Feb 29, 2024, 11:52

Are we going to have a single API instance, shared by all the EDITO-Infra users, or are we going to have one API per user? Or both?

Both ways are possible, but it's on the business side to choose.

Surf GUI
Surf API (?)
Autosubmit API - python process
Autosubmit GUI - HTML httpd
Autosubmit - python process, reading a shared DB

We need to choose between Process and Service (in EDITO Infra).

In a Service you can launch multiple tools (GUI + AS API + Surf API + etc).

Not tested, but it should be possible to have dependencies between Services.

In Datalab we have "Projects". We could create the project "Edito ModelLab". You can create an instance with members of the project. You can also share the URL of the project with members outside the project.

If we have instances/containers for the API shared by users, which user could we use to run the API? Would it have access to S3 and to an SSH key to connect to HPC or other EDITO-Infra instances?

This needs to be tested to confirm. It should be possible to share the instance so others can manage it too.

How are the API and GUI instances going to be started? By a user action, or Kubernetes will keep a minimum number of pod(s) running?

In the project we can have services that are always available. At the moment +2 weeks old services are killed, but this may change in the future.

Q: Who maintains the infra (if a service goes down?)

Suggestion: use replication (pods/etc) to have more resources, configure the helm/etc to have higher availability.

Q: Surf GUI can use the API

LuiggiTenorioK commented 9 months ago

In GitLab by @kinow on Feb 29, 2024, 12:00

Q: Can I deploy to another service/catalogue/env?

At the moment merge requests go to production. Staging is for EDITO-Infra. EDITO-Infra team is working to give others access to the playground catalogue/env.

N.B.: BSC team to use the playground. Then later we ask it to be moved to the modellab/ai/etc catalogue.

We need to define how/if we will use a shared file-system for Autosubmit experiments. For the demo we used S3, but that's not really an option for Autosubmit (we use NFS at ClimateDT and BSC). Maybe we could allocate a permanent volume to be bound to each Autosubmit container (with enough storage for the experiments? GB's, TB's?)

At the moment this is not doable. But that should be possible under the common modellab project. So all users under that project can target that volume there. We can test that after the modellab is created.

Q: are we going to give access to external users, to use this database (on the shared docker volume)?

...

LuiggiTenorioK commented 9 months ago

In GitLab by @kinow on Feb 29, 2024, 12:04

Action: we need to define who are the users of the model lab too. At the moment anyone can request an account to EDITO. The Catalogue Service view is available to unauthenticated users.

LuiggiTenorioK commented 4 months ago

changed due date to December 31, 2024

BSC-ES / autosubmit-api

API interactivity #55

Define the scope

Set some infrastructure cases

Define the action procedures