Ouranosinc / pavics-sdi

Power Analytics and Visualization for Climate Science - Spatial Data Infrastructure
https://pavics-sdi.readthedocs.io

Allow users to select the hardware a process will run on (4) #60

Closed huard closed 3 years ago

huard commented 6 years ago

Original: Create a user interface allowing users to set up preferences for where a process will be run (in-house or computecanada). Updated: Modify Birdy to support multiple servers and select the one with the fastest response time by default. Let the user specify a preference.

tomLandry commented 5 years ago

In the OGC testbeds, that would be the equivalent of Twitcher deploying to a specific "ADES" (Application Deployment and Execution Service). We have that at an experimental level, so it is probably demonstrable to CANARIE. Operationalization could only be done in the second phase of PAVICS-Hydro, though.

huard commented 5 years ago

Good enough.

tomLandry commented 5 years ago

@dbyrns Looking at the slim description and issue title here, do you feel the same? It's the same EMS - ADES concept. Of course, deployment to Calcul Canada would have to be done in the upcoming year, in the project's maintenance phase. As @lalondma told us, CC are a bit behind the curve anyway.

dbyrns commented 5 years ago

Well, an EMS selecting an ADES to move the processing to where the data is located is not really the same as allowing users to choose where they want to execute a process (based on a speed/cost ratio, I guess), possibly on different types of architecture. We cannot presume that CC and PAVICS will ever have the same interface (as is the case for ADESes).

dbyrns commented 5 years ago

Indeed the concept is the same, but I highly doubt that the implementation will help us here.

huard commented 5 years ago

The general idea here is to give users some choices. I think an easy way to do this is to simply add CRIM instances of Raven to the list of servers federated through Magpie. Now in itself that is not so useful unless the user has some feedback on the load of each server and the proximity of the data.

What I'd like to be able to offer to users is a flat list of "abstract processes" available somewhere in the federation. When the user selects the process and fills in the inputs, a default server is picked based on server load and proximity to data. The interface would allow the user to explicitly override that default choice. I could imagine a Launch button with an optional drop-down menu offering "Launch on A / Launch on B" with icons showing the server load and estimated data transfer time.
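
A minimal sketch of that default-plus-override logic, under stated assumptions: the `Candidate` fields, the `pick_server` function and the `load_penalty_seconds` weight are all hypothetical, and the cost formula is an arbitrary placeholder rather than anything agreed on in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One server offering the selected process (all fields are illustrative)."""
    name: str
    load: float              # assumed: fraction of capacity in use, 0.0 to 1.0
    transfer_seconds: float  # assumed: estimated time to move inputs to this server


def pick_server(candidates: Sequence[Candidate],
                override: Optional[str] = None,
                load_penalty_seconds: float = 60.0) -> Candidate:
    """Return the user's explicit choice if given, else the lowest-cost candidate."""
    if not candidates:
        raise ValueError("no server offers this process")
    if override is not None:
        for candidate in candidates:
            if candidate.name == override:
                return candidate
        raise ValueError(f"unknown server: {override}")
    # Placeholder cost: transfer time plus a penalty proportional to load.
    return min(candidates,
               key=lambda c: c.transfer_seconds + load_penalty_seconds * c.load)


servers = [
    Candidate("A", load=0.9, transfer_seconds=5.0),
    Candidate("B", load=0.2, transfer_seconds=30.0),
]
default = pick_server(servers)               # what a plain "Launch" would use
forced = pick_server(servers, override="A")  # what "Launch on A" would use
```

Here the busy server A loses the default slot to B despite being closer to the data, but the user can still force A explicitly.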

dbyrns commented 5 years ago

That was my understanding and it's completely unrelated to the OGC work. Here are the challenges I foresee:

  1. How can the aggregator determine that two providers do the same job (currently we cannot register two providers with the same name in Magpie)? Maybe by using the GetCapabilities Title?
  2. Choosing the right server is not trivial... load, proximity, etc. these are not available right away and are not part of any standard.
  3. And then we have to rewrite the UI to allow this type of interaction.

huard commented 5 years ago

  1. I'd check (server.identifier, process.identifier). We'd find different base URLs, server versions and process versions matching the same ids.
  2. Understood. We could use the number of parallel_processes running, which is stored by PyWPS in a database as a proxy for load, and a simple ping speed times file size as the proxy for download time. Poor choices I'm sure, but sufficient for a proof of concept.
  3. Indeed. If we lack the time in year 1, I'd be happy with a shell-based solution.
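
Points 1 and 2 above could be prototyped roughly as follows. This is only a sketch: the `Offering` fields, the example URLs and the process names are hypothetical, `server_id` stands in for whatever identifier ends up being usable, and sorting by load first with the transfer proxy as a tie-breaker is an arbitrary choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Offering:
    """A process advertised by one server (field names are hypothetical)."""
    base_url: str
    server_id: str       # identifier the server reports for itself
    process_id: str      # WPS process identifier
    running: int         # processes currently running, used as a load proxy
    ping_seconds: float  # measured round-trip time to the server


def group_equivalent(offerings: List[Offering]) -> Dict[Tuple[str, str], List[Offering]]:
    """Group offerings that share the same (server_id, process_id) pair."""
    groups = defaultdict(list)
    for offering in offerings:
        groups[(offering.server_id, offering.process_id)].append(offering)
    return dict(groups)


def rank(group: List[Offering], file_size_mb: float) -> List[Offering]:
    """Order by load proxy, then by the crude ping-times-file-size proxy."""
    return sorted(group, key=lambda o: (o.running, o.ping_seconds * file_size_mb))


offerings = [
    Offering("https://a.example.org/wps", "raven", "model_run", running=4, ping_seconds=0.02),
    Offering("https://b.example.org/wps", "raven", "model_run", running=1, ping_seconds=0.08),
    Offering("https://a.example.org/wps", "raven", "regrid", running=4, ping_seconds=0.02),
]
groups = group_equivalent(offerings)
ranked = rank(groups[("raven", "model_run")], file_size_mb=500.0)
```

Both proxies are as poor as advertised; the point is only that grouping and ranking need nothing beyond what each server already reports.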

dbyrns commented 5 years ago

  1. server.identifier is not a WPS thing? But yes, the collection of process ids can be considered a unique identifier, just not a very practical one.
  2. The number of running processes is meaningless unless you know how many hardware threads are available, but it is an interesting metric. How do we get it, though? By customizing PyWPS with a special route that returns a "good choice" rating? We would also have to handle cases where that route doesn't exist...
  3. Does Birdy offer the aggregation of multiple servers or is it a single-server wrapper? It could be a good candidate to package 1. and pick the better server using 2.

huard commented 5 years ago

  1. Sorry, it's ServiceIdentification/Title or ProviderName in the GetCapabilities XML.
  2. What if we time the DescribeProcess requests and use that to rank servers? Birdy could launch that in the background every 2-3 minutes and adjust the ranking.
  3. Not yet, it is for the moment a single server wrapper, but adding support for multiple servers would be relatively easy (about a day of work).
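
The timing idea in point 2 might look like the sketch below. It is not Birdy code: `time_probe` and `rank_servers` are hypothetical helpers, and the probes here are stand-in callables that sleep or fail. A real probe would send a DescribeProcess request to the server, and the ranking would be refreshed in the background every few minutes.

```python
import time
from typing import Callable, Dict, List


def time_probe(probe: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to probe()."""
    start = time.perf_counter()
    probe()
    return time.perf_counter() - start


def rank_servers(probes: Dict[str, Callable[[], object]]) -> List[str]:
    """Rank server names from fastest to slowest; unreachable servers go last."""
    timings = {}
    for name, probe in probes.items():
        try:
            timings[name] = time_probe(probe)
        except Exception:
            # A failed request is treated as infinitely slow.
            timings[name] = float("inf")
    return sorted(timings, key=timings.get)


def unreachable() -> None:
    raise ConnectionError("server did not answer")


# Stand-in probes: sleeps simulate response times of different servers.
probes = {
    "slow": lambda: time.sleep(0.05),
    "down": unreachable,
    "fast": lambda: time.sleep(0.001),
}
ranking = rank_servers(probes)
```

A single timing is noisy, so a real version would probably average several measurements before reordering the list.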

dbyrns commented 5 years ago

We are working on a new bird based on the EMS (Execution Management Service) developed in OGC's Testbed14. Its main features were a REST interface, dynamic Docker application deployment, CWL workflows, monitoring and results retrieval. As an extension of the testbed, we are now trying to encapsulate existing WPS 1.0 (and ESGF CWT API) provider processes as EMS applications. This will allow us to offer them as processes in a single list (processes of the EMS) [Point 1]. Birdy could be updated to also support the new REST interface following the next-gen OGC WPS-T 2.0 standard [Point 3]. And PAVICS will use it as soon as we can work on it [Also point 3]. As a nice-to-have, the EMS could implement logic to determine the best-suited process provider when running a job [Point 2].

So as you can see, we now have a funded project to make that happen!

tomLandry commented 5 years ago

This issue is currently assigned to @lalondma because he has been working on launching Docker containers on ComputeCanada. If we need to take a step back on architecture before going forward, that's fine, but we need to show something in late April or early May.

dbyrns commented 5 years ago

The discussion has indeed drifted beyond the topic of this issue. Offering Calcul Canada as a typical WPS is still the main issue. The EMS will help the client choose transparently between the Cloud version of PAVICS and CC only if CC can be exposed in the same manner.