globus-gladier / gladier

An SDK for rapidly developing Globus Flows while leveraging Globus Compute
Apache License 2.0

Track user specific endpoints/paths through "deployments" #145

Open NickolausDS opened 3 years ago

NickolausDS commented 3 years ago

Currently, all input is specified through run_flow, including funcx endpoints, globus endpoints, paths, and even overridden funcx ids. This makes clients inflexible when they are maintained by multiple users, since each endpoint, path, and id is owned by a single user and not shared. If two users want to run the same flow, the second user has to throw out all of the previous user's variables and replace them with their own deployment variables.

Supporting custom deployments would fix this, and allow user customizations for a given set of runs.

class NickDeployment:
    checks = ['globus_endpoints', 'funcx_endpoints', 'funcx_functions',
              'funcx_containers', 'flow_deployment']

    globus_endpoints = {
        'source_globus_ep': 'e55b4eab-6d04-11e5-ba46-22000b92c6ec',
        'compute_globus_ep': '08925f04-569f-11e7-bef8-22000b9a448b',
    }

    funcx_endpoints = {
        'funcx_endpoint_non_compute': '553e7b64-0480-473c-beef-be762ba979a9',
        'funcx_endpoint_compute': '2272d362-c13b-46c6-aa2d-bfb22255f1ba',
    }

    funcx_containers = {
        'Corr': {
            'location': '/foo/bar/baz',
            'container_type': 'singularity',
        }
    }

    flow_input = {
        'input': {
            'processing_dir': '/foo/bar/baz'
        }
    }

my_flow = MyGladierClient()
# Deployments would be instantiated objects passed to `run_flow`. 
my_flow.run_flow(deployment=NickDeployment())

When the deployment is passed to run_flow(), it is equivalent to passing everything explicitly as flow_input. In addition to specifying input, deployments can enforce various levels of checks to ensure the environment is set up correctly. These include the current built-in checks that the flow and funcx functions are up to date, and could extend to checking the status of funcx endpoints and globus endpoints, and potentially to custom user-defined checks.
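A minimal sketch of how that equivalence and the check hooks might look, assuming a hypothetical `BaseDeployment` class with `get_input()` and `run_checks()` methods (none of these names exist in Gladier today, and a real endpoint check would query the services rather than just test for a non-empty id):

```python
import copy


class BaseDeployment:
    """Hypothetical base class. An empty deployment adds no values."""

    checks = []
    globus_endpoints = {}
    funcx_endpoints = {}
    flow_input = {}

    def get_input(self):
        """Flatten the deployment into a flow_input-style dict."""
        merged = copy.deepcopy(self.flow_input)  # never mutate class attributes
        section = merged.setdefault('input', {})
        section.update(self.globus_endpoints)
        section.update(self.funcx_endpoints)
        return merged

    def run_checks(self):
        """Call each check_<name> method listed in `checks`, if one is defined."""
        for name in self.checks:
            check = getattr(self, f'check_{name}', None)
            if check is not None:
                check()

    def check_globus_endpoints(self):
        # Placeholder check: only verifies that every id is non-empty.
        for key, value in self.globus_endpoints.items():
            if not value:
                raise ValueError(f'{key} has no endpoint id')


class NickDeployment(BaseDeployment):
    checks = ['globus_endpoints']
    globus_endpoints = {
        'source_globus_ep': 'e55b4eab-6d04-11e5-ba46-22000b92c6ec',
    }
    funcx_endpoints = {
        'funcx_endpoint_compute': '2272d362-c13b-46c6-aa2d-bfb22255f1ba',
    }
    flow_input = {'input': {'processing_dir': '/foo/bar/baz'}}


deployment = NickDeployment()
deployment.run_checks()
flow_input = deployment.get_input()
```

Under these assumptions, `flow_input` is the same dict a user would otherwise build by hand, so `run_flow` would not need to treat the two cases differently.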

Deployments would be optional, with a default deployment that preserves the current behavior.

Note: #12, #59, #76

NickolausDS commented 3 years ago

I'm not sure where all the gotchas are with this system. Comments or perspectives are very welcome!

ravescovi commented 3 years ago

It will be very interesting to have things like portal.conf, laptop_raf.conf, polaris.conf, etc.

NickolausDS commented 3 years ago

Yeah, each of these could go into a separate module, and even be referenceable by a dotted string if we want. Something like:

class MyGladierClient(GladierBaseClient):
    deployment = 'mypackage.deployments.PortalDeployment'
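A rough sketch of how a dotted string could be resolved into a class, using only the standard library. The helper name `import_deployment` is hypothetical, and the demo resolves a stdlib class because `mypackage.deployments` is only an example path:

```python
import importlib


def import_deployment(dotted_path):
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_path, _, class_name = dotted_path.rpartition('.')
    if not module_path:
        raise ImportError(f'{dotted_path!r} is not a dotted path')
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f'module {module_path!r} has no attribute {class_name!r}'
        ) from exc


# Stand-in for 'mypackage.deployments.PortalDeployment'.
resolved_cls = import_deployment('collections.OrderedDict')
```

A client could call a helper like this when its `deployment` attribute is a string, and use the attribute directly when it is already a class or instance.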

NickolausDS commented 3 years ago

Hmm, another potential issue arises when this is used by portals. Eventually, I think we'll want to have users start their own flows from the portal using custom globus endpoints, funcx endpoints, etc. In that scenario, deployments can't be hard-coded ahead of time within a repo. It would be good to support instantiating deployments with custom values, like this:

portal_deployment = PortalDeployment(
    globus_endpoints={
        'source_globus_ep': 'e55b4eab-6d04-11e5-ba46-22000b92c6ec',
        'compute_globus_ep': '08925f04-569f-11e7-bef8-22000b9a448b',
    },
    funcx_endpoints={
        'funcx_endpoint_non_compute': '553e7b64-0480-473c-beef-be762ba979a9',
        'funcx_endpoint_compute': '2272d362-c13b-46c6-aa2d-bfb22255f1ba',
    }
)
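One way the constructor could accept such values is sketched below. It assumes that keyword arguments are merged key by key over class-level defaults and stored on the instance only. That merge rule and the rejection of unknown settings are assumptions, and the all-zero id is a placeholder:

```python
import copy


class PortalDeployment:
    # Class-level defaults (hypothetical; a portal might leave these empty).
    globus_endpoints = {}
    funcx_endpoints = {}

    def __init__(self, **overrides):
        for name, value in overrides.items():
            if not hasattr(type(self), name):
                raise TypeError(f'unknown deployment setting: {name}')
            # Copy the class default, then apply this user's values on top,
            # so one user's deployment never leaks into another's.
            merged = copy.deepcopy(getattr(type(self), name))
            merged.update(value)
            setattr(self, name, merged)


first_user = PortalDeployment(
    globus_endpoints={
        'source_globus_ep': 'e55b4eab-6d04-11e5-ba46-22000b92c6ec',
    },
)
second_user = PortalDeployment(
    globus_endpoints={
        'source_globus_ep': '00000000-0000-0000-0000-000000000000',
    },
)
```

Each portal user gets an independent instance here, while the class itself stays unchanged and can still serve as a repo-level default.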

ravescovi commented 2 years ago

@NickolausDS Is this issue closed? I can open one to add those capabilities to the client-template and the documentation.