Task distribution logic for use in proTES repo
Apache License 2.0

TEStribute

Task distribution for GA4GH TES instances.

Synopsis

Proof of concept implementation of a task distribution logic for a federated network of GA4GH Task Execution Service (TES) instances.

Usage

You can use TEStribute in three ways:

As an HTTP API service:

curl -X POST SERVICE_URI -H "Content-Type: application/json" -d PAYLOAD

As a console script:

testribute [-h] --tes-uri URI --cpu-cores INT --ram-gb FLOAT --disk-gb FLOAT
           --execution-time-sec INT [--jwt TOKEN] [--object-id ID] [--drs-uri
           URI] [-m MODE] [-v]

From within your Python code:

from TEStribute import rank_services

rank_services(...)

Implementation details

TEStribute takes as input a set of available GA4GH Task Execution Service (TES) instances, a task's compute resource requirements, the Data Repository Service (DRS) object identifiers of all task inputs (if any), and a list of DRS instances from which these objects may be obtained. It returns a list of combinations of TES instances and input object locations, rank-ordered by increasing estimated total cost, by increasing estimated total processing time, or by a weighting factor that balances the two.
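The ranking idea can be illustrated with a minimal sketch. This is not TEStribute's actual algorithm: the function names, the dictionary keys `cost` and `time`, the min-max normalisation, and the convention that a weight of 0 ranks by cost only and a weight of 1 by time only are all assumptions made for illustration.

```python
from typing import Dict, List


def normalise(values: List[float]) -> List[float]:
    """Scale values to the range [0, 1]; identical inputs all map to 0.0."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    return [0.0 if span == 0 else (v - lowest) / span for v in values]


def rank_combinations(combos: List[Dict], weight: float) -> List[Dict]:
    """Rank combinations by a weighted sum of normalised cost and time.

    Assumed convention (hypothetical): weight=0 considers cost only,
    weight=1 considers time only, values in between balance the two.
    """
    if not 0 <= weight <= 1:
        raise ValueError("weight must lie between 0 and 1")
    if not combos:
        return []
    costs = normalise([c["cost"] for c in combos])
    times = normalise([c["time"] for c in combos])
    scores = [(1 - weight) * c + weight * t for c, t in zip(costs, times)]
    # Lower score is better; sorted() is stable, so ties keep input order.
    order = sorted(range(len(combos)), key=lambda i: scores[i])
    return [dict(combos[i], rank=rank) for rank, i in enumerate(order, start=1)]


# Made-up illustration data, not real service estimates.
combos = [
    {"name": "A", "cost": 10.0, "time": 100},
    {"name": "B", "cost": 20.0, "time": 50},
    {"name": "C", "cost": 30.0, "time": 90},
]
balanced = [c["name"] for c in rank_combinations(combos, 0.5)]
cost_only = [c["name"] for c in rank_combinations(combos, 0.0)]
time_only = [c["name"] for c in rank_combinations(combos, 1.0)]
```

Normalising both quantities before mixing them keeps a large cost figure from drowning out a small time figure; whether TEStribute does the same is not stated here.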

The application currently relies on modifications to the TES specifications and assumes that DRS object identifiers are globally unique (i.e., a given identifier points to exactly the same file on any DRS instance), which is not guaranteed by the current DRS specification. More detailed information on these requirements is available at mock-TES and mock-DRS, mockup services which implement these modifications/assumptions. The corresponding clients TES-cli and DRS-cli are used within TEStribute to interact with these services.

Installation

Deploying the API service

Ensure you have Git, Docker and Docker Compose installed, as the commands below rely on them.

Clone the repository and start the Docker service:

git clone https://github.com/elixir-europe/TEStribute.git app
cd app
docker-compose up --build --detach

You can explore the HTTP API via the Swagger UI:

firefox http://localhost:7979/ui/

CLI usage & import

Ensure you have Python and pip installed.

Install the package and the testribute console script:

pip install TEStribute

Extended usage

Options

The following properties/options are available when running TEStribute, regardless of whether the software is run as an HTTP API service, as a console script or directly from within your Python code. The CLI option is indicated in parentheses in those cases where it differs from API / import usage:

For more details, including typing information, explore the API definition, which also forms the basis for validating CLI arguments and the inputs to the rank_services() function.

Example calls

The following are equivalent calls for each of the three TEStribute entry points described above. Note that the provided TES and DRS URIs point to test instances of the services, which may or may not be up and running at any given time. Therefore, the success of the calls cannot be guaranteed.

API service call payload (JSON)

{
  "object_ids": [
    "a001",
    "a002"
  ],
  "drs_uris": [
    "http://131.152.229.71/ga4gh/drs/v1/",
    "http://193.166.24.114/ga4gh/drs/v1/"
  ],
  "mode": 0.5,
  "resource_requirements": {
    "cpu_cores": 1,
    "disk_gb": 1,
    "execution_time_sec": 1800,
    "ram_gb": 1
  },
  "tes_uris": [
    "http://131.152.229.70/ga4gh/tes/v1/",
    "http://193.166.24.111/ga4gh/tes/v1/"
  ]
}
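For callers who prefer Python over curl, the same payload can be sent with the standard library. The sketch below only builds the request object. The function name `build_rank_request` is hypothetical, and `SERVICE_URI` is a placeholder: the endpoint path is not given in this document and should be looked up in the API definition or the Swagger UI.

```python
import json
import urllib.request

# Placeholder only: replace with the real endpoint from the API definition.
SERVICE_URI = "http://localhost:7979/REPLACE/WITH/ENDPOINT"

payload = {
    "object_ids": ["a001", "a002"],
    "drs_uris": [
        "http://131.152.229.71/ga4gh/drs/v1/",
        "http://193.166.24.114/ga4gh/drs/v1/",
    ],
    "mode": 0.5,
    "resource_requirements": {
        "cpu_cores": 1,
        "disk_gb": 1,
        "execution_time_sec": 1800,
        "ram_gb": 1,
    },
    "tes_uris": [
        "http://131.152.229.70/ga4gh/tes/v1/",
        "http://193.166.24.111/ga4gh/tes/v1/",
    ],
}


def build_rank_request(service_uri: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request, mirroring the curl call."""
    return urllib.request.Request(
        service_uri,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rank_request(SERVICE_URI, payload)

# To send the request once SERVICE_URI points at a running service:
# with urllib.request.urlopen(request, timeout=30) as reply:
#     result = json.load(reply)
```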

Console script call

testribute \
  --tes-uri="http://131.152.229.70/ga4gh/tes/v1/" \
  --tes-uri="http://193.166.24.111/ga4gh/tes/v1/" \
  --cpu-cores=1 \
  --ram-gb=1 \
  --disk-gb=1 \
  --execution-time-sec=1800 \
  --object-id="a001" \
  --object-id="a002" \
  --drs-uri="http://131.152.229.71/ga4gh/drs/v1/" \
  --drs-uri="http://193.166.24.114/ga4gh/drs/v1/" \
  --mode=0.5

Function call

from TEStribute import rank_services

rank_services(
    object_ids=[
        "a001",
        "a002"
    ],
    resource_requirements={
        "cpu_cores": 1,
        "ram_gb": 1,
        "disk_gb": 1,
        "execution_time_sec": 1800
    },
    tes_uris=[
        "http://131.152.229.70/ga4gh/tes/v1/",
        "http://193.166.24.111/ga4gh/tes/v1/"
    ],
    drs_uris=[
        "http://131.152.229.71/ga4gh/drs/v1/",
        "http://193.166.24.114/ga4gh/drs/v1/"
    ],
    mode=0.5,
    jwt=None
)

Return types

Success

Upon success, the API service returns a JSON object such as this:

{
  "service_combinations": [
    {
      "access_uris": {
        "a001": "ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna//Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz",
        "a002": "ftp://ftp.ensembl.org/pub/release-81/bed/ensembl-compara/11_teleost_fish.gerp_constrained_elements/gerp_constrained_elements.tetraodon_nigroviridis.bed.gz",
        "tes_uri": "http://193.166.24.111/ga4gh/tes/v1/"
      },
      "cost_estimate": {
        "amount": 294727.1443451331,
        "currency": "EUR"
      },
      "rank": 1,
      "time_estimate": 2514
    },
    {
      "access_uris": {
        "a001": "ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna//Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz",
        "a002": "ftp://ftp.ensembl.org/pub/release-81/bed/ensembl-compara/11_teleost_fish.gerp_constrained_elements/gerp_constrained_elements.tetraodon_nigroviridis.bed.gz",
        "tes_uri": "http://131.152.229.70/ga4gh/tes/v1/"
      },
      "cost_estimate": {
        "amount": 294697.1938522269,
        "currency": "EUR"
      },
      "rank": 2,
      "time_estimate": 3298
    }
  ],
  "warnings": []
}

You can check out the Response model in the API definition for more details. For the other entry points, the general response upon success is the same, but provided in different ways. When calling rank_services() directly from within Python code, the response is an instance of Python class Response, which is based on the corresponding model in the API definition and defined in module TEStribute.models.response. It can be converted to dictionary form with:

response = rank_services(...)
response.to_dict()

It can be further converted to JSON with:

import json

json.dumps(response.to_dict())
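Once the response is available as a dictionary, picking the top-ranked combination is straightforward. The following is a minimal sketch that relies only on the response structure shown above; the helper names `best_combination` and `split_access_uris` are hypothetical, and the sample data is abridged from the example response.

```python
from typing import Dict, Optional, Tuple


def best_combination(response: dict) -> Optional[dict]:
    """Return the combination with the lowest rank, or None if there is none."""
    combos = response.get("service_combinations", [])
    return min(combos, key=lambda combo: combo["rank"]) if combos else None


def split_access_uris(combo: dict) -> Tuple[str, Dict[str, str]]:
    """Separate the TES URI from the per-object access URIs."""
    access = dict(combo["access_uris"])  # copy so the input stays untouched
    tes_uri = access.pop("tes_uri")
    return tes_uri, access


# Abridged version of the example response shown above.
sample = {
    "service_combinations": [
        {
            "access_uris": {
                "a001": "ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna//Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz",
                "tes_uri": "http://131.152.229.70/ga4gh/tes/v1/",
            },
            "rank": 2,
            "time_estimate": 3298,
        },
        {
            "access_uris": {
                "a001": "ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna//Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz",
                "tes_uri": "http://193.166.24.111/ga4gh/tes/v1/",
            },
            "rank": 1,
            "time_estimate": 2514,
        },
    ],
    "warnings": [],
}

best = best_combination(sample)
tes_uri, object_uris = split_access_uris(best)
```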

When using the testribute console script, the JSONified response is printed to STDOUT.

Failure

In case of failure, the API service returns a JSON object of the following form:

{
  "code": 400,
  "errors": [
    {
      "reason": "werkzeug.exceptions.BadRequest",
      "message": [
        "Services cannot be ranked. None of the specified TES instances provided any task info."
      ]
    }
  ],
  "message": "The request caused an error."
}
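A client may want to collapse such an error object into a flat list of messages. The sketch below assumes only the structure of the example above; the function name `error_messages` is hypothetical, and the handling of a plain-string `message` is a defensive assumption rather than documented behaviour.

```python
from typing import List


def error_messages(body: dict) -> List[str]:
    """Collect all error messages from a failure response.

    Returns an empty list if the body has no "errors" entry, e.g. because
    it is a success response.
    """
    messages: List[str] = []
    for error in body.get("errors", []):
        detail = error.get("message", [])
        if isinstance(detail, str):  # tolerate a single string as well
            detail = [detail]
        messages.extend(detail)
    return messages


failure = {
    "code": 400,
    "errors": [
        {
            "reason": "werkzeug.exceptions.BadRequest",
            "message": [
                "Services cannot be ranked. None of the specified TES "
                "instances provided any task info."
            ],
        }
    ],
    "message": "The request caused an error.",
}

collected = error_messages(failure)
```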

When using the console script testribute, an error will lead to the script exiting with a non-zero return code. In addition, warnings and errors are written to the log which is streamed to STDERR, e.g.:

[WARNING] TES unavailable: the provided URI 'http://i.do.not.exist/' could not be resolved.
[ERROR] ResourceUnavailableError: Services cannot be ranked. None of the specified TES instances provided any task info.
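When the console script is driven from another program, the exit code and the two output streams can be handled as in the following minimal sketch. The wrapper `run_json_cli` is hypothetical and works for any command that prints JSON to STDOUT; the demonstration uses a stand-in Python one-liner so that it runs without TEStribute installed.

```python
import json
import subprocess
import sys
from typing import List


def run_json_cli(command: List[str]) -> dict:
    """Run a command, returning its STDOUT parsed as JSON.

    Raises RuntimeError with the captured STDERR if the exit code is non-zero.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"command exited with code {completed.returncode}:\n"
            f"{completed.stderr.strip()}"
        )
    return json.loads(completed.stdout)


# Stand-in for the real script: prints a small JSON document to STDOUT.
demo_command = [sys.executable, "-c", "print('{\"warnings\": []}')"]
demo_result = run_json_cli(demo_command)

# With TEStribute installed, the call would look like this instead:
# run_json_cli([
#     "testribute",
#     "--tes-uri=http://131.152.229.70/ga4gh/tes/v1/",
#     "--cpu-cores=1", "--ram-gb=1", "--disk-gb=1",
#     "--execution-time-sec=1800",
# ])
```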

When calling rank_services() directly from within Python code, traceback information for any error is provided, too. For example:

[WARNING] TES unavailable: the provided URI 'http://i.do.not.exist/' could not be resolved.
Traceback (most recent call last):
  File "<stdin>", line 21, in <module>
  File "/home/uniqueg/Dropbox/repos/TEStribute/TEStribute/__init__.py", line 129, in rank_services
    target_currency=models.Currency[config["target_currency"]],
  File "/home/uniqueg/Dropbox/repos/TEStribute/TEStribute/models/response.py", line 55, in __init__
    timeout=self.timeout,
  File "/home/uniqueg/Dropbox/repos/TEStribute/TEStribute/utils/service_calls.py", line 311, in fetch_tes_task_info
    "Services cannot be ranked. None of the specified TES instances " \
TEStribute.errors.ResourceUnavailableError: Services cannot be ranked. None of the specified TES instances provided any task info.

Configuration

It is possible to configure some settings of the app, e.g., how JWTs are parsed, processed and forwarded, or in which currency costs are reported, by modifying the config file before starting the service / running TEStribute.

Testing

Unit and integration tests can be run with the following command:

pytest

Note that test coverage is currently sparse and tests are unstable.

Contributing

This project is a community effort and lives off your contributions, be it in the form of bug reports, feature requests, discussions, or fixes and other code changes. Please read the contributing guidelines if you want to contribute. And please mind the code of conduct for all interactions with the community.

Versioning

The app is still in alpha stage, and current versioning is for internal use only. In the future, we aim to adopt semantic versioning that is synchronized with the versioning of mock-TES, TES-cli, mock-DRS, and DRS-cli, in order to ensure that these apps will be compatible as long as both their major and minor versions match.
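The intended compatibility rule can be expressed in a few lines. This is a sketch of the stated rule only, not code from any of the listed projects; the function names are hypothetical, and version strings are assumed to start with numeric `MAJOR.MINOR` components.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract the numeric major and minor components of a version string."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def compatible(version_a: str, version_b: str) -> bool:
    """Two versions are treated as compatible if major and minor both match."""
    return major_minor(version_a) == major_minor(version_b)


same_minor = compatible("1.2.0", "1.2.7")
different_minor = compatible("1.2.0", "1.3.0")
```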

License

This project is covered by the Apache License 2.0, a copy of which is shipped with this repository.

Contact

Please contact the project leader for inquiries, proposals, questions etc. that are not covered by the Contributing section.

Acknowledgments

The project is a collaborative effort under the umbrella of the ELIXIR Cloud and AAI group. It was started during the 2019 Google Summer of Code as part of the Global Alliance for Genomics and Health organization.
