Submission Types / System thoughts

Concept

When trying to find a solution for dynamically scaling compute to run hundreds of submissions, we think about using workflow engines such as nextflow, cromwell, WES implementations (cwl-wes), TES implementations (funnel), and many more. Due to the use of workflows, we can think that this service would contain an API endpoint that transforms a submission + queue bundle to workflow + workflow inputs / configuration.

These are the possible types of submissions:

If submission is a flat file (except a workflow), a workflow must be specified by admin
If submission is a docker image, workflow templates and inputs must be specified by admin
If a submission is a workflow, workflow inputs must be specified by admin
If a submission is an API service, the API service must be started first prior to running a workflow + its inputs specified by the admin.

We will leave out API submissions for now, as they don't fall in our original paradigm.

Docker submission

Let's dive deeper into a docker repository as a submission. The most basic form a submission would be a json object:

{
    "docker_image": "myuser/myimage@sha25...."
}

{
    "docker_image": "myuser/myimage",
    "docker_digest": "sha25..."
}

If we were to use CWL, this would mean templates would have to be specified to replace the dockerHint. The queue would be configured with an admin specified workflow bundle and workflow inputs would have to be specified.

workflow.zip

workflow.cwl.mustache
run_docker.cwl.mustache
validate.cwl
score.cwl

workflow_inputs.json

{
"data": {
    "class": "Directory",
    "location": "/dir/to/data"
},
"output_filename": "prediction.csv"
}

The transformation process would be:

Obtain submission object docker_repo = (docker_image + docker_digest)
Replace docker hint with docker_repo from step one (create run_docker.cwl from run_docker.cwl.mustache
Make sure to use full run_docker.cwl path in workflow.cwl.mustache (Or... just make sure run_docker.cwl is in the same directory... (might not need workflow.cwl.mustache)
Get workflow inputs
Run workflow cwltool workflow.cwl workflow_inputs.json

@thomasyu888 @jaeddy Here is an initial attempt to enumerate different submission types:

Docker image

docker registry (required)
docker image (required)
tag
digest

I think is the easiest way to submit a tool for benchmarking is as part of the CI workflow. From this POV, the workflow would build and push the image to a docker registry, then submit a JSON object to the submission API. The digest of the image can be obtained by the workflow using:

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      -
        name: Login to DockerHub
        uses: docker/login-action@v1 
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Build and push
        id: docker_build
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: user/app:latest
      -
        name: Image digest
        run: echo ${{ steps.docker_build.outputs.digest }}

Image tags are more user friendly than digest, especially when displayed in a leaderboard. The issue is that tags can move, i.e. the digest they refer to may be rewritten. Let's assume that we request the user/workflow to submit a tag. When the submission API receives the submission object with the tag, it could complete the submission object with the digest. We could also ask the user to provide the digest, though there is low risk that the tag would be rewritten between the time the tag has been created and the image has been submitted. Asking for both the tag and the digest from the user make the submission more complex, especially if a submission is made manually using a CLI. Anyway, I think that it's important for the submission object to keep track of the tag, to use as a human-friendly reference, and the digest. Now if we want to rely primarily on tags, we should make sure tag submitted for evaluation are never rewritten. First, we could refuse a submission object that include a tag that has been previously submitted. At the leaderboard level, we could have a "smart" leaderboard that check for each submission that the digest behind a tag on file is still the same as when the submission has been made. This check could slightly slow down the loading of the leaderboard table as one check per submission displayed would need to be done. If the record do not match, a notification could be displayed that one can no longer rely on the tag. The issue is that if someone writes down the tag of the image and use it in production, the tag may be rewritten and the production setup would change without being notified...

As a good practice, it may be important to actually not use (human-friendly) tags but instead use the 12-digest of the docker images, e.g. 991a88305980. Unfortunately, it seems that we can't pull an image with the 12-digest only...

The good practice should then be that in production, the full digest should be used. This means that the leaderboard should enable someone to copy paste the full digest to the clipboard. Now in this context the human-friendly tag may still have it's place for a submitter to track easily the performance of the different version of his/her submissions (e.g. private dashboard). Yet we will educate the user and remember them that the full digest must be used in production.

Moreover, we could request that the image submitted must include a set of required opencontainers LABELs. When receiving the submission, we could complete the submission object by querying these labels. That way the submitter does not have to specify these information twice (as LABELs and as properties of the submission object), thus preventing the risk of discrepancies. This is if we value having all images annotated with this standard LABELs. If we don't care about this, then it's easier for us to request the submitter to pass all the information as part of the submission object.

Finally, the submission API should check upon receiving the submission that "we" have access to the image. This would enable us to promptly reply to the user with a specific HTTP error code and message to inform him/her that the image has not been shared with the benchmarking platform (public or shared with our service user). Now is there a case where we would not want the submission API to get access to the image and instead only selected "data hosting site" could pull the image? I.e. is there a situation where the submitter may not want the platform to have access to the image but only the "data hosting sites" that are not managed by us? For example for an internal competition setup by a company, maybe they don't want the benchmarking platform to get access to the models submitted by their employee, and only their "data hosting site" (also controlled by the company) should be able to pull the image. I think that this is use case that we should consider. An alternative to this approach is for a company to deploy the full benchmarking platform within their secure network, with an option to later export the results of the challenge to the central benchmarking platform to make the data publicly available (after they have published and patented the results :))

@jaeddy I thought you may be interested in early discussions on the design of the submission API. If you prefer, we could instead tag you later in the design process.

GitHub release/commit

TBA

File

TBA

Notebook (Jupyter and R)

TBA

@tschaffter . Ill try to touch on the numerous points you made:

Docker tag vs Docker digest I don't think that we should have to keep track of the tags, because it would be confusing. As you stated above, lets say we displayed this in the leaderboard: organziation/mytool:mytag. Someone could pull down that image + tag at a later date and it could be completely different from what the submission was when the participant submitted it. That being said +1 to the shortening of the digest on the leaderboard. We can also provide nice tools that allow for people to specify tags at the time of submission but the tool will automatically pull the digest out. (Currently the synapse submit cli does this, you don't need to specify the whole digest to submit) Example:
Docker LABELS I'm indifferent about the labels here. It could most certainly be a "clever" thing we do to fill out the submission object for people if they already have all the information as part of the Docker label. The important question to ask ourselves is if there are differences in the what should be part of the Docker LABEL vs Submission object.
Docker Permissions There is a lot we can say about this. We want to offer participants the ability to submit private repositories to a challenge. Due to this, we would either have to have our own challenge docker registry or utilize a docker registry that allows for private repositories (Synapse, Quay.io, Dockerhub - only 1 free private repo allowed per user account...).
- DockerHub or Quay.io: Participants would have to share their repositories with a.) a service account per challenge, b.) one service account to rule them all. c.) public
- own challenge docker registry: As part of the submission process, the docker image would be built and submitted (this is evalAI's paradigm) into our own registry. We won't need to deal with checking if we have access to a docker image, because we have access to all their submissions on our platform.
I don't think there is a situation where the submitter may not want the platform to have access to the image but only the "data hosting sites". After-all, a submission to the platform means they are "submitting it to us". That being said, if as part of the submission there was a field that explicitly stated which service account it was shared with, then we could achieve this use-case.

@tschaffter I think its great that we are coming up with new submission types, but I think it might be easier if we fine tuned the "File" and "Docker Submission" and then designed others

Notebooks

As for the Notebook submissions, this would most certainly be interesting. It could be interesting to offer a compute environment for people that has the "synthetic data" for a challenge. Participants can interact with the code on a real time basis with the data and once they are happy with their model, they can "submit" the notebook. This submission will then run on any internal data. That being said, the general basis of a notebook submissions is simply a File.

Github Commits

I wrote up a github submission proposal awhile ago.

File

Bundle of files: .zip
Workflow files .cwl
general files .csv, .txt, .tsv.....

I don't think that we should have to keep track of the tags, because it would be confusing.

I think that displaying tags in the submission dashboard of a submitter is important. Most educated developer will not rewrite semver tags. We can also do some effort to educate developer regarding this point. Also, we would only accept semver tag (regex) so that we would not access a latest, nightly, edge tags, for example. If we educate well the small fraction of developers who still consider rewriting semver tags (bad practice!), we can assume that tag will be useful for a quick review of the leaderboard tables and submission dashboards. This does not change the fact that we will also provide the digest and that tools must be deployed using the digest in production environment. So if you agree, I would capture the tag as part of the submission object.

About Docker LABELs

One important point is that validating the Docker LABELs of an image requires that the submission API has access to the Docker image. As I mentioned, there may be use cases where we don't want the submission API itself/benchmarking platform to have access to the image. Not having to access the Docker images would also simplify greatly the submission API. So I think that we should do LABELs validation as part of the submission API. This task could be delegate to an orchestrator/controller down the road.

The important question to ask ourselves is if there are differences in the what should be part of the Docker LABEL vs Submission object.

Here are the standard labels that we use for all the NLP Sandbox Docker images.

We can come back to this question once we have identified all the information that we need to capture as part of the submission, then decide where this information should live.

We want to offer participants the ability to submit private repositories to a challenge.

Yes

About Docker registries

These are great comments. Does EvalAI has its own Docker registry?

One paradigm I want us to further explore is the concept of "what is benchmarked is what is deployed in production environment". This paradigm guided the designed of the NLP Sandbox. This means that where the image leave should be suitable for both the purpose of benchmarking and deployment to production environment as we probably don't want the same image to live in two different places. Initially we could focus on the largest existing registry, DockerHub, which supports public and private (limited to 1 with free account) images. I think that most public tool would like to be hosted on DockerHub.

Alternatively we could provide our own Docker registry as you suggest. I'm not sure how much work this would involve. There are two approaches:

Platform-level registry: we provide a central registry where images are pushed. The challenge platform would need to pay for storing images. This also means that the banchmarking platform is responsible for the security of the images stored.
Challenge-level registry: a registry to store the images submitted to a challenge. This approach could give more control to the challenge organizers. They would also pay for storing the images. However this would increase the complexity of the infrastructure for the organizers and I'm not sure the benefits out-weight the drawback. I think that we can discard this approach.

For the initial implementation, I would suggest to rely on DockerHub, the most popular Docker registry. A paid account is only $5/month for individual and unlimited private repositories. If we request Docker images to be pushed to DockerHub.

Another question is whether we (the benchmarking platform) or the challenge organizers want to cache the submitted images, or live image access to the discretion of the submitters. If the images should be cached, who should have access to them? Everyone? Only the challenge organizers? At least during a challenge, we can expect that the image should not be made immediately public.

I think that we can start with relying on DockerHub where the submitter fully control the visibility of the image, and can decide to make it public at the end of the challenge, for example. Later down the road to reduce the risk of having models disappearing, we could add our own registry and enable the caching of the image to this registry. Once again, this would come at the cost of us having to pay for storing the images (we can possibly ask a contribution to the challenge organizers) but also being responsible for the security of the data.

If we agree that images must be stored on DockerHub at least at first, there are two ways private images could be shared with the benchmarking infrastructure:

The challenge platform or the challenge organizers provide a DockerHub service account. The submitter must give this account read access to the private image.
The submitter submit credentials that can be used to access the image. Since we would be responsible for handling these credentials, I don't really like this solution. This solution is also more complex because we work with N different credentials where N is the number of participants version 1 credentials for the first solution. This also enable us to make the documentation clearer.

I recommend option 1) for the above reasons. Therefore, the submission object does not need to contain properties related to account credentials.

TODO

[ ] Review EvalAI submission schema for Docker image (@thomasyu888 )

@tschaffter

Also, we would only accept semver tag

I think we can definitely collect the tag, it's relatively simple to just add it to the DockerSubmission schema. That being said, we still cannot be 100% certain that someone doesn't accidentally write over a tag. It is extremely easy to do so using Docker even if you are using CI/CD. Our examples currently have something like:

docker build -t repo:v1 ....
docker push repo:v1

This isn't stopping anyone from rebuilding that exact tag again. Regardless of how much education and tools we provide participants, I still think it's a bit risky. I can also imagine github workflows and templates that we provide people to help the, but at the end of the day, we cannot be 100% certain that tags will remain exactly the same. We can however be 100% certain that the digest stays exactly the same.

Does EvalAI has its own Docker registry?

Yes EvalAI has its own Docker registry and this deals with the issue of cacheing of submissions that you brought up. It sort of works similarly to Synapse, but when you submit the docker repo, you are automatically giving permission for the admin to pull that specific digest. I think they also have the ability for participants to specify whether or not they want their docker image to be publicly available (which is impossible for Synapse since you can't pull a Synapse Docker repo unless you're logged into Synapse).

Platform-level registry vs DockerHub

I think we should definitely add support for DockerHub, but ultimately this adds complexity to the "cacheing" of submissions. I think upon submission, we should re-tag the docker submission and push it into our own registry. Agreed with the service account approach (each challenge could have a service account to increase security)

If the images should be cached, who should have access to them? Everyone? Only the challenge organizers? At least during a challenge, we can expect that the image should not be made immediately public.

I think the images should definitely be cached, and we should actually be evaluating the submission through these cached submissions and not the actual submission itself (If using DockerHub), because someone can delete their DockerHub repository. During a challenge, the image should probably be private and only accessible by challenge organizers.

Review EvalAI submission schema for Docker image

I searched for a bit of time to find the submission schema on eval AI, but I couldn't find the API. All I know is for docker based challenges, evalAI provides a cli that allows for people to push and submit their submission into the evalAI docker registry

Sage-Bionetworks / submission-schemas