Project Title: Implement robust evaluation pipeline in EvalAI


Currently, the submission worker that evaluates challenge submissions has to be scaled manually. Moreover, real-time logs and metrics for the submission worker aren't available to challenge hosts. Challenge organizers have also often requested the ability to test their competition package (evaluation scripts, etc.) locally before uploading it to EvalAI, which would also reduce the assistance required from the platform maintainers. The goal of this project is to write a robust test suite for the submission worker and to port the worker to AWS Fargate in order to set up auto-scaling and logging. The tasks also include giving challenge hosts control over the submission worker from the UI, so that they can start, stop, and restart it.


Extended Goals:

Mentors: Ram Ramrakhya @Ram81, Rishabh Jain @RishabhJain2018, Deshraj @deshraj

Skills: Python, Django, Django Rest Framework, AWS, Docker

Skill Level: Hard

Get started: Try to fix some issues in EvalAI (note that there are some issues labeled with GSOC-2019)


a) Docker b) AWS-Fargate

Important Links:

RishabhJain2018 commented 5 years ago

Can you help me get started?

@navneel99 Please start by setting up EvalAI on your local machine and then start solving good-first-issue or GSOC-2019 issues.

To get familiar with the requirements, can we go ahead and make PRs relevant to this?

@KhalidRmb Yes.

KhalidRmb commented 5 years ago

Hi! I had some queries, and it would be very helpful if the mentors could help me work through them.

@RishabhJain2018 @deshraj @Ram81

KhalidRmb commented 5 years ago

Regarding shifting to Fargate:

KhalidRmb commented 5 years ago

Regarding the task Provide naming for worker containers for different challenges, there already is a mechanism for that in the file here:

Could you please provide some clarity regarding the task?

KhalidRmb commented 5 years ago

@RishabhJain2018 @deshraj @Ram81 Could you please take a look at these, and at the questions I've asked on Gitter? The proposal deadline is very near. Thanks!

RishabhJain2018 commented 5 years ago

Hi @KhalidRmb,

I gather it is necessary for docker-based challenges where the host wants to test the submission against diverse environments with different requirements in each worker container.

What if a challenge host wants to parallelize the submission processing for non-Docker-based challenges?

Coming to non-Docker-based challenges, is the main concern when scaling the workers speed and submission bottlenecks? Because the worker evaluates submissions sequentially, running several workers in parallel (with the same configuration) is faster from the host's perspective. Is that correct, or did I miss something?

I didn't get what you meant by speed here but the idea is to parallelize the submission processing near the challenge end date so that more people can submit to the challenge.
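To make the idea of parallel submission processing concrete, here is a minimal sketch in which several identical workers drain one shared queue. It only illustrates the concept: the in-memory `queue.Queue`, the `evaluate` placeholder, and the `run_workers` helper are all assumptions made for this example and do not reflect how EvalAI's worker or its message queue are actually implemented.

```python
import queue
import threading


def evaluate(submission):
    # Hypothetical stand-in for a challenge's evaluation script.
    return submission * 2


def run_workers(submissions, num_workers):
    """Evaluate all submissions using num_workers identical workers."""
    pending = queue.Queue()
    for submission in submissions:
        pending.put(submission)

    results = {}
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps pulling submissions until the queue is empty.
        while True:
            try:
                submission = pending.get_nowait()
            except queue.Empty:
                return
            score = evaluate(submission)
            with results_lock:
                results[submission] = score

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With `num_workers=1` the submissions are processed one after another, which matches the sequential behaviour described above; raising `num_workers` lets several submissions be in flight at once.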

Is only the submission worker to be shifted or the Django container along with it?

For now, we're focussing on the worker container only.

If we scale the worker alone, we could define a new task definition just for the worker and scale it through boto3. Could you please help clarify the situation here?

I'd like to see the complete approach in the proposal. Also, I've answered your query.
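As an illustration of the "scale through boto3" idea raised in the question, the sketch below changes the desired task count of a worker service. It assumes the worker runs as an ECS service on Fargate; the function name `scale_worker_service` and the cluster and service names in the usage comment are hypothetical, and this is not a description of the approach EvalAI eventually adopted. The function takes the client as a parameter so it can be exercised without AWS credentials.

```python
def scale_worker_service(ecs_client, cluster, service, desired_count):
    """Set how many worker tasks an ECS service should keep running.

    ecs_client is expected to behave like boto3.client("ecs").
    """
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    # update_service is the ECS API call that changes a service's
    # desired number of running tasks.
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count,
    )


# Possible usage (requires boto3 and AWS credentials; names are made up):
#   import boto3
#   ecs = boto3.client("ecs")
#   scale_worker_service(ecs, "evalai-workers", "challenge-1-worker", 3)
```

Starting, stopping, and restarting a worker could be expressed with the same call by setting the desired count to 1 or 0, though that mapping is an assumption of this sketch.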

Regarding the task Provide naming for worker containers for different challenges, there already is a mechanism for that in the file here:

Yes, it is already there. But Docker doesn't allow running two containers with the same name on a single machine, so a fix for this will be needed as part of the deliverable.
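One possible way to avoid the name clash is to append a short random suffix to each container name, sketched below. The `worker_container_name` helper, its parameters, and the `worker_<pk>_<queue>_<suffix>` pattern are assumptions made for illustration; they are not taken from EvalAI's existing naming mechanism.

```python
import re
import uuid


def worker_container_name(challenge_pk, queue_name, suffix=None):
    """Build a container name that differs for every worker instance."""
    if suffix is None:
        # Short random suffix so two workers for the same challenge
        # do not end up with the same name.
        suffix = uuid.uuid4().hex[:8]
    # Docker container names may only contain letters, digits,
    # underscores, periods and hyphens, so replace anything else.
    safe_queue = re.sub(r"[^a-zA-Z0-9_.-]", "-", queue_name)
    return f"worker_{challenge_pk}_{safe_queue}_{suffix}"
```

Calling the helper twice for the same challenge yields two distinct names, so both containers can run on one machine while the challenge id stays visible in the name.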

KhalidRmb commented 5 years ago

I didn't get what you meant by speed here but the idea is to parallelize the submission processing near the challenge end date so that more people can submit to the challenge.

That is exactly what I meant. Thanks.

