codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check our paper at Patterns Cell Press https://hubs.li/Q01fwRWB0
Apache License 2.0

Getting issue in evaluating submission in worker #1423

Closed surupendu closed 1 month ago

surupendu commented 2 months ago

Hi,

I am setting up the competition Machine Translation for Indian Languages at https://www.codabench.org/competitions/2889/

Since the default queue has a 600-second time limit, I switched to a custom queue with a remote worker. The competition is result submission only.

So, as per my understanding, an ingestion program is not required. However, the submission is not getting processed and throws the error shown below: [image]

Can you please help with this issue? I am not able to solve it using the documentation.

ihsaan-ullah commented 2 months ago

Can you please attach your YAML file here, specifically the part where you define your task?

surupendu commented 2 months ago

I used the competition form instead of a competition bundle to create the competition, so I don't have a YAML file for this.

I specified the scoring program and reference data here: [image]

The .yaml file of the scoring program has this command: [image]

ihsaan-ullah commented 2 months ago

Can you change your command to

command: python3 metrics.py

assuming metrics.py is your scoring program

AND

zip your scoring program without the parent directory (just zip the files together)
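Zipping "without the parent directory" means the archive entries should be the files themselves (for example metrics.py and the .yaml file), not a folder that contains them. A minimal Python sketch of this, using only the standard zipfile module. The folder name scoring_program and the helper name zip_flat are assumptions for illustration, not part of Codabench:

```python
import zipfile
from pathlib import Path


def zip_flat(src_dir, zip_path):
    """Zip the contents of src_dir so that its files sit at the archive root.

    Returns the list of archive names that were written.
    """
    src = Path(src_dir)
    target = Path(zip_path).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself (if it lives in src_dir).
            if not path.is_file() or path.resolve() == target:
                continue
            # arcname is relative to src_dir, so the parent folder
            # never appears inside the archive.
            arcname = path.relative_to(src).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

Calling zip_flat("scoring_program", "scoring_program.zip") would then produce an archive where metrics.py is stored as "metrics.py" rather than "scoring_program/metrics.py".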

surupendu commented 2 months ago

It is now not able to find the program: [image]

surupendu commented 2 months ago

Hi,

On further investigation of the logs, it seems that the Codabench server interacts with the remote worker machine and tries to run the scoring code locally.

[image]

Maybe, due to security restrictions in the network, the code does not get copied to the worker. Is my understanding correct?

ihsaan-ullah commented 2 months ago

I am not sure what is wrong there, but you can check this bundle, which is a results-only competition. You can modify it to meet your needs and then make a submission to see what happens.

Didayolo commented 2 months ago

Hi @surupendu

The "ingestion program not found" message is a warning, not an error. It is normal to receive it in result submission competitions.

surupendu commented 2 months ago

Hi @Didayolo

Earlier, because the scoring program was zipped at the folder level, the evaluation was not working and the submission was timing out.

As I could not access the logs in the default queue, I switched to the remote worker queue, where the logs showed that it was searching for the ingestion program. It turns out I needed to zip the scoring program at the file level instead of the folder level (based on the solution provided by @ihsaan-ullah). However, the remote worker then started reporting that "metrics.py" was not found in the directory.

I switched to using the default queue and my submissions started working correctly. The submissions are now using the default queue.

My only concern now is that if I need to switch to a remote worker for heavy computation, I will again face the "scoring program not found" error that I was getting earlier.
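Before retrying on a remote worker, one way to rule out packaging as the cause is to confirm where metrics.py sits inside the uploaded archive. A minimal sketch using Python's standard zipfile module. The helper names are hypothetical, and this only inspects the zip; it does not explain the remote worker's behaviour:

```python
import zipfile


def find_in_zip(zip_path, filename="metrics.py"):
    """Return every archive path whose final component equals filename."""
    with zipfile.ZipFile(zip_path) as zf:
        # Zip archives always use "/" as the path separator.
        return [n for n in zf.namelist() if n.split("/")[-1] == filename]


def is_at_top_level(zip_path, filename="metrics.py"):
    """True if filename is stored at the archive root, not inside a folder."""
    return filename in find_in_zip(zip_path, filename)
```

If find_in_zip returns something like "scoring_program/metrics.py", the archive was zipped at the folder level and the command python3 metrics.py would not find the file at the root.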

Didayolo commented 2 months ago

@surupendu @ihsaan-ullah

It is absolutely not normal, and quite concerning, that submissions work on the default queue but not on a custom queue.

This could indicate that recent changes made to the compute worker's code (#1408) are breaking something.

Indeed, the workers of the default queue are not up-to-date, while @surupendu is using the latest version of the code when creating the workers.

ihsaan-ullah commented 1 month ago

I have recently set up new workers on Google Cloud. I haven't had any problems like this.

Didayolo commented 1 month ago

Indeed. @surupendu Are you still experiencing problems?

surupendu commented 1 month ago

@Didayolo and @ihsaan-ullah thank you for the support. I am now running the competition on the default queue. If I face any issue with the custom queue, I will share the logs with you. We can close the issue for now.