codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check our paper at Patterns Cell Press https://hubs.li/Q01fwRWB0
Apache License 2.0

Incorrect path only after uploading bundle #1414

Closed ruoxining closed 5 months ago

ruoxining commented 5 months ago

Problem description

I am the host of a competition. Before we published our competition, we ran the test locally and it ran well. However, after uploading it online, it never finds the correct path to the reference data, output, and user input folders, though it can run the scoring script. I wonder whether anyone could tell me what is the correct way to specify the input data path, given the folder structure below? Or could anyone provide a feasible bundle with a similar structure so that we can refer to it? Thanks in advance!

Setting and environment

Our codabench structure is as follows.

\ bundle.zip
| - generative_phase       // utils for one task, the generative task. None of the subfolders are zipped
|     | - reference_data
|          | - gold_gen.json  // gold label for generative task
| - multichoice_phase     // utils for another task, the multichoice task
|     | - reference_data 
|          | - gold_mc.json    // gold label for multichoice task
| - scoring_program
|     | - metadata.yaml       // the yaml setting
|     | - score.py                 // the scoring program
| - competition.yaml

And our uploaded zip for testing is structured as follows.

\ test.zip
| - res_gen
|    | - res_gen.json
| - res_mc

A top-level .zip here means that the subfolders themselves were selected and compressed, rather than a parent folder containing them.

Our scoring_program/metadata.yaml sets command: python3 $program/score.py $input $output. The paths in our scoring program are relative, written either without a leading ../ (e.g. generative_phase/reference_data/gold_gen.json) or with one (e.g. ../generative_phase/reference_data/gold_gen.json). However, neither works. Our scoring program keeps printing the warning that the input files are not found.

[Screenshot, 2024-04-17: the arguments that the scoring script receives]

Specific competition

Our competition is currently uploaded as NovelQA. You might upload any submission and see the specific warning in the terminal.

ihsaan-ullah commented 5 months ago

If you want to check a similar bundle, you can take a look here

ruoxining commented 5 months ago

Hi Ihsaan, thanks for your answer.

ihsaan-ullah commented 5 months ago

Have you checked that the submission itself is zipped without its parent directory?

ruoxining commented 5 months ago

Yes

ihsaan-ullah commented 5 months ago

Can you check if your paths are set correctly like this https://github.com/ihsaan-ullah/create_a_codabench_challenge/blob/cfff1a0f6851ccfb9aebbf6a971919aa3e87c075/result_submission_bundle/scoring_program/score.py#L83

With res and ref

AND your yaml file has no mention of ingestion like here: https://github.com/ihsaan-ullah/create_a_codabench_challenge/blob/cfff1a0f6851ccfb9aebbf6a971919aa3e87c075/result_submission_bundle/competition.yaml#L47
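For reference, the "res and ref" convention mentioned above could look roughly like the sketch below. It assumes the scoring command is python3 $program/score.py $input $output, as in the original post, and that the platform places the submission under a res subfolder and the reference data under a ref subfolder of the input directory. The function name resolve_dirs is only illustrative, and the exact layout should be checked against the linked score.py.

```python
import os
import sys


def resolve_dirs(input_dir, output_dir):
    """Build the directories a scoring script would read from and write to.

    Assumption: the submission is under <input>/res and the reference
    data under <input>/ref. This layout is not confirmed by this thread.
    """
    return {
        "res": os.path.join(input_dir, "res"),  # participant submission
        "ref": os.path.join(input_dir, "ref"),  # reference (gold) data
        "out": output_dir,                      # where scores are written
    }


def main(argv):
    # Expect: score.py <input_dir> <output_dir>
    if len(argv) < 3:
        print("usage: score.py <input_dir> <output_dir>")
        return
    for name, path in resolve_dirs(argv[1], argv[2]).items():
        status = "exists" if os.path.isdir(path) else "MISSING"
        print(name, path, status)


if __name__ == "__main__":
    main(sys.argv)
```

The point of the sketch is that paths are derived from the arguments the script receives, not from the folder names used inside the bundle zip.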

ruoxining commented 5 months ago

Hi all! We fixed our competition by printing all the paths inside the Docker container, and found that the uploaded files were not at the same paths as in the original bundle. Previously, we had mistakenly assumed that the original bundle paths were the correct ones.

For future hosts who run into a similar path problem: while debugging in the Docker container, you can print all the paths under /app/ to find where your data actually is.
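A minimal sketch of that debugging step, which could be pasted temporarily at the top of a scoring script. The helper name list_paths is hypothetical, and /app is used as the default root only because it is the directory mentioned above.

```python
import os


def list_paths(root="/app"):
    """Return every directory and file path found under root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        found.append(dirpath)
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    # Print the whole tree so it shows up in the submission logs.
    for path in list_paths():
        print(path)
```

If the root does not exist, os.walk yields nothing and the list is empty, so this is safe to leave in place during a local run as well.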

Thank you all for your patience!