Md-Ashraful-Pramanik / MapCoder

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving
MIT License

Not able to replicate results. #2

Closed harshraj172 closed 4 months ago

harshraj172 commented 4 months ago

I am getting a Pass@1 score of 0 for the GPT-4 model with MapCoder on the CC dataset.

Md-Ashraful-Pramanik commented 4 months ago

Thank you for your query. Did you configure the ExecEval Docker container for evaluation?


harshraj172 commented 4 months ago

Yes, I did. I followed the steps described here: https://github.com/ntunlp/ExecEval?tab=readme-ov-file#steps-assuming-dependencies-satisfied

harshraj172 commented 4 months ago

The attached image shows the logs from the terminal running the exec-eval image.

harshraj172 commented 4 months ago

Actually, I am getting a Pass@1 of 0 for all configurations.

Md-Ashraful-Pramanik commented 4 months ago

The Pass@1 is 0, so I suspect there is some misconfiguration. Can you give me one configuration? I mean the dataset, language, and so on.

harshraj172 commented 4 months ago

I am running the following command:

python src/main.py --dataset CC --strategy Direct

The remaining arguments are left at their defaults.

harshraj172 commented 4 months ago

For reference, here is the terminal output from running the above command: image.png https://github.com/Md-Ashraful-Pramanik/MapCoder/assets/64323122/a1b874f9-2b23-4fe6-8572-ddf52653cd40

Md-Ashraful-Pramanik commented 4 months ago

Can you check the results that are generated after the run? Or can you share the output file with me?

Md-Ashraful-Pramanik commented 4 months ago

Yes please.

On Sat, Jun 29, 2024, 8:28 PM Harsh Raj wrote:

Sure, I will email you because GitHub doesn't support sharing .jsonl files.

harshraj172 commented 4 months ago

Here is the output file after running the command below: https://drive.google.com/file/d/1q-eEAv1AViNp4QgjiM9KQbC5kYdZeix8/view?usp=sharing

python src/main.py --dataset CC --strategy Direct

Md-Ashraful-Pramanik commented 4 months ago

I have evaluated these results on my machine and found 11/165 correct, so I think the problem is in your ExecEval Docker settings. Can you try calling its API using curl (take the 7th problem, since its solution is correct) and see whether it passes?
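For readers who prefer Python to curl, the probe suggested above could be sketched roughly as follows. This is only an illustration: the base URL `http://localhost:5000`, the route `/api/execute_code`, the runtime name `"Python 3"`, and the payload fields `language`, `source_code`, and `unittests` are all assumptions about a typical ExecEval setup, and the toy problem is hypothetical. Check them against `src\evaluations\api_comm.py` and your own deployment before relying on them.

```python
import json
import urllib.request

# Assumed defaults, not confirmed by this thread; adjust to your ExecEval deployment.
BASE_URL = "http://localhost:5000"
ROUTE = "/api/execute_code"


def build_probe_request(source_code, language, unittests,
                        base_url=BASE_URL, route=ROUTE):
    """Build (but do not send) a POST request with one solution and its tests."""
    payload = {
        "language": language,        # assumed field name
        "source_code": source_code,  # assumed field name
        "unittests": unittests,      # assumed field name
    }
    return urllib.request.Request(
        base_url.rstrip("/") + route,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_probe(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical toy problem: read two integers and print their sum.
request = build_probe_request(
    source_code="a, b = map(int, input().split())\nprint(a + b)",
    language="Python 3",  # assumed runtime name
    unittests=[{"input": "1 2", "output": ["3"]}],
)
```

Calling `send_probe(request)` against a running evaluation server should show whether a solution that is known to be correct is reported as passing. If even this fails, the problem is more likely in the evaluation environment than in the generated code.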

harshraj172 commented 4 months ago

Oh, let me try that. But could it be because I am using a SageMaker instance?

Md-Ashraful-Pramanik commented 4 months ago

I am running the experiments on my own machine, meaning Docker and Python both run on localhost. But since ExecEval exposes an API, I don't think SageMaker will cause problems. Please check line 50 of src\evaluations\api_comm.py. If the API route or port is not correct, it will give an error.
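Before inspecting the route itself, it can help to confirm that anything is listening on the configured port at all. The following is a minimal sketch using only the standard library. The host and port in the usage note are assumptions and should match whatever is configured in `src\evaluations\api_comm.py`.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `is_port_open("localhost", 5000)` (port 5000 is an assumed default) returning `False` would suggest that the Docker container is not running or that its port is not published to the host.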


harshraj172 commented 4 months ago

It's working now. My SageMaker instance's memory was getting full. Sorry, I didn't see it. I am stupid :D. Thanks a lot for your help and time /\.
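Since the root cause here was an exhausted instance, a quick resource check before launching an evaluation run may save time. The sketch below is an assumption-laden illustration: the thread does not say whether "memory" meant disk or RAM, so it reports both, and the 1 GB threshold and the function name `resource_report` are hypothetical. The RAM figure relies on `os.sysconf` names that are available on Linux and is reported as `None` elsewhere.

```python
import os
import shutil


def resource_report(path=".", min_free_gb=1.0):
    """Report free disk space for `path` and, where available, free RAM."""
    gib = 1024 ** 3
    usage = shutil.disk_usage(path)
    disk_free_gb = usage.free / gib

    try:
        # Linux-specific sysconf names; unavailable on some platforms.
        ram_free_gb = (
            os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / gib
        )
    except (ValueError, OSError, AttributeError):
        ram_free_gb = None

    return {
        "disk_total_gb": usage.total / gib,
        "disk_free_gb": disk_free_gb,
        "disk_ok": disk_free_gb >= min_free_gb,
        "ram_free_gb": ram_free_gb,
    }
```

Printing `resource_report("/")` before running `python src/main.py` gives a rough indication of whether the instance has room for the evaluation containers to work.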

Md-Ashraful-Pramanik commented 4 months ago

No problem. Nice to meet you.
