AISG-Technology-Team / AISG-Online-Safety-Challenge-Submission-Guide

Submission Guide + Discussion Board for AI Singapore Online Safety Prize Challenge
https://ospc.aisingapore.org

Detective Orange - Weibin - Submission failed #50

Open ACEIceC opened 8 months ago

ACEIceC commented 8 months ago

Hi, I tried to upload my submission three times, but all three attempts failed, and I can't tell why. My previous submissions were successful. All of these submissions were uploaded from the same device in my lab, so the network should be stable. Has anyone else run into the same problem? Thanks for your help.

Zhuifeng414 commented 8 months ago

The same happened to me. It's annoying. Uploading the submissions took even longer than the coding itself, and the uploads always failed.

ACEIceC commented 8 months ago

Yes, and I can't figure out the reason or find a way to solve it. I'm also still seeing inconsistent GPU memory consumption between the online environment and my offline runs. I can't find the reason for either issue...


From: Tesla @.> Sent: 20 March 2024, 12:48 To: AISG-Technology-Team/AISG-Online-Safety-Challenge-Submission-Guide @.> Cc: Weibin Cai @.>; Author @.> Subject: Re: [AISG-Technology-Team/AISG-Online-Safety-Challenge-Submission-Guide] Detective Orange - Weibin - Submission failed (Issue #50)


shshen-closer commented 8 months ago

I failed five times

huyenbui117 commented 7 months ago

I failed 4 times

fzjcdt commented 7 months ago

I've also encountered a similar issue where my submissions failed twice, despite previous successful attempts. In my case, the submission failure message appeared immediately after clicking the submit button. Could there be a system issue at play here?

petergro-hub commented 7 months ago

The frequent failures add a luck-based component, I suppose. To be really sure, I usually don't touch anything on my computer after clicking upload, although sometimes that doesn't help either. (Btw, if you open the submission page in another tab while the upload is still running, it will show as failed even though it hasn't, so don't panic; just let it finish uploading.) If you're unsure whether the model can run in the environment, you can also rent cheap V100s online for $1 per day if you know where to look. Personally, I just hope I can upload my best models before the competition ends.

aliencaocao commented 7 months ago

By the way, from our benchmarking, their V100 is only running at 70% of its intended performance (FLOPS). That explains why my team's submissions keep timing out and run at half the speed they do on our local V100 PCIe 16GB. We spent more time debugging runtime issues than training models.
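For anyone wanting to run a similar comparison, here is a minimal sketch of the arithmetic behind a "percentage of intended FLOPS" figure. It is not the benchmark referred to above: the function names are hypothetical, the 14 TFLOPS FP32 peak for a V100 PCIe is an assumed datasheet value, and the timing in the example is made up for illustration.

```python
import time

# Assumption: nominal FP32 peak of a V100 PCIe in TFLOPS (datasheet-style figure).
ASSUMED_PEAK_TFLOPS = 14.0


def matmul_flops(n: int) -> int:
    """Floating-point operations in one dense (n x n) @ (n x n) matmul: 2 * n^3."""
    return 2 * n ** 3


def achieved_tflops(n: int, seconds: float) -> float:
    """Throughput in TFLOPS for one n x n matmul that took `seconds`."""
    return matmul_flops(n) / seconds / 1e12


def fraction_of_peak(achieved: float, peak: float = ASSUMED_PEAK_TFLOPS) -> float:
    """Ratio of measured throughput to the nominal peak."""
    return achieved / peak


def time_call(fn, repeats: int = 10) -> float:
    """Average wall-clock seconds per call of `fn`, after one warm-up call.

    Hypothetical helper: for GPU work, `fn` itself must block until the
    kernel has finished (e.g. by synchronizing), otherwise only the launch
    overhead is measured.
    """
    fn()  # warm-up, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    n = 8192
    seconds = 0.112  # made-up timing, for illustration only
    tflops = achieved_tflops(n, seconds)
    print(f"{tflops:.2f} TFLOPS, {fraction_of_peak(tflops):.0%} of assumed peak")
```

Running the same measurement on the local card and in the evaluation environment, with identical matrix size and data type, gives two fractions that can be compared directly. For a purely compute-bound workload, a throughput fraction f corresponds to roughly 1/f times the runtime (about 1.4x at f = 0.7).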

petergro-hub commented 7 months ago

True, I do believe that adjusting the solution to their specs is almost the main part of the challenge (which they also state in the description). Adjusting solutions takes up almost 90% of my time. But that makes it all the more frustrating if you have a solution that could be a winning one with some tinkering, but you can't even evaluate it because the submission fails, and you now have to wait a week for a resubmission that isn't guaranteed to work. In either case, I suppose we're all in the same boat with this.

aliencaocao commented 7 months ago

> adjusting the solution to their specs

Not if their spec is wrong to begin with. I could provide an H100 underclocked to 30% of its clock speed. Would that still be a fair game for everyone? It just feels really unprofessional of them to host this in the hope of advancing SG's AI industry.

Oh, and let's not mention that the V100 is outdated and no one uses it, so very few optimizations work on it.