codalab / codalab-worksheets

A collaborative platform for reproducible research (web interface and CLI).

Many subsequent runs with 1GB disk writes causes problems #1914

Open teetone opened 4 years ago

teetone commented 4 years ago

Starting up 1000 bundles with a 1GB disk write each causes workers to go offline or leaves bundles stuck in a Running state (the stuck bundles can't be killed either). Only a few of the bundles actually complete. We need to investigate this issue further.

To reproduce this issue, use stress_test.py to create about 1000 bundles with large disk writes (run python stress_test.py --help for more information on how to run the script).
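For anyone who wants to see the per-bundle workload in isolation, here is a minimal sketch of a large disk write. It is not taken from stress_test.py: the function name `write_bytes`, the chunk size, and the file name `payload.bin` are all assumptions made for illustration. It writes a fixed number of bytes in chunks and syncs them to disk, which is roughly what a "1GB disk write" bundle would have to do.

```python
import os
import tempfile

CHUNK_BYTES = 1024 * 1024  # assumed chunk size: 1 MiB keeps memory use flat


def write_bytes(path, total_bytes, chunk_bytes=CHUNK_BYTES):
    """Write `total_bytes` of zeros to `path` and return the resulting file size."""
    chunk = b"\0" * chunk_bytes
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, chunk_bytes)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk instead of leaving it in the page cache
    return os.path.getsize(path)


if __name__ == "__main__":
    # Small size for a quick local check; the issue uses about 1 GB per bundle.
    with tempfile.TemporaryDirectory() as tmp:
        size = write_bytes(os.path.join(tmp, "payload.bin"), 8 * 1024 * 1024)
        print(size)
```

Running it locally with increasing sizes is a cheap way to confirm the write itself behaves before launching it across many bundles.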

percyliang commented 4 years ago

What's the smallest load that causes problems?

teetone commented 4 years ago

@percyliang I will try out different sizes and update this issue.
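One possible way to organize the search for the smallest problematic load is a bisection over the load size (bundle count or write size). The sketch below is only an illustration: `fails` is a hypothetical callable that would launch a load of size `n` and report whether any worker went offline or any bundle got stuck, and the search assumes failures are monotonic in load, which may not hold if the problem is intermittent.

```python
def smallest_failing_load(lo, hi, fails):
    """Return the smallest n in [lo, hi] for which fails(n) is True.

    Returns None if even the largest load, hi, does not fail.
    Assumes that if a load of n fails, every larger load fails too.
    """
    if not fails(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # mid fails, so the answer is mid or smaller
        else:
            lo = mid + 1  # mid is fine, so the answer is larger than mid
    return lo


if __name__ == "__main__":
    # Stand-in predicate for demonstration only; a real check would run the bundles.
    def pretend_fails(n):
        return n >= 137

    print(smallest_failing_load(1, 1000, pretend_fails))
```

With a range of 1 to 1000 this needs about ten trial runs rather than a linear sweep, at the cost of relying on the monotonicity assumption.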