Ability to run 10,000 tasks

douban / pymesos

A pure python implementation of Mesos scheduler and executor

BSD 3-Clause "New" or "Revised" License


Hi,

When I use pymesos to run 10, 100, or 1,000 tasks at the same time, everything works perfectly. With 10,000 tasks at the same time, however, some tasks end up with the status TASK_LOST.

I'm not sure whether the problem is in pymesos or in my own settings.

- Mesos version: 1.9.0
- pymesos: latest from git clone (2020/6/9)
- Total resources: CPU 412, MEM 5.2TB, Disk 983.9
- Per task: 0.01 CPU, 1M mem

For the tasks that end up as TASK_LOST, the Mesos master shows: Sending status update TASK_LOST for task task-xx of framework xxx 'Task launched with invalid offers: Offer xxx is no longer valid'

I guess the cause is that two or more tasks use the same offer ID. When one of these tasks finishes, the offer is released, and the other tasks using the same offer ID can no longer use it.
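To make the guess above concrete, here is a minimal sketch (not pymesos's own code) of one way to make sure each offer ID appears in exactly one launch call. It is only an illustration under assumptions: offers are simplified to plain dicts with hypothetical `id`, `cpus`, and `mem` keys (real pymesos offers carry a `resources` list instead), and `plan_launches` is a made-up helper name.

```python
from collections import deque


def plan_launches(offers, pending, cpus_per_task=0.01, mem_per_task=1.0):
    """Group pending task names by offer ID.

    Every offer ID appears at most once in the returned plan, so each
    offer can be consumed by a single launch call.
    `offers` is a list of simplified dicts (hypothetical shape):
        {"id": "offer-1", "cpus": 4.0, "mem": 1024.0}
    """
    plan = {}
    queue = deque(pending)
    for offer in offers:
        if not queue:
            break
        # Number of tasks that fit; the small epsilon guards against
        # float rounding such as 0.03 / 0.01 == 2.9999999999999996.
        by_cpu = int(offer["cpus"] / cpus_per_task + 1e-9)
        by_mem = int(offer["mem"] / mem_per_task + 1e-9)
        count = min(by_cpu, by_mem, len(queue))
        if count > 0:
            plan[offer["id"]] = [queue.popleft() for _ in range(count)]
    # Tasks that did not fit wait for the next round of offers.
    return plan, list(queue)


if __name__ == "__main__":
    offers = [
        {"id": "o1", "cpus": 0.03, "mem": 10.0},
        {"id": "o2", "cpus": 0.01, "mem": 0.5},  # too little memory
    ]
    plan, leftover = plan_launches(offers, ["t1", "t2", "t3", "t4", "t5"])
    print(plan)
    print(leftover)
```

In a scheduler's `resourceOffers` callback, each group would then go into a single `driver.launchTasks(offer.id, tasks, filters)` call, so that no offer ID is reused after it has already been consumed.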

Ability to run 10,000 tasks #128