automan-lang / AutoMan

Human-Computation Runtime
https://docs.automanlang.org
GNU General Public License v2.0

Scaling issue with MTurk ("System Unavailable") #30

Open dbarowy opened 8 years ago

dbarowy commented 8 years ago

This issue occurs when you call a Question function a large number of times (e.g., by mapping a function across a large list), which results in an MTurk HIT with many assignments. I first encountered the problem in the lead-up to the OOPSLA deadline and introduced an exponential backoff mechanism, believing that the problem was related to the frequency of calls to MTurk. Unfortunately, this did not help.

Later debugging revealed that the problem occurs within the MTurk SDK itself. My suspicion is that large HITs have many assignments, so the assignments are "paged", and that no rate limiting prevents the SDK from hammering MTurk when there are many pages. I am going to track down this issue and address it for real now.
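To make the suspected failure mode concrete, here is a minimal sketch of fetching a paged result set with a minimum interval between page requests. It is written in Python for brevity and is not the MTurk SDK's or AutoMan's code; `fetch_page`, the page tokens, and the `min_interval_s` default are all assumptions made for illustration.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: takes a page token (None for the first page)
# and returns (assignments_on_this_page, next_token_or_None).
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def fetch_all_assignments(
    fetch_page: FetchPage,
    min_interval_s: float = 0.5,  # assumed value, not an MTurk-documented limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Collect every page, waiting at least min_interval_s between requests."""
    results: List[str] = []
    token: Optional[str] = None
    last_call: Optional[float] = None
    while True:
        if last_call is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        page, token = fetch_page(token)
        results.extend(page)
        if token is None:  # no further pages
            return results
```

Without the wait, the loop issues one request per page back to back, which is the "hammering" behavior suspected above. The `sleep` and `clock` parameters exist only so the throttle can be exercised without real delays.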

dbarowy commented 8 years ago

Here's a screenshot of one of the early profiler runs where I discovered that the problem was happening inside the SDK. A subsequent change (1b9a023045ee5aecf3e133abaf90f1e149264f7c) in AutoMan tells the SDK to throw a ServiceException instead of retrying itself, which allowed me to implement an exponential backoff mechanism. As mentioned earlier, this fix did not actually address the issue.
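For readers unfamiliar with the pattern, the following is a rough sketch of exponential backoff around a call that raises on failure. It is in Python and is not the code from the commit above; `ServiceUnavailable` is a hypothetical stand-in for the SDK's ServiceException, and the attempt count and delay values are assumed.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the SDK's ServiceException."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 6,       # assumed value
    base_delay_s: float = 1.0,   # assumed value
    max_delay_s: float = 60.0,   # assumed value
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Retry request(), doubling the delay cap after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except ServiceUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Scale into [delay/2, delay] so concurrent clients do not retry in lockstep.
            sleep(delay * (0.5 + 0.5 * jitter()))
    raise AssertionError("unreachable")
```

Backoff of this kind only spaces out the caller's own retries. If the SDK issues many requests internally for a single call, as suspected here, wrapping that call in a retry loop does not reduce the request rate inside it.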

[image: profiler screenshot]