azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Big analysis imports are timing out, retrying a lot, and filling up the disk #841

Closed KlaasH closed 3 years ago

KlaasH commented 3 years ago

A few weeks ago, and then again yesterday, a few imports failed, and a while later they all started failing. It turns out we have Django Q set to a 10-minute timeout, which isn't long enough for some imports, so they get killed before they finish. When they get killed, they don't get a chance to clean up their working files. On top of that, the default number of retries for a failed task (which is what we're using) is 60. So once a task fails, it retries every 10 minutes, each time leaving another copy of its working files stranded in a /tmp/ directory, until the disk fills up and even small imports stop working.

So we need to raise the timeout and decrease the number of retries.
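
A minimal sketch of what that change could look like in the Django settings, assuming the installed Django Q version supports the `timeout`, `retry`, and `max_attempts` keys of `Q_CLUSTER` (worth checking against the docs for our version). The cluster name and all the numbers are placeholders, not the project's actual settings.

```python
# settings.py (sketch only; every value here is an illustrative assumption)

# Longest time a single import should be allowed to run, in seconds.
IMPORT_TIMEOUT_SECONDS = 2 * 60 * 60  # hypothetical: 2 hours

Q_CLUSTER = {
    "name": "pfb-imports",  # hypothetical cluster name
    # Seconds a worker may spend on one task before it is terminated.
    "timeout": IMPORT_TIMEOUT_SECONDS,
    # Seconds the broker waits before presenting an unfinished task again.
    # Kept above the timeout so a task that is still running is not re-queued.
    "retry": IMPORT_TIMEOUT_SECONDS + 60,
    # Cap on how many times a failed task is attempted.
    "max_attempts": 2,
}
```

Keeping `retry` larger than `timeout` is the part that matters most here: otherwise a long import could be handed out again while the first attempt is still working.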

Questions to answer: