Closed andybak closed 6 years ago
Of course, having a proper queue system would probably be the best remedy. However, setting up a queue with workers is not always trivial and would probably be overkill in this situation. I've never used process pools myself, but it looks like an appropriate fix. What makes you think it will not solve the issue?
What makes you think it will not solve the issue?
Technically speaking it only defer the issue and increase capacity. But with some sensible warnings and errors handling ("Please disable linkcheck if you're going to update 152 million rows") that might be sufficient.
@fruitschen - would you like to take an initial look at this and see what you think?
yes, I'll look at it.
Added a tasks_queue and using a single thread to process it fixes our problem of "Exception Value: can't start new thread". https://github.com/DjangoAdminHackers/django-linkcheck/commit/0d398c6380cdf26c684e49a66a652420e6a70a7f
For extremely large update, we could set a limit for the size of tasks_queue and stops adding anything to it after a certain length.
It's already been reported that bulk actions (usually via management commands or in the shell) can cause problems as too many threads get spawned.
I've just hit a problem where this happens in a web form also and would like some feedback on the best solution.
Basically - I've got a client who has (ab)used a page template that allows an arbitrary number of inlines and added a lot more than I was expecting. Each inline has a wysiwyg field and therefore triggers linkcheck to do it's thing. I'm getting "Exception Value: can't start new thread"
Now - my first thought is to use something like multiprocessing.pool.ThreadPool to get a balance between execution time and number of threads - and to expose this as a setting.
It does feel somewhat like a band-aid. Spawning a thread for every single instance save() is going to end up causing problems at some point.
Really we should be setting a dirty flag or creating a task queue and working through all the updates in a background task.
@claudep and others - I'd appreciate your thoughts on whether it's worth doing this properly or whether the thread pool band-aid is good enough for now. (and if you've got any better ideas)