Open davidskalinder opened 3 years ago
Hrmph, looks like there are a lot of reasons this could be happening. From what I can glean from the mod-wsgi dev's interventions in many fora, the first step is to remember how to change Apache's logging level...
Well, apparently not only does the button not produce the file, it causes other page loads (such as the coding interface) to 500. So I guess I'll be debugging this on the backup VM for now huh.
Incidentally, restarting Apache seems to clear this up
Same on backup VM btw. Interestingly, those page loads seem to get slower and slower as the export runs before finally they result in a 500 once the export has also gotten a 500.
Huh, and they also seem to revive after a few moments, even without an Apache restart... (I'll still debug on the backup VM, mind you.)
Oh gosh don't try to export the data using that button on the web interface.... I mean you can if you really want to but that's why I wrote a cronjob to do it.
-- Alex Hanna, PhD alex-hanna.com Senior Research Scientist | Google Lecturer | UC Berkeley School of Information
Hmm, so here's what's in the log during an export, with LogLevel info:
[Fri May 14 15:02:04.199584 2021] [wsgi:info] [pid 1328497] mod_wsgi (pid=1328497): Attach interpreter ''.
[Fri May 14 15:02:45.357219 2021] [wsgi:error] [pid 1328227] [client 144.92.191.197:59431] Truncated or oversized response headers received from daemon process 'mpeds': /var/www/dispatcher_[deployment].wsgi, referer: http://[domain]/[deployment]/admin
[Fri May 14 15:02:45.357216 2021] [wsgi:error] [pid 1328229] [client 144.92.191.197:59428] Truncated or oversized response headers received from daemon process 'mpeds': /var/www/dispatcher_campus_protest.wsgi, referer: http://[domain]/[deployment]/coderstats
[Fri May 14 15:02:46.121927 2021] [wsgi:info] [pid 1298573] mod_wsgi (pid=1328266): Process 'mpeds' has died, deregister and restart it.
[Fri May 14 15:02:46.123282 2021] [wsgi:info] [pid 1298573] mod_wsgi (pid=1328266): Process 'mpeds' terminated by signal 9
[Fri May 14 15:02:46.123625 2021] [wsgi:info] [pid 1298573] mod_wsgi (pid=1328266): Process 'mpeds' has been deregistered and will no longer be monitored.
[Fri May 14 15:02:46.147461 2021] [wsgi:info] [pid 1328502] mod_wsgi (pid=1328502): Starting process 'mpeds' with uid=33, gid=33 and threads=15.
[Fri May 14 15:02:46.283245 2021] [wsgi:info] [pid 1328502] mod_wsgi (pid=1328502): Initializing Python.
[Fri May 14 15:02:46.582634 2021] [wsgi:info] [pid 1328502] mod_wsgi (pid=1328502): Attach interpreter ''.
[Fri May 14 15:02:46.582754 2021] [wsgi:info] [pid 1328502] mod_wsgi (pid=1328502): Adding '[path/to/env]' to path.
So it looks like Python is actually getting killed and getting restarted, which accounts for the system-wide 500s...
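For what it's worth, signal 9 is SIGKILL, which the process can't catch or clean up after, so whatever was in flight in that daemon just dies. The log doesn't say who sent it (the kernel's out-of-memory killer is one common suspect, but that's a guess here). A rough sketch for pulling these deaths out of an Apache log, assuming the line format shown above; find_killed_daemons is a made-up helper name, not anything from mod_wsgi:

```python
import re

# Matches the mod_wsgi "terminated by signal N" lines as they appear in the
# log excerpt above. The helper name is hypothetical.
KILL_RE = re.compile(
    r"mod_wsgi \(pid=(?P<pid>\d+)\): Process '(?P<name>[^']+)' "
    r"terminated by signal (?P<signal>\d+)"
)


def find_killed_daemons(lines):
    """Return (process name, pid, signal) for each daemon-death line."""
    hits = []
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            hits.append(
                (m.group("name"), int(m.group("pid")), int(m.group("signal")))
            )
    return hits


# Two lines copied from the log excerpt above.
SAMPLE = [
    "[Fri May 14 15:02:46.121927 2021] [wsgi:info] [pid 1298573] "
    "mod_wsgi (pid=1328266): Process 'mpeds' has died, deregister and restart it.",
    "[Fri May 14 15:02:46.123282 2021] [wsgi:info] [pid 1298573] "
    "mod_wsgi (pid=1328266): Process 'mpeds' terminated by signal 9",
]

if __name__ == "__main__":
    for name, pid, sig in find_killed_daemons(SAMPLE):
        print(name, pid, sig)
```

Running this over the real error log with open(...) in place of SAMPLE would show how often the daemon gets killed and whether it lines up with export attempts.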
Oh gosh don't try to export the data using that button on the web interface.... I mean you can if you really want to but that's why I wrote a cronjob to do it.
Wait, really? So the button has been doing this for a long time? It's worked for us until recently; maybe it's just a function of the file size?
I haven't used this button for the campus protest project for a very long time, just because each export is something like 50 MB now. Maybe your data ended up getting too large, yeah.
Ah interesting, gotcha. You probably told me that before and I forgot, heh. Anyway good to know that this isn't the result of something I broke.
So if that's the case then it seems like this issue is probably a lower priority than I had thought: you've been running these exports from cron anyway, and I've been doing a similar export of the article-level annotation table straight from MySQL when needed (and so could do the same for the event-level table, which the button had provided).
So I think that means that:
If I'm right about item 2, then I think this issue is done, since probably nothing can be done about this problem under the current setup? (And making a new setup should get a separate issue.)
Yeah, I think that's right. Feel free to close out. If someone wants to run these from the UI completely, they can fork it and write that functionality.
Yeah bring on the PRs, world! (Anybody? Anybody?)
Another thing that occurs to me is that perhaps the button should be removed from the UI altogether? Since after the DB reaches a certain size it is in effect a "crash website" button?
Either way I'll open another issue for it and assess it there. Closing this one.
that'd be the smarter thing. Or at least tagged with a "disabled" modifier.
So at 55975cace4a I pasted nearly all of the code from the generate-coder-table.py into the controller's generateCoderAudit(), and ran it in @alexhanna's deployment on the backup VM. Something (the writes, I think) took a loong time and after almost exactly five minutes puts this in the Apache log:
Timeout when reading response headers from daemon process 'mpeds': /var/www/dispatcher_[deployment].wsgi, referer: http://[domain]/[deployment]/coderstats
After a further 9.5ish minutes, it dumps this:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 23-25: ordinal not in range(128)
Running python generate-coder-table.py does exactly the same thing, without the timeout, in about the same amount of time, and produces an identical file in the exports dir. (I didn't carefully merge this commit into an up-to-date version of what that deployment is running, so it's possible that that UnicodeEncodeError is fixed elsewhere.)
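On the UnicodeEncodeError: the 'ascii' codec message usually means some non-ASCII text is being encoded implicitly on its way to a file or stream, which is typical of Python 2 or of a process running under a C/POSIX locale. A minimal Python 3 sketch of naming the encoding explicitly, assuming the export is a CSV; write_rows_utf8, read_rows_utf8 and the sample rows are invented for illustration and are not what the script does:

```python
import csv
import io
import os
import tempfile


def write_rows_utf8(path, rows):
    """Write rows to a CSV file, always encoding the text as UTF-8."""
    # newline="" is what the csv module documents for files it writes to.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows_utf8(path):
    """Read the CSV back, decoding as UTF-8."""
    with io.open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


# Made-up sample data containing one non-ASCII character.
ROWS = [["coder", "text"], ["a", "Montr\u00e9al protest"]]

if __name__ == "__main__":
    try:
        ROWS[1][1].encode("ascii")  # reproduces the class of error in the log
    except UnicodeEncodeError as err:
        print("ascii fails:", err)

    path = os.path.join(tempfile.mkdtemp(), "export.csv")
    write_rows_utf8(path, ROWS)
    print(read_rows_utf8(path) == ROWS)
```

If the deployment is still on Python 2 the csv handling differs (it wants bytes), so this would need adapting rather than pasting in.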
Soo this makes me wonder whether in fact we should leave the button enabled and do this instead? Here are the things I can think of left unresolved by this:
All right, so, new plan for this: I'm going to refactor what's in generate-coder-table.py and put it into a subfolder so it can be run by either the UI controller or a shell user.
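One way to make the same code callable from both places is a plain module function plus a main guard. Sketch only: export_coder_table, its argument, the --out-dir flag and the returned path are placeholders, not what's in the repo.

```python
import argparse


def export_coder_table(out_dir):
    """Placeholder for the real export; returns the path it would write."""
    # The query-and-write logic from generate-coder-table.py would go here.
    return "{}/coder-table.csv".format(out_dir)


def main(argv=None):
    """Shell entry point: parse arguments, run the export, report the path."""
    parser = argparse.ArgumentParser(description="Export the coder table.")
    parser.add_argument("--out-dir", default="exports")
    args = parser.parse_args(argv)
    print(export_coder_table(args.out_dir))
    return 0


if __name__ == "__main__":
    main()
```

The UI controller would import the module and call export_coder_table(...) directly, while a shell user or cron job runs the file as a script; neither path needs a class.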
Also, here's a strategy that I think might work for the timeout errors:
- count_records() as a separate function
- write_chunk()
- generateCoderAudit() or a wrapper in the module call write_chunk() until it's done

It's not entirely clear to me how to correctly handle all those requests. But I'm hoping refactoring generate-coder-table.py into distinct methods will help clarify things.
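The strategy above might look something like the following. This is a sketch under assumptions: the signatures of count_records() and write_chunk(), the chunk size, the table and column names, and sqlite3 standing in for the MySQL database are all invented for illustration. It also doesn't answer the open question of how the UI should issue the repeated requests; the loop here just runs in one process.

```python
import csv
import io
import sqlite3

CHUNK_SIZE = 2  # tiny on purpose so the demo takes several chunks


def count_records(conn):
    """Return how many rows the export has to cover."""
    return conn.execute("SELECT COUNT(*) FROM coder_table").fetchone()[0]


def write_chunk(conn, out, offset, limit=CHUNK_SIZE):
    """Append up to `limit` rows starting at `offset`; return rows written."""
    rows = conn.execute(
        "SELECT id, coder, value FROM coder_table ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    csv.writer(out).writerows(rows)
    return len(rows)


def export_all(conn, out):
    """Call write_chunk() repeatedly until every counted record is written."""
    total = count_records(conn)
    written = 0
    while written < total:
        n = write_chunk(conn, out, written)
        if n == 0:  # table shrank mid-export; stop rather than loop forever
            break
        written += n
    return written


def make_demo_db():
    """Build a throwaway in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coder_table (id INTEGER PRIMARY KEY, coder TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO coder_table (coder, value) VALUES (?, ?)",
        [("a", "x"), ("b", "y"), ("c", "z"), ("a", "w"), ("b", "v")],
    )
    return conn


if __name__ == "__main__":
    buf = io.StringIO()
    print(export_all(make_demo_db(), buf))
    print(buf.getvalue())
```

The point of the split is that each write_chunk() call is short, so in principle each one could fit inside a single request without hitting the timeout, with the caller tracking the offset between calls.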
At adaa5862c10, set up a working test of the module with nothing in it. (Python's modules take some re-getting used to!)
NB no need for a class just yet I don't think: bare functions in the module seem fine until there's a need for things to have states?
Opened a new issue for the last few comments. The error that this issue is about will presumably get solved as part of that one, so I'll mark this as depending on that one, but I'll leave this one open in case it ends up requiring a different solution.
After correcting a permissions issue for the exports directory, I can now run the coder-table export from the UI for the dev deployment. However, when attempting to run it for either live deployment, I get a long pause and then a 500 error in the UI and this in the Apache log:
I strongly suspect this is not a permissions issue but more likely something caused by a flask update or something? Lots of flask-related results come up from searching the error message; and presumably this is something to do with data size, either of the whole file or of characters, since the full DBs fail and the nearly-empty one works.
Luckily it seems that nobody's using this UI feature right now (since @alexhanna is running the same thing nightly from a script outside MAI without problems); but this button will be pretty essential once we're creating more data so this should be a fairly high-priority fix.