Open davidskalinder opened 4 years ago
I don't think this is very likely to be an issue in Access -- short text defaults to no more than 255 chars (so I'm not using it much) and long text can go up to something like 1 gigabyte.
> I don't think this is very likely to be an issue in Access -- short text defaults to no more than 255 chars (so I'm not using it much) and long text can go up to something like 1 gigabyte.
Gotcha, thanks. I think there might be a medium length these days? But it sounds like it shouldn't cause any trouble no matter how it shakes out.
I haven't committed the change upstream yet, but here are the SQL `ALTER TABLE` commands:

```sql
ALTER TABLE coder_event_creator MODIFY COLUMN text TEXT;
ALTER TABLE coder_event_creator MODIFY COLUMN value TEXT;
```
I should check the data we already have to see if any of the fields have been maxed out yet.
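For reference, that check is just a `MAX(LENGTH(...))` query per column. A minimal sketch, using an in-memory SQLite stand-in for the real MySQL database (the query is the same either way); the sample rows are invented:

```python
import sqlite3

# Stand-in check using an in-memory SQLite DB; against the production MySQL
# database this would be the same MAX(LENGTH(...)) query via the usual client.
# Table and column names are from the thread; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coder_event_creator (value TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO coder_event_creator (value, text) VALUES (?, ?)",
    [("a" * 1500, "b" * 300), ("short", "also short")],
)

# How close is the longest entry in each column to the 2000-char varchar cap?
longest = conn.execute(
    "SELECT MAX(LENGTH(value)), MAX(LENGTH(text)) FROM coder_event_creator"
).fetchone()
print(longest)  # (1500, 300) -- nothing has hit the 2000 limit yet
```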
Lucky us: none of them have been maxed out yet (though one entry is at like 1500). So I'm just going to run these `ALTER TABLE`s and then merge in the model changes once they're committed upstream.
Ah, and I just realized I should do this for our article-level coding table too: new `ALTER TABLE`s and a pair of edits to `models.py`.
So @alexhanna, I guess that should end up as a new PR from us that will depend on umm https://github.com/MPEDS/mpeds-coder/pull/29?
Whoops, also, @alexhanna: do we want to lose the `NOT NULL` and `DEFAULT NULL` constraints on those columns?
Here are the column specs from the old `SHOW CREATE TABLE`:

```sql
`value` varchar(2000) NOT NULL,
`text` varchar(2000) DEFAULT NULL,
```
... And from the new one after running them thar `ALTER TABLE`s:

```sql
`value` text,
`text` text,
```
I don't know how much value those constraints were adding, but given that I don't know, my inclination is to make the minimal possible change... so it seems safer to leave them in?
For my own reference: I've only run the two `ALTER TABLE`s on the event-level table of our dev deployment so far...
Another note for myself: it doesn't look like there's any chance of data loss when changing the columns to make them more permissive, but juuuust in case, I probably oughta run a set of exports immediately prior to running the `ALTER TABLE`s on the production DB...
Ohhhh yeah. That's a good point.
So it should be:

```sql
ALTER TABLE coder_event_creator MODIFY value TEXT NOT NULL
ALTER TABLE coder_event_creator MODIFY text TEXT DEFAULT NULL
```
Also, for some reason `coder_id` is nullable in my `coder_event_creator`? That seems like an issue.
> Ohhhh yeah. That's a good point.
>
> So it should be:
>
> ```sql
> ALTER TABLE coder_event_creator MODIFY value TEXT NOT NULL
> ALTER TABLE coder_event_creator MODIFY text TEXT DEFAULT NULL
> ```
Sounds good. Do you want to change this in the models spec, or shall I do it and PR it?
> Also, for some reason `coder_id` is nullable in my `coder_event_creator`? That seems like an issue.
Could be? Though the application presumably doesn't try to put any nulls in there, so I doubt it'll come up... But yes, safer to have the constraint, of course. I haven't done a careful check for this kind of thing elsewhere in the DB, so for all I know there are other places where the column specs could be tightened, but it seems like that's another (and lower-priority) issue than fixing the `value` and `text` fields in a way that doesn't introduce any new problems?
If you want to do it and PR it I'd be much obliged.
Yeah I doubt it'll come up but it should be changed in the model. Lower priority for sure.
> If you want to do it and PR it I'd be much obliged.

> Yeah I doubt it'll come up but it should be changed in the model. Lower priority for sure.
Gotcha, sounds good on both points.
So it turns out that `TEXT` columns are default NULL by, er, default: https://stackoverflow.com/questions/5242990/default-value-for-a-text-column
So these should be sufficient:

```sql
ALTER TABLE coder_event_creator MODIFY value TEXT NOT NULL
ALTER TABLE coder_event_creator MODIFY text TEXT
```
So, something else strange is going on with this. I've run the two statements above for our development deployment, and now inserting text works fine for up to 5788 characters, then fails.
The way it fails is to leave the field unchanged and to show a flash error in the UI. That's slightly different from the way a field with a 2000-character limit fails, which is to throw a flash error and change the field to an empty string.
Further research is required...
Hrm. That's weird. It should be 21,844 characters according to this.
And actually the tests I've been doing have been going into the `value` field, which isn't even unicode, so we should be getting the full 65,535.
My suspicion is that this isn't the DB but like http or something super fun like that. I'll see what I can find.
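As a sanity check on those two numbers: `TEXT` caps at 65,535 bytes, and MySQL's three-byte `utf8` charset can spend up to three bytes per character, which is where the roughly-one-third figure comes from. A quick pure-Python illustration (the sample strings are invented):

```python
# MySQL's TEXT type stores up to 65,535 *bytes*.  A latin1 column therefore
# holds 65,535 characters, but MySQL's 3-byte "utf8" charset may need up to
# three bytes per character, so the worst-case character capacity is about a
# third of that (the 21,844 figure quoted for utf8 VARCHAR also accounts for
# a couple of bytes of row overhead).
TEXT_MAX_BYTES = 65_535

ascii_text = "a" * 1000
cjk_text = "\u4e8b" * 1000   # a character that takes 3 bytes in UTF-8

print(len(ascii_text.encode("utf-8")))   # 1000 bytes: 1 byte per char
print(len(cjk_text.encode("utf-8")))     # 3000 bytes: 3 bytes per char
print(TEXT_MAX_BYTES // 3)               # 21845 worst-case characters
```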
I keep forgetting that Firefox has a debugger in it. So here's the response from the XHR:
```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>414 Request-URI Too Long</title>
</head><body>
<h1>Request-URI Too Long</h1>
<p>The requested URL's length exceeds the capacity
limit for this server.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at redacted.manually.by.ds Port 80</address>
</body></html>
```
I am not sure how proud I should be of my hunch...
Anyway, this makes a lot of sense since this is a `GET` request. It seems like the obviously sensible thing to do would be to change it to a `POST`, but I don't know enough about the intertubes to be able to think of whether this might screw other things up. @alexhanna, what does your spidey sense tell you?
It does seem to me like this is worth us fixing, btw: 5788 is sort of a lot of characters (like 1200 words I guess?), but the URL limit is a byte limit, so presumably the utf-8'd text fields will max out at something like a third of that...
I don't think changing from GET to POST will result in any errors. And it's a pretty easy fix -- you just change the "type" in the AJAX call, e.g. here
Yep, it did look pretty easy. And I guess `POST` is more proper anyway (when we're, like, posting things).
So yeah, I'll change the type to `POST` for text fields and the text capture. There are a bunch of other `GET`s scattered across the application for which I assume `POST`s would also be more appropriate, but I guess I'll stick to the principle of minimal change and only change these two for now?
Ah, okay. I did not know POST was actually more proper here. Could you file an issue for us to make more systemic changes and make `GET` and `POST` match the principles in the HTML spec? Low priority, of course, because it's not like it's breaking too many things right now.
Hmm:
```
HTTP/1.1 405 METHOD NOT ALLOWED
Date: Mon, 29 Jun 2020 20:40:57 GMT
Server: Apache/2.4.18 (Ubuntu)
Allow: HEAD, OPTIONS, GET
Set-Cookie: session=[blah blah blah]; HttpOnly; Path=/
Content-Length: 178
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
```
Soo I guess that `Allow` field doesn't like the `POST` request? I assume that's some kind of Apache config thing, so I'll look around starting with that stuff...
Okay, this looks like it is getting pretty deep into Apache-land, which makes me nervous about touching it.
Here are some links with advice in the original Greek:

- https://httpstatuses.com/405
- https://www.ionos.com/digitalguide/hosting/technical-matters/error-405-method-not-allowed-explanation-and-solutions/
- https://stackoverflow.com/questions/45598901/post-method-returning-405-method-not-allowed-on-apache-production-server-symfo
- https://serverfault.com/questions/438183/how-to-enable-all-http-methods-in-an-apache-http-server
All this makes me think I should wait for @alexhanna's opinion before going much further on this. Sooo yeah I'ma do that.
@alexhanna FTW (in conversation): pass a `methods=['GET', 'POST']` argument to the `@app.route()` decorator (like the login decorator has always had). So, no need to fiddle around with the Apache settings, praise FSM.
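For the record, here's a minimal sketch of that fix, assuming a standard Flask app; the route path, view names, and field name are illustrative, not MPEDS's actual ones:

```python
from flask import Flask, request

app = Flask(__name__)

# Flask routes answer only GET (plus HEAD/OPTIONS) by default, which is
# exactly the "Allow: HEAD, OPTIONS, GET" list in the 405 response above.
@app.route("/old_add_code")
def old_add_code():
    return {"value": request.args.get("value", "")}

# Passing a methods list to the decorator allows POST without touching Apache.
@app.route("/add_code", methods=["GET", "POST"])
def add_code():
    # With POST, the form fields arrive in request.form, not request.args.
    args = request.form if request.method == "POST" else request.args
    return {"value": args.get("value", "")}

client = app.test_client()
print(client.post("/old_add_code").status_code)                      # 405
print(client.post("/add_code", data={"value": "hello"}).get_json())  # {'value': 'hello'}
```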
Mind you, then some other thing goes wrong, which I'll need to debug in the Old Familiar Way.
Status: after allowing the POST, I'm getting a 500 error, and for the life of me I can't get any information about it out of the controller function to anywhere I can find it. Raising exceptions in the controller function, asserting `1 == 0`, doing `app.logger.info('blah blah blah')` -- none of it puts anything in the Apache log, or into anything in the HTML response (unless it's something that crashes the whole application on any page load, like a syntax error -- that goes into the Apache log).

That logging page suggests that by default logged info is sent to stderr, which I assumed was the Apache log, but darned if I can find it. If this goes on much longer I'll probably have to open a new issue and branch for a better-thought-through logging setup.
Also, I could try testing logging in the data wrangling script, where I have more control over the environment...
See good ol' #29 for debugging/logging issues.
I can work around this for now by, for my sins, wrapping the entire function in a try block and returning the exception in the AJAX response, then looking at the XHR debug for the text.
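In case it's useful later, the workaround amounts to something like this decorator (pure Python; the view function and error payload shape are invented for illustration):

```python
import functools
import json

def report_exceptions(view):
    """Debugging shim: if the wrapped view raises, return the exception text
    in the JSON payload so it shows up in the browser's XHR debugger instead
    of disappearing into a log I can't find."""
    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        try:
            return view(*args, **kwargs)
        except Exception as exc:  # deliberately broad: this is a debug aid only
            return json.dumps({"result": repr(exc)})
    return wrapper

@report_exceptions
def add_code():
    # Stand-in for the real controller body blowing up mid-insert.
    raise ValueError("Column 'article_id' cannot be null")

print(add_code())
```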
Here's the outcome of this one:
```
result: (_mysql_exceptions.OperationalError) (1048, "Column 'article_id' cannot be null") [SQL: u'INSERT INTO coder_event_creator (article_id, event_id, variable, value, text, coder_id, timestamp) VALUES (%s, %s, %s, %s, %s, %s, %s)'] [parameters: (None, None, None, None, None, 2L, datetime.datetime(2020, 7, 8, 14, 9, 56, 4321))]
```
... which is all because I forgot to change all the `request.args.get()`s to `request.data.get()`s, as we figured out on the call last week.
> change all the `request.args.get()`s to `request.data.get()`s

`{"result":"'str' object has no attribute 'get'"}`
Okay, so not `request.data.get()` exactly... but that's the problem anyway.
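For posterity: `request.data` is just the raw request body (a bytes/str blob), which is why it has no `.get()`; the dict-like lookup that works for form-encoded POSTs is `request.form.get()`. A sketch, assuming standard Flask; the route and field names are invented:

```python
from flask import Flask, request

app = Flask(__name__)

# Form-encoded POST fields live in request.form, which has the same .get()
# interface as request.args -- unlike request.data, the raw body, whence the
# "'str' object has no attribute 'get'" error above.
@app.route("/add_code", methods=["POST"])
def add_code():
    article_id = request.form.get("article_id")  # not request.data.get(...)
    return {"article_id": article_id}

client = app.test_client()
print(client.post("/add_code", data={"article_id": "42"}).get_json())
```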
Okay, 72f13cb71d0 fixes that problem. And I can now confirm that the description field collects 65,535 characters on the nose before failing.
I think technically we should still put something in the system to prevent entries over that limit, or at least warn users that the excess won't go into the database... But it's pretty unlikely that entries will ever get that long, so that's a pretty low priority. I'll open a new issue for it.
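Whenever that issue gets picked up, the server-side half of the guard is tiny; a sketch (function name and placement are hypothetical), remembering that the `TEXT` limit is in bytes rather than characters:

```python
# MySQL TEXT columns hold at most 65,535 bytes; depending on sql_mode, longer
# values are either rejected or silently truncated.  A cheap pre-insert check:
TEXT_MAX_BYTES = 65_535

def fits_in_text_column(value, max_bytes=TEXT_MAX_BYTES):
    """True if `value` fits in a MySQL TEXT column.  Encode first: the cap
    is in bytes, and multibyte characters count for more than one."""
    return len(value.encode("utf-8")) <= max_bytes

print(fits_in_text_column("a" * 65_535))  # True: exactly at the cap
print(fits_in_text_column("a" * 65_536))  # False: one byte over
```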
Still need to fix all this in the text select fields, and then do all the DB migration stuff lol.
> Still need to fix all this in the text select fields, and then do all the DB migration stuff lol.
Correction: the migration should be all done, but I still need to test it by implementing it in testing, then zorching and rebuilding the dev DB and checking that the table structure looks the same.
But first I need to change the `GET`s to `POST`s for the text select.
And after all that, need to do everything again for article-level stuff in a separate branch.
> But first need to change the GETs to POSTs for the text select.
651b368.
Now something's going wrong with radios, presumably because they also use `add_code`. Not sure why exactly, but they should be changed to `POST`s anyway, so I'll fix it that way.
Okay, I'm starting to lose track of which `GET`s and `POST`s go where... so given that they should all be `POST`s, I'm going to create a new issue and branch for this and change them all now...
> Okay, I'm starting to lose track of which `GET`s and `POST`s go where... so given that they should all be `POST`s, I'm going to create a new issue and branch for this and change them all now...
Whoops, that would be the already-created #101. So I'm going to actually see if I can resolve that now before moving ahead with this one.
> Whoops, that would be the already-created #101. So I'm going to actually see if I can resolve that now before moving ahead with this one.
And voila, #101 is done modulo user testing, a mere two months later. Along the way:
> But first need to change the GETs to POSTs for the text select.
> Now something's going wrong with radios, presumably because they also use `add_code`. Not sure why exactly, but they should be changed to `POST`s anyway, so I'll fix it that way.
The reason: https://github.com/davidskalinder/mpeds-coder/issues/101#issuecomment-685882800
So, this will hopefully bring an end to the `GET`/`POST` saga. That brings us back to:
> Still need to fix all this in the text select fields, and then do all the DB migration stuff lol.

> Correction: the migration should be all done, but I still need to test it by implementing it in testing, then zorching and rebuilding the dev DB and checking that the table structure looks the same.
It looks like these are the table changes that need to be checked post-rebuild:

```sql
ALTER TABLE coder_event_creator MODIFY value TEXT NOT NULL
ALTER TABLE coder_event_creator MODIFY text TEXT
```
> But first need to change the `GET`s to `POST`s for the text select.
Done! But I think they should stay in separate branches since they're separate from the DB changes. And arrrgh, some commits with this are already in the (pushed) `event_columns_to_text` branch. Not sure how best to handle this.
> And after all that, need to do everything again for article-level stuff in a separate branch.
I think this is all done. "Everything" should mean the DB changes (done at 834b5185d) and the http changes (done at 4a117bd1634).
So what remains is to manage the branches for PRs (if necessary) and to check the DB rebuild.
> But first need to change the `GET`s to `POST`s for the text select.
>
> Done! But I think they should stay in separate branches since they're separate from the DB changes. And arrrgh, some commits with this are already in the (pushed) `event_columns_to_text` branch. Not sure how best to handle this.
So @alexhanna, maybe this is a question for you: if I PR my branches `event_columns_to_text` and `change_gets_to_posts`, do we care if there are a few commits in `event_columns_to_text` that overlap with stuff in `change_gets_to_posts` (and which should, strictly speaking, only be in `change_gets_to_posts`)?
If we don't care (since we'll both use both branches anyway) then I'll just go ahead and PR them once I test that the DB rebuilds correctly. If we do want to keep them separate then I'll have to slightly elevate my git-fu in order to elegantly clean up the branches before PRing them...
Confirmed that after rebuilding the DB from d028592dfc (and, almost certainly, from 636e3b0957, which is the same but without the `GET`/`POST` changes), `SHOW CREATE TABLE coder_event_creator\G` yields exactly the same output (except for the `AUTO_INCREMENT` value) as before the rebuild but after running the `ALTER TABLE`s.
These are the statements to jury-migrate the article-level table:

```sql
ALTER TABLE coder_article_annotation MODIFY value TEXT NOT NULL
ALTER TABLE coder_article_annotation MODIFY text TEXT
```
And I have confirmed that after building the DB from scratch at 834b5185d1, `SHOW CREATE TABLE coder_article_annotation\G` outputs the exact same thing as before the rebuild but after the `ALTER TABLE`s.
I can't remember whether I've tested a zillion-character input in an article-level field, so I'll do that next.
> I can't remember whether I've tested a zillion-character input in an article-level field, so I'll do that next.
Confirmed, with the expected number of chars before the `GET` fixes and then the full 65K-whatever after them.
I think this whole world-encompassing issue is now done, modulo lots of non-specific user testing (probably to be handled by #107) and the PRs. So I'll leave it open but move it to a column where I'll pay less attention to it.
Oh yeah, and merging. Merged into `testing` like so:

- `article_columns_to_text`: 2bc9b51b3987
- `event_columns_to_text`: be2b3d832f983fd9
- `change_article_gets_to_posts`: 7e71728574250
- `change_gets_to_posts`: 793bfdb24ce6
No apparent problems with this after a week of testing with training articles. Looks like the manual DB changes still need to be run in the production DB. I should probably sit down with a mug of hot chocolate and read the whole thread before deploying just in case I'm forgetting something.
All right, looks like the `ALTER TABLE`s that need to be run in production are the ones in https://github.com/davidskalinder/mpeds-coder/issues/96#issuecomment-686123166 and https://github.com/davidskalinder/mpeds-coder/issues/96#issuecomment-686164109.
Other than that I think the only thing remaining for this issue is the not inconsiderable task of sorting out the PRs (see especially https://github.com/davidskalinder/mpeds-coder/issues/96#issuecomment-686145422 for more on this).
Ran the `ALTER TABLE`s in production.
@alexhanna's project was getting data loss since the 2000-character length constraints are in the MySQL table but aren't handled in the UI, so anything a coder wrote or selected that was longer than 2000 characters just got silently sawed off. We'll almost certainly have the same problem, though I should check the data we already have to see if any of the fields have been maxed out yet.
@alexhanna, it looks like the changes to `models.py` aren't in the upstream master yet? Also, do you have the `ALTER TABLE` statements that you ran so that we can run the same ones? (I didn't see an issue for this in the upstream repo, but feel free to point me there if I missed it.)

@johnklemke, I think this might affect you as well, since once we fix it, the max characters on any value or text fields will change from 2000 to infinity... Let me know if you need more info.