davidskalinder / mpeds-coder

MPEDS Annotation Interface
MIT License

Workaround and data collection for multiple-text-select deletion #92

Closed davidskalinder closed 4 years ago

davidskalinder commented 4 years ago

Done in 6098fb7; merged into testing branch and live in the testing deployment. If testers think this is okay, needs merge to master and production deployment as well as a PR upstream.

davidskalinder commented 4 years ago

So, @olderwoman and @matthewsmorgan:

There are several options for how I think we could fix #89 in the long term, which I'll outline in that thread. But for the short term I think the best thing to do is to force the workaround that we've asked the coders to use (i.e., to not delete multiply-entered selected text), in case they forget or didn't get the message. (Pushing this workaround upstream will also give @alexhanna a quick-and-easy way to mitigate the problem.)

So I've added functionality for the server to refuse requests to delete multiple text-select records at once, and to raise an error message in the UI that says "Error deleting item. Leave duplicates in so we can collect data on this bug." That should stabilize the situation until we get a real fix in place, and ensure we get the data to measure how bad the problem was.
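
For the curious, here's a minimal, self-contained Python sketch of the guard logic only -- the field names and in-memory rows are illustrative, not the actual MAI server code:

# Toy sketch: refuse to delete a text-select row when the same
# (coder, event, variable, value, text) combination appears more than once.
rows = [
    {"id": 1, "coder_id": 14, "event_id": 6085, "variable": "actor-text",
     "value": "0-118-143", "text": "protesters"},
    {"id": 2, "coder_id": 14, "event_id": 6085, "variable": "actor-text",
     "value": "0-118-143", "text": "protesters"},  # accidental duplicate entry
]
KEY = ("coder_id", "event_id", "variable", "value", "text")

def delete_text_select(rows, row_id):
    target = next(r for r in rows if r["id"] == row_id)
    duplicates = [r for r in rows if all(r[k] == target[k] for k in KEY)]
    if len(duplicates) > 1:
        # Leave the duplicates in place so we can collect data on the bug.
        return ("Error deleting item. "
                "Leave duplicates in so we can collect data on this bug.")
    rows.remove(target)
    return "OK"

print(delete_text_select(rows, 1))  # refused, because a duplicate exists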

I think it does need a test from some people who are used to using the UI though -- would you mind taking a look at the testing deployment and see how it goes when you add text-selects multiple times and then delete them? Remember, if you want to see the real state of what's in the database, you can refresh the page (changing tabs in the coder panel won't do the trick).

Let me know if you need any more info about how to test. I think it'd be good to get this workaround live (and sent upstream) ASAP, so if you think the functionality is good enough, let me know and I'll move it into production.

davidskalinder commented 4 years ago

@olderwoman / @matthewsmorgan, just wanted to send you a quick reminder about this one: despite the useful and interesting discussion over at issue #89, this one is actually the higher priority, since it'll hopefully stop the bleeding entirely (even beyond the considerable slowing that will likely result from our email to the coders).

However I don't think I should make the change live until you two make sure it's okay, so please let me know if you need any more info in order to test it out.

olderwoman commented 4 years ago

Sorry I failed to spot that you wanted quick turn-around on error testing. You can't do it? I'll log into the testing deployment now and give it 10 minutes.

olderwoman commented 4 years ago

One test in the testing version. #1: I marked the same text twice, then deleted one of them. It asked me if I was sure, I said yes, then it refreshed the page with both in place. I did not see the error message until after I hit “save” and returned to the article-level page, not the event-level page. So the error message appears in the wrong place and I can imagine a coder getting very frustrated about why the deletion didn’t work and continuing to try it, since the relevant error message does not appear at that point; instead you get the “are you sure you want to delete this?” message.

It seems like this behavior would have been detected if there had been any test at all of the user experience? The “error deleting item” message would need to appear in place of the “are you sure you want to delete?” message.

olderwoman commented 4 years ago

More info: coming back into the event page after a save, attempt to delete the duplicate simply fails, but there is no error message. Clicking does nothing. Trying again: OK, I did screenshots this time; I now see that the error message does appear when I delete, but it is nowhere near where I was working when I did the deletion. See screenshots.

davidskalinder commented 4 years ago

Thanks @olderwoman.

So the error message appears in the wrong place and I can imagine a coder getting very frustrated about why the deletion didn’t work and continuing to try it

The “error deleting item” message would need to appear in place of the “are you sure you want to delete?” message.

I now see that the error message does appear when I delete, but it is nowhere near where I was working when I did the deletion.

Yes, I knew about that actually, but I wasn't sure how annoying it would be to someone who wasn't already expecting it (which as usual is why we need user testing!). It sounds like the answer is "pretty darn annoying", at least when you're looking for what happens when the deletion is made.

The problem is that the standard way that MAI shows error messages is at the top of the article text, so showing it elsewhere will require building a new way for the JavaScript code to handle this particular error.

Ideally, a message like the current one (which soon disappears on its own) would appear near the control that triggered the error, but this could be rather difficult to implement. A whole-screen alert might be a little easier, but at the cost of requiring yet another click from the user to dismiss the alert.

Either of these could get rather tricky though, so I guess it's worth me asking how annoying we think this would be if the user wasn't specifically looking to test the behavior when the deletion was made? Perhaps they'd get a little frustrated the first time, find the error message, and then every time after that remember "oh yeah, they've disabled deleting duplicates"?

davidskalinder commented 4 years ago

coming back into the event page after a save, attempt to delete the duplicate simply fails, but there is no error message. Clicking does nothing.

Hmm, so I think this happened to me also, but I couldn't reproduce it so I hoped it might have just been due to a funny state I somehow created during development. But maybe not.

By "clicking does nothing", it sounds like you mean that you didn't even get the "are you sure you want to delete this" confirmation? That happened to me for one entry, and I saw that somehow it didn't get the JS event listener attached to it, but like I say I couldn't reproduce the behavior and so couldn't debug it further. I'll have another try; meanwhile, if you can figure out how to make this behavior happen reliably, please let me know.

olderwoman commented 4 years ago

Would it be possible to edit the “are you sure” message before deletion to say (Note: due to bug, duplicates will not be deleted) ?

davidskalinder commented 4 years ago

The “error deleting item” message would need to appear in place of the “are you sure you want to delete?” message.

Incidentally, to clarify, this won't work because when the confirmation request happens the UI doesn't yet know that the item is a duplicate. We'd have to confirm the deletion request, check the DB to see that it's a duplicate, and then issue another error message after that.

davidskalinder commented 4 years ago

Would it be possible to edit the “are you sure” message before deletion to say (Note: due to bug, duplicates will not be deleted) ?

Hmm, yeah, that's a bit of a kludge, but not too bad for a temporary fix. And it would be easy to implement.

I'm convinced. I'll put that in and let you know when it's in the testing deployment.

davidskalinder commented 4 years ago

Uh so apparently I can't delete any text-select items, even the non-duplicates. I haven't changed anything, so I guess this was happening (in the test version) yesterday and neither of us tested that functionality?

Anyway, if this happens to you, I'm on the case already. If you try deleting something and it works, let me know, cuz that's unexpected at this point...

davidskalinder commented 4 years ago

I guess this was happening (in the test version) yesterday

Yep, I see what I did wrong. Stupid of me not to check regular deletion when I tested. But anyway, easy to fix now that I've caught it.

olderwoman commented 4 years ago

I can test deleting a non-duplicate, but I can’t see the underlying database.

davidskalinder commented 4 years ago

I can test deleting a non-duplicate, but I can’t see the underlying database.

Yeah, don't worry about it, I see what's going wrong (a typo in an if condition)

olderwoman commented 4 years ago

Ok, I just tried deleting a non-duplicate; didn’t work and the error message at the top changed.

davidskalinder commented 4 years ago

So here's something I just thought of: is there a chance we don't want to do this at all?

The case I just realized is problematic is if a coder adds something twice and then realizes that both entries were wrong -- for example, she added an actor's name twice to the bystander field. If she had only added it once, she'd just remove it from the bystander field and add it to the actor field; but if she added it twice, the new functionality would mean she couldn't remove it at all.

I think this is a showstopper for this issue? (The cure is worse than the disease!) It makes me nervous having nothing but coders' eternal vigilance to work around the original bug, but I guess it's better than not giving them a way to remove bad data?

olderwoman commented 4 years ago

Yes, I do think being able to delete bad data is helpful, although pretty easy to fix in pass 2 as you just read irrelevant info in a field and ignore it. The big thing is to tell coders to try to get it in the right field as well, if you start barring deletion.

davidskalinder commented 4 years ago

Yes, I do think being able to delete bad data is helpful, although pretty easy to fix in pass 2 as you just read irrelevant info in a field and ignore it.

Well, my worry would be that it might not be clear without reading a lot of the article? I suppose also a big clue would be that it's a duplicate -- we could view all the duplicate entries as suspicious and needing attention.

So I guess the question is how much extra work would it create in pass 2 to treat all duplicate text-select entries as though they might have been clickos? If the answer is "not much extra work", then it sounds like it is worthwhile to prevent deletion of duplicates after all (until we have some data on how frequently it happens and can put in the solution to #89 of course)?

davidskalinder commented 4 years ago

Okay, commit 55172cf should fix the unable-to-delete-anything error and b85ce8ec adds @olderwoman's note to the confirm box.

This should all now be live at the testing deployment, so @olderwoman / @matthewsmorgan, it'd be good if you can take another look at it and make sure there's nothing else incredibly annoying about it. :)

If it's all working, then we just need to decide whether the extra pass 2 work of treating all duplicate entries as suspicious is small enough to make this change worthwhile for the next couple weeks at least (and I'll mention @johnklemke here since I realize this is getting into pass 2 territory)...

olderwoman commented 4 years ago

I’ll check it now and report back

olderwoman commented 4 years ago

Ok tested, behavior as expected: warning in the duplicate button, pink warning at the top when I tried to delete a duplicate. When I tried to re-delete the same duplicate, it just didn’t do anything. However, when I tried to delete the original entry that was duplicated, I get the are you sure with bug warning but ALSO the option to “prevent this page from creating additional dialogs” which I’m not sure we want? Then when I try to delete the original item again, I get the expected are you sure with bug warning and when I say OK instead of going back to the page I get the second message: delete warning + bug warning + option to prevent additional dialogs. Other things seem to delete ok, and the warning goes away after a different deletion.

davidskalinder commented 4 years ago

Ok tested, behavior as expected: warning in the duplicate button, pink warning at the top when I tried to delete a duplicate.

Sounds good.

However, when I tried to delete the original entry that was duplicated, I get the are you sure with bug warning but ALSO the option to “prevent this page from creating additional dialogs” which I’m not sure we want?

Yeah, that seems to always happen when a page creates more than one dialog. I suspect this is something the browser puts in as a security measure to prevent zillions of dialogs, so I don't think there's anything we can do about it (other than to avoid using dialogs)...

Given that that's the case, it sounds like the behavior when you delete this first entry is identical to the behavior when you delete the second entry, which I think is what we expect?

Then when I try to delete the original item again, I get the expected are you sure with bug warning and when I say OK instead of going back to the page I get the second message: delete warning + bug warning + option to prevent additional dialogs.

Hang on, so you see two confirmation dialogs in a row, with no actions in between?

olderwoman commented 4 years ago

Then when I try to delete the original item again, I get the expected are you sure with bug warning and when I say OK instead of going back to the page I get the second message: delete warning + bug warning + option to prevent additional dialogs.

Hang on, so you see two confirmation dialogs in a row, with no actions in between?

Yes, the second dialog box opened after I clicked ok on the first dialog box

davidskalinder commented 4 years ago

Yes, the second dialog box opened after I clicked ok on the first dialog box

Hmm, that is strange, I can't get anything like that on my end. Are you around for a Skype call so you can show me on your screen?

olderwoman commented 4 years ago

I guess so. I’ll turn on the laptop. Of course it may not reproduce?

davidskalinder commented 4 years ago

I guess so. I’ll turn on the laptop. Of course it may not reproduce?

True. Though I can still probably at least get the detailed description more easily...

davidskalinder commented 4 years ago

Okay, that was a useful Skype call.

The double-dialog behavior happened (at least in this instance) at the same time as the dead-delete-button behavior mentioned in https://github.com/davidskalinder/mpeds-coder/issues/92#issuecomment-642092780. This happened on @olderwoman's desktop computer but not on her laptop. Neither browser showed the behavior after switching to a git branch without this thread's commits and refreshing the page (even after switching back to the branch with this thread's commits); but the bad behavior persisted on a page that hadn't been reloaded (though it did persist after saving and reloading the event). So, tl;dr: it's evidently pretty hard to predict when this will happen.

We do know what is happening, though: instead of each duplicate entry getting one click event apiece assigned to it, the first entry gets two and the second doesn't get any. So something is going wrong with the way listeners are attached to (at least) duplicate elements.

In spite of all this though, the behavior shouldn't cause any real problems with data collection, just some confusion for coders (since we want them to not delete duplicate entries anyway); so my plan is to spend an hour or so looking into this and then give up.

davidskalinder commented 4 years ago

All right, I'm about 95% certain why those click events are getting assigned wrong. Here are the exciting details:

The ID for the HTML element that the click listeners get attached to is a concatenation of the string "del", the variable name, the position of the text, and the timestamp of when that bit of JS runs. Usually the timestamp will be different, but often it's very similar: the last digit (probably like milliseconds or something?) often only differs by one between two different elements; and sometimes by chance it'll be the same for two elements. When the timestamp is the same for duplicate elements, the IDs will be the same, which makes the HTML invalid; so the JS deals with the broken HTML as well as it can, assigning all events to the first instance of the ID.
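
To make the mechanism concrete, here's a toy sketch in Python (the real code is JavaScript, and the exact ID format here is only approximate):

import time

def make_id(variable, position):
    # Roughly: "del" + variable name + text position + millisecond timestamp
    return "del_{}_{}_{}".format(variable, position, int(time.time() * 1000))

first = make_id("actor-text", "0-118-143")
second = make_id("actor-text", "0-118-143")  # the duplicate entry

# If both calls land in the same millisecond, the two IDs are identical,
# the HTML contains duplicate ids, and the browser ends up attaching both
# click listeners to the first element and none to the second.
print(first, second, first == second)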

All of which means that this behavior has probably been happening all along, but should be corrected by whatever we do to fix #89.

I'm going to look to see whether there's an easy short-term fix for this. If not, I vote we just live with it until #89 is fixed.

(EDIT: corrected the issue number)

davidskalinder commented 4 years ago

This is really pretty separate from the rest of this issue, so I'm going to open a new one for it.

davidskalinder commented 4 years ago

This one is mostly FAO @matthewsmorgan and @johnklemke...

All right, so the conclusion of #93 is that I don't think I should do it. So that leaves us at:

  1. All of this issue's functionality seems to be working as expected in the UI (modulo #93).
  2. We need to know if it will be easy in pass 2 to treat as suspicious the duplicated and possibly entirely wrong text-select entries this issue's functionality will sometimes create.

@matthewsmorgan and @johnklemke, I think you're best placed to answer that second point? So I'll wait to deploy this issue's functionality until I hear from y'all that it is indeed what we want to do (unless of course @olderwoman wants to preempt)...

Of course let me know if there's anything I can help clarify.

johnklemke commented 4 years ago

As I'm understanding this, I think it won't be too hard to handle this in the code I intend to write to substitute for an append query. David, we talked about this a bit at Monday's meeting, but let me recap (and open up for questions from others).

I see the process of bringing the latest data from MAI into Access in a form ready for pass 2 as a two step process. First, we'll use a saved import specification to grab data from a .csv file and drop it into an Access table, doing a minimum of conversion, mostly taking the fields of interest into short or long text fields. Since the .csv file will always be a complete redump of all the MAI data, each time it gets imported into an Access table it'll go into a fresh version of the table, the old version deleted. One thing I like about this .csv to Access table process is that if there are errors, Access creates a table that logs them referencing row numbers in the .csv.

But to make the data usable for the Access forms that are the guts of the Pass 2 process, the short and long text fields in the staging table need to get converted into the data types needed in the coder lines table that's part of the guts of the Pass 2 process. That's the second step. But I don't see a way to configure Access to run an append query that logs conversion errors like the import process does. It just reports that something like 59 fields could not be converted and the intended values were set to null. No freakin' bread crumbs, no handy way for a non-programmer to resolve the issues.

So my current plan -- which I may well start on tomorrow -- is to run the equivalent of an append query in procedural code written in VBA. That way at each step of the process, I'll know what record from the staging table is in hand, and what record in the growing coder lines table is the target. When a special case (like a duplicate value) is detected, or when a conversion error occurs, I'll be able to log it into an ancillary table for logging any errors and weirdnesses in the append process. Bottom line -- yeah, I think we can do what's needed in the set up for pass 2, and I already had reasons to build the core of the code anyhow.
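
(For illustration only, here's the row-by-row append-with-logging idea as a rough Python analogy -- not the planned VBA, and with made-up field names:)

staging_rows = [
    {"event_id": "6085", "n_participants": "12"},
    {"event_id": "6133", "n_participants": "unknown"},  # will fail conversion
]

coder_lines, append_log = [], []

for i, row in enumerate(staging_rows):
    try:
        # Convert the staging table's text fields to the target data types.
        coder_lines.append({"event_id": int(row["event_id"]),
                            "n_participants": int(row["n_participants"])})
    except ValueError as err:
        # Log exactly which staging row and value failed, instead of
        # silently setting the target field to null.
        append_log.append({"staging_row": i, "row": row, "error": str(err)})

print(len(coder_lines), "rows appended;", append_log)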

matthewsmorgan commented 4 years ago

Seeing this now. It sounds like this is mostly a question for John, who has to implement this in Pass 2. I'll work with whatever I'm given in the Access forms (frankly I didn't use the text select info a lot last time because a lot of the old data didn't have that anyway).

olderwoman commented 4 years ago

Duplicated text has an extraordinarily small likelihood of being a problem in pass 2, in fact I think that likelihood is indistinguishable from zero. Pass 2 coder experiences mild annoyance at seeing the same text twice in a field. They can probably live through the experience with little loss of time or efficiency unless they forget it is a known bug and worry about needing to report the problem.

davidskalinder commented 4 years ago

Okay, sounds like this won't be much of a problem then. To clarify, I don't think this will really need any new pass 2 functionality (since I think pass 2 reviewers should already be able to see when a variable has multiple text-select entries?). But this change will make it impossible for pass 1 coders to delete duplicate entries even if they are mistakes, so all duplicate entries will have to be treated as possible mistakes in pass 2.

All that said, it sounds like I'm more worried about this than anyone else lol. So I'll go ahead and make this workaround live. Then I think we should give it a couple weeks in order to see how often duplicates occur before implementing a fix for #89, which will override the workaround.

davidskalinder commented 4 years ago

All right, this is now live in the production deployment. I haven't re-tested it there because I didn't want to create bad data, but it should be identical to what's in the testing deployment (and I can confirm that it doesn't break the main coding interface).

I'm going to submit a Pull Request so that @alexhanna and the gang can use this as well, but otherwise I think this is done. Thanks testing team!

davidskalinder commented 4 years ago

I'm reopening this since I realized I forgot about the data collection part. Here's the query that I think gives us the data we want:

/* Number of multiple entries of text-capture variables */
SELECT coder_id
  , event_id
  , variable
  , value
  , COUNT(id) AS n
  FROM coder_event_creator
  WHERE RIGHT(variable, 5) = "-text"
  AND timestamp > "2020-06-11 12:00"
  GROUP BY coder_id
  , event_id
  , variable
  , value
  , text
  HAVING n > 1
  ;

... and the results from our production database since the change went live 18 days ago:

+----------+----------+----------------+-----------+---+
| coder_id | event_id | variable       | value     | n |
+----------+----------+----------------+-----------+---+
|       14 |     6085 | named-org-text | 7-207-255 | 2 |
|       14 |     6133 | actor-text     | 0-118-143 | 2 |
|       14 |     6133 | location-text  | 0-61-86   | 2 |
|       17 |     6559 | time-text      | 1-160-170 | 2 |
+----------+----------+----------------+-----------+---+
4 rows in set (0.09 sec)

For context:

mysql> SELECT COUNT(*) FROM event_creator_queue WHERE coded_dt > "2020-06-11 12:00";
+----------+
| COUNT(*) |
+----------+
|      376 |
+----------+
1 row in set (0.00 sec)

So yeah, this has been awfully rare since we made this change: just 4 times out of 376 articles coded! And actually, 3 of those were in the first few hours after I made the change, so it's possible that coder 14 was just testing the new functionality.

However, we did tell our coders to watch out for this, so maybe they were being more careful? Peeking into the campus protests DB, they've had it happen a little more frequently since roughly when @alexhanna made the fix live: 10 times in about 395 articles, which is about 2.5%.

@alexhanna, did you tell your coders about this problem, or did you just accept the PR silently? If you did it silently, then we can probably draw a confidence interval around that 2.5% number and expect that about that many articles from the past are missing a piece of text-select data. If you told them, then I guess we should adjust that expectation somehow...

davidskalinder commented 4 years ago

Just for my reference, in case I need it later and since it's a pain to write out all those fields, here's the query to pull each row with a duplicate entry:

/* Rows with text-capture multi-entry variables */
SELECT a.id
  , a.coder_id
  , a.event_id
  , a.variable
  , a.value
  , a.timestamp
  FROM coder_event_creator AS a
  INNER JOIN
    (SELECT coder_id
       , event_id
       , variable
       , value
       , text
       , COUNT(id) AS n
       FROM coder_event_creator
       WHERE RIGHT(variable, 5) = "-text"
       GROUP BY coder_id
       , event_id
       , variable
       , value
       , text
       HAVING n > 1
     ) AS b
  ON a.coder_id = b.coder_id
  AND a.event_id = b.event_id
  AND a.variable = b.variable
  AND a.value = b.value
  AND a.text = b.text
  ;

alexhanna commented 4 years ago

I did not tell them about this issue.

davidskalinder commented 4 years ago

I did not tell them about this issue.

Hooray for methodological rigor!

So using your clean sample, there are 10 occurrences out of 395 articles. Since these are counts, I think this is a Poisson distribution? And apparently calculating CIs for Poisson isn't all that well defined unless n is big enough to pretend it's normal, which I think 395 should be... so that lets us use lambda +- 1.96 * sqrt(lambda / n), where lambda is the observed mean. I get from that a CI from about .0096 to .0410; so unless this sample is part of the unlucky 5%, lambda is between about .01 and .04. Methods and Stats prelim prep FTW!
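
(Quick Python check of that arithmetic:)

from math import sqrt

def poisson_rate_ci(occurrences, n, z=1.96):
    # Normal approximation to the Poisson rate: lambda +/- z * sqrt(lambda / n)
    lam = occurrences / n
    half = z * sqrt(lam / n)
    return lam - half, lam + half

print(poisson_rate_ci(10, 395))  # roughly (0.0096, 0.0410)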

So, tl;dr: this has probably been happening in about 1-4% of articles.

davidskalinder commented 4 years ago

Looking back at old campus protest data, it looks like these double-entry errors were almost always corrected by coders before the fix went in: there's only about one occurrence a month. That is bad, of course, since what the coders thought was a correction was actually a deletion.

So we should probably assume that that 1-4% number is a good estimate of the number of articles coded prior to early June 2020 that are erroneously missing a text-select entry. (Well, more strictly speaking, there are probably about .01-.04 text-selects missing per article.)

I'm not sure what the actions to be taken from here are, so I'll leave the issue open in case others (@olderwoman / @limchaeyoon / @matthewsmorgan ?) want to weigh in. But now I think we can worry about fixing #89 properly with a pretty good idea of how it's gone wrong in the past.

davidskalinder commented 4 years ago

A clarification from @alexhanna on this one: she did in fact inform the campus protest coders about this issue, so they created about .025 duplicate text-selects per article even knowing that it was a bad thing to do. (Not sure why that's so much higher than our rate under similar conditions, but hey.)

I can't remember whether I checked to see if the campus protesters' duplicates looked like intentional testing, so I'll do that. If they do look natural, then maybe that 1-4% should be viewed as a lower bound of what would happen without foreknowledge?

davidskalinder commented 4 years ago

I can't remember whether I checked to see if the campus protesters' duplicates looked like intentional testing, so I'll do that.

So these occurrences do indeed look like they are natural: the ten occurrences happened across four users, with hours between each occurrence.

Given that the campus protest coders were working under the same conditions as ours, I guess it makes the most sense to pool the samples for a rate of 14 occurrences over 771 articles; using the formula above, that gets us to a CI of about .009 to .028. So perhaps that's slightly more optimistic, although we still don't know the rate for coders who aren't watching out for this problem.
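
(Same formula, checked in Python for the pooled sample:)

from math import sqrt

lam = 14 / 771
half = 1.96 * sqrt(lam / 771)
print(round(lam - half, 4), round(lam + half, 4))  # 0.0086 0.0277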

davidskalinder commented 4 years ago

Outcome of the team discussion on this: "meh". Other coder errors almost certainly swamp this bug, and pass 2 coders have to reconcile them all anyway. So the plan for this is to fix it in #89 and move on with our lives.

Closing now.