cul-2016 / quiz

Live Quiz not Activating for Students #720

Open stianr opened 5 years ago

stianr commented 5 years ago

I have just sat in a - slightly excruciating - lecture where over 100 students tried to take part in the quiz, but only 30-40 gave answers to the questions. The majority of students in the lecture managed to log in fine and access the appropriate module, but then on the module overview screen, the 'Join Live Quiz' button did not appear. Others were able to join the live quiz and get to the holding page, but the quiz didn't start for them. Loads today weren't beyond limits on dynos or memory, and response times were better than usual.

I asked the class and they all said that they had managed to participate fine last term, so it looks like something has changed in recent weeks. So we probably need to look at a couple of options.

  1. Something that has been deployed since the start of December is making the app fail. Is there any way the solution to #712 could have led to this problem?
  2. Something has changed in browser functionality which means the method for checking whether a quiz is live no longer works.

Other things to note:

The lecturer said that all the students who came to her last week were using iPhones. I didn't check devices, so it could be that it's device specific. That said, lots of laptop users also had the problem (I know at least one of them was a Mac, but I don't know about the others).

I can't replicate the problem with my Android phone, which may be down to the OS/browser.

stianr commented 5 years ago

Wait, I've replicated the problem reliably. Open a browser window with Chrome 72.0.3626.121 on Android. Go to the module overview page. In another browser log in as a lecturer and start running a quiz for that module. The 'Join Live Quiz' button does not appear.

I was about to say that the problem did not occur for Chrome Desktop, but I realised I was not running the latest version. Once it updated, exactly the same problem occurred again.

(I didn't encounter any problems with Firefox, which is why I was struggling to replicate the problem.)

This suggests that the latest release of Chrome has broken the polling for live quiz. So at least we know what the problem is. Have looked to see whether there are any obvious changes in the latest versions of Chrome that might cause the problem but most of the changelog is beyond me...

But the worrying bottom line is that we can't use Quodl again until this issue is resolved.

stianr commented 5 years ago

Is it possible to investigate the cause? If there's a quick fix, brilliant. If not, we probably need to discuss what options we have.

Danwhy commented 5 years ago

@stianr I'm hoping this can be resolved by just updating all of the front end dependencies, so I'll start with that, then investigate further as needed. Have you had the same problem on staging too?

stianr commented 5 years ago

Thanks @Danwhy. Looking at it today, I can't replicate the problem on either staging or the main app, so I'm not sure exactly what the locus of the problem is. Maybe it's not browser based after all.

Danwhy commented 5 years ago

:+1: I tried and noticed the same thing. I'm going to update the socket.io dependency anyway, as there's a more recent major release that should have some performance improvements.

stianr commented 5 years ago

Good plan. Fingers crossed this works. It's frustrating when it's so hard to identify the cause of the problem.

stianr commented 5 years ago

Unfortunately this isn't working reliably. The problem seems really intermittent, which is probably why it's so hard to identify. I've been through and logged in and run a quiz as a lecturer in one browser, and then tried to join as a student in another. I've probably done this 50 times now. Around half the time it doesn't work.

Things I've noticed:

  1. Perhaps a third or a half of the time, when I log in as a lecturer and then click on one of the modules, I get an "Oops! Nothing to see here". Going from the module list page, clicking on one of the modules 20 times in a row, around ten times I got onto the module page with the quizzes and the other ten I got the error. No pattern to it as far as I could see. This was for exactly the same module.
  2. As a student, fairly frequently when I click on a module, the URL changes (from https://app.quodl.co.uk/#/dashboard to https://app.quodl.co.uk/#/TSST/student) but the page does not. Refreshing the page, or navigating away and coming back, brings the appropriate page up.
  3. It is possible that the failure rate is lower than I'm estimating because sometimes the wait time between a lecturer starting a quiz running and the 'Join Live Quiz' button appearing is very long - I counted 40 seconds in one instance. So I may have shut the browser down before the button activated. But I have left it for several minutes on other occasions, with no button appearing.
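
One way to separate "slow" from "never" in manual runs like these is to wait against a fixed deadline and record how long the button took to show up. A minimal sketch in Python, purely illustrative: `wait_for` and the fake check below are hypothetical names, not anything taken from the Quodl codebase, and the real check would have to look at the student's page.

```python
import time


def wait_for(condition, timeout_s, interval_s=0.01):
    """Poll condition() until it returns True or timeout_s elapses.

    Returns the elapsed seconds on success, or None if the deadline passed.
    """
    start = time.monotonic()
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)


def make_fake_button_check(appears_on_call):
    """Hypothetical stand-in for 'is the Join Live Quiz button visible?'.

    Reports True from the given call number onwards.
    """
    calls = {"n": 0}

    def check():
        calls["n"] += 1
        return calls["n"] >= appears_on_call

    return check


# A button that appears on the third check is reported with its delay;
# one that never appears within the deadline is reported as None.
delay = wait_for(make_fake_button_check(3), timeout_s=1.0)
never = wait_for(lambda: False, timeout_s=0.05)
```

With a generous deadline (say a couple of minutes), a result of `None` would count as a genuine failure, while a 40-second wait would be logged as a slow success rather than being mistaken for one.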

Will investigate further this afternoon

iteles commented 5 years ago

@stianr Thanks for the feedback. Can you also please have a think about when it was last working reliably?

stianr commented 5 years ago

Hi @iteles, I've spoken to the lecturer and looked at the number of respondents for each of the recent quizzes.

There have always been occasional "Nothing to see here" issues, and last term a few students had issues (#712) where they couldn't log in, but the major issues seem to have arisen over the past month. We don't have any dates on the data I can download, but looking at the answer data, last term the number of students taking each quiz was usually 150-190. This term we have three quizzes with 100-140 students responding, which looks about right, and two with 39 students responding, which correspond to those where issues were reported.

The lecturer says she had a fair few of the 'Nothing to see here' issues when accessing a module while she's been using it over this term (from the end of January). But matching dates with number of responses, the last quiz with 100 respondents in was run on Monday 11 Feb, which seemed to go fine. The first quiz which had major problems was run on Monday 11 March. (Between those times was a class test and reading week so Quodl didn't get any substantial use.)

I don't think anything new was deployed in that time, which is why I wondered if it was something around browser updates. But I can recreate the problem intermittently with both Firefox and Chrome, so that makes it less likely (given Firefox doesn't use Chromium as far as I'm aware). The only other potential things I could think of were (a) something cumulative - e.g. database or similar has reached a capacity which now leads to issues, (b) something in the peripheral setup that has changed, leading to this issue (but the only changes I can think of are around the SSL issue and I reduced down to one dyno over the Christmas break and put it up again to two when it got overloaded earlier in the term), (c) something behavioural - the current lecturer is doing something different now which is making a problem that has always been there beneath the surface much more significant for users. Hope you have some better insights than I've managed...

Danwhy commented 5 years ago

@stianr I've been looking into this further today and yesterday, and think I may have discovered something which could be the root cause of these issues.

The forum app we built in October seems to be offline at the moment (I'm not sure what caused this yet, but it won't restart). I've searched through the logs trying to figure out when it happened, and it seems to have been around January 16th.

Now, what I think has been happening is that a student/lecturer has created a new account, a lecturer has created a new module, or a student has joined a new module. In all of these cases, the quiz app sends a request to the forum to create a new user (student/lecturer), create a new topic (module), or add user privileges for a topic. Because the forum is down, the request fails, and takes a long time to do so. Combine this with a large number of students trying to log in/access a quiz, and it's likely that this would slow down the server considerably, meaning most of the students' requests would time out.
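
To illustrate the failure mode, here is a minimal sketch of bounding an outbound call with a deadline so that a dead dependency cannot hold up the caller. It is written in Python for illustration only and is not the app's actual code; `call_with_deadline` and `slow_forum_create_user` are hypothetical names, and the half-second sleep is just a stand-in for a forum that does not respond.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the caller never waits on it longer than the deadline.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, deadline_s, *args):
    """Run fn(*args) in a worker thread and wait at most deadline_s.

    Returns (True, result) on success, or (False, None) if the call
    times out or raises. On timeout the worker keeps running in the
    background; only the caller stops waiting.
    """
    future = _pool.submit(fn, *args)
    try:
        return True, future.result(timeout=deadline_s)
    except FutureTimeout:
        return False, None
    except Exception:
        return False, None


def slow_forum_create_user(name):
    """Hypothetical stand-in for a request to a forum that is down."""
    time.sleep(0.5)
    return f"created {name}"


# The caller gets an answer after about 0.05s instead of waiting 0.5s.
ok, result = call_with_deadline(slow_forum_create_user, 0.05, "student1")
```

The design choice here is that the forum update is treated as optional: if it cannot complete quickly, the main request carries on without it, which is the same effect as removing the connection but keeps the integration in place for when the forum is healthy.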

It would be useful to know if the issues you discovered when testing this do not occur on the staging site, as this is connected to the staging forum, which is still online.

If this does turn out to be the problem, I'm not sure what you'd like to do about it. The easiest and safest option would be to remove the connection between the app and the forum for now. Alternatively, I could try to fix the forum, but as I'm unsure what caused the problem in the first place (it could be that the database we're using for that, which is a free version, has filled up), there's a chance it could happen again.

stianr commented 5 years ago

Hi @Danwhy, thanks for looking into this. On the staging site, I still get the 'Oops, nothing to see here' error repeatedly when selecting an existing module (I just got it five times in a row). But I'm not seeing the quiz-not-activating issue, though so far I've only spent a couple of minutes on it; I'll look further.

The forum was a useful proof of concept, but as it is not actively being used now, and will need further work in order to be usable, it's completely fine to remove the connection.

stianr commented 5 years ago

The Heroku metrics don't show any timeouts when this issue occurs, but I'm not sure whether they'd capture the kinds of timeout that might be driving the problem.

stianr commented 5 years ago

I also can't work out why if it's a load-based timeout, students have issues joining a live quiz, but once it's running we don't see students getting stuck on a question or crashing out - it all seems to work fine once they're in the quiz. (According to reports from lecturers and students anyway...) Though not sure if that's a red herring here...

(I guess it could be that when loads of students try to log on, it times out for some of them, which means they then can't take part at all, but those for whom it didn't time out can carry on with no problems as the load is then lower. So yeah, may be a red herring...)

Danwhy commented 5 years ago

@stianr The app with the forum removed has been deployed to staging. If you could have a quick check that nothing is broken, then we can deploy to production for a real test.

stianr commented 5 years ago

Great - thanks @Danwhy. I've had a go on the staging version and it seems fine, so let's deploy and see whether it fixes the problem.

Danwhy commented 5 years ago

@stianr Now deployed to production

stianr commented 5 years ago

Hi @Danwhy, alas it looks like the problem's still there. I've recorded my screen as the problem arises in case that helps:

http://reimers.co.uk/Quodl.mp4

Lecturer panel (Firefox) on the left, student (Chrome) on the right

0:00: Works fine
1:00: Log in with new private windows
1:30: Three "Oops nothing to see here" followed by it loading the module page correctly
1:40: Lecturer activates quiz; student doesn't see join quiz button
2:20: Lecturer restarting quiz doesn't help
2:30: Student refreshing page doesn't help
2:50: Student logging out and in again does work

Danwhy commented 5 years ago

@stianr I've now fixed the 'Nothing to see here' issue, which I hope was the same problem as the live quiz button not showing.

Again, if you could check staging is ok, I'll deploy to production.

I think that this has been an intermittent issue for a while, and that the forum going down created an additional load on the server, which has led to the increase since February in users not being able to access anything. Hopefully with all of these issues resolved, there won't be any more problems.

stianr commented 5 years ago

Thanks @Danwhy - that sounds promising, and fits with what we've seen. It seems to run fine on staging, so please deploy when ready.

Danwhy commented 5 years ago

@stianr Done

stianr commented 5 years ago

@Danwhy I've tried it a few times and not had any problems. Will test more over the weekend, but fingers crossed...

stianr commented 5 years ago

Sorry @Danwhy I spoke too soon. I can recreate the previous problem under fairly similar circumstances to before, using two private windows:

http://reimers.co.uk/Quodl2.mp4

1:05: Live quiz button does not appear in student window when quiz is activated in lecturer window
1:50: Logging out and in again leads to a new problem where module can't be accessed - you can't see the mouse clicks on the recording, but you can see the URL bar - when I click on one of the modules it changes to the right URL for the module, but doesn't load the page.

It had worked fine for a few tests, then in the student view I got an 'Oops nothing to see here', and then failed to join the quiz, even when I logged in again in a private window. Not sure if it's causal, or just symptoms of the same underlying problem.

Sorry to be the bearer of bad news...

Danwhy commented 5 years ago

@stianr I am having some trouble reproducing the problem, but I've just deployed a small change that I hope will fix it. Would you mind testing again?

And does it only happen in private browsing? And only on production, not staging?

stianr commented 5 years ago

I've found it happening in non-private browsing on production just now. I've definitely found it easier to recreate the problem on production than on staging. On both staging and production there are now issues where buttons become sticky - pressing them doesn't do anything the first time but does the second, or occasionally pressing them has no effect at all even when doing it a dozen times over tens of seconds. Will test more and see whether the main button not appearing issue occurs on staging.

iteles commented 5 years ago

@stianr Were you able to do any more testing over the weekend?

stianr commented 5 years ago

@iteles I did a bit more, with similar results - it seems more reliable on staging than production. I'll put some time aside tomorrow to test both more systematically.

stianr commented 5 years ago

Hi again @iteles, I've been doing a fair bit of testing today. It's been more of the same - I haven't been able to reproduce the problem on the staging version, but the production version has been pretty reliable compared to when I last tested it. I have probably tried the production version 100 times, and the missing 'Join Live Quiz' button issue has only happened around 5 times. I can't see any logic to when it's occurring. I did exactly the same steps as I did in the video above, and there was no problem this time.
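
To put rough bounds on those numbers, here is a minimal sketch that computes a Wilson score interval for a failure rate. It assumes the runs are independent and takes the counts at face value, although both are only approximate ("around 5" in 100 now, "around half" of 50 earlier); `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Approximate 95% Wilson score interval for a failure proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Current production testing: about 5 failures in 100 attempts.
now_low, now_high = wilson_interval(5, 100)
# Earlier testing: about 25 failures in 50 attempts.
before_low, before_high = wilson_interval(25, 50)
print(f"now: {now_low:.1%} to {now_high:.1%}")
print(f"before: {before_low:.1%} to {before_high:.1%}")
```

On these counts the current failure rate comes out at roughly 2% to 11%, against roughly 37% to 63% before, so the improvement looks real, but the true rate could still be as high as about one attempt in nine.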

So I'm a bit stuck now - I can't reproduce the problem reliably, so there's not much that can be fixed at the moment, but there's every chance it will occur again at some random point, which makes it hard to promote Quodl to other potential users. Not quite sure what to do now, particularly as we're at the end of term, so won't have any more opportunity to do major testing at City until September. Any thoughts?