IBM / taxinomitis

Source code for Machine Learning for Kids site
https://machinelearningforkids.co.uk
Apache License 2.0

Loss of site database #240

Closed dalelane closed 5 years ago

dalelane commented 5 years ago

TLDR: A database problem means the Machine Learning for Kids site has lost some data: almost everything stored on 23rd Aug 2019, and all sound projects created before 24th Aug 2019.


What happened?

Shortly before 1am on Sat 24th Aug 2019, the main MySQL database that holds the data for the Machine Learning for Kids site became unavailable. I've not been able to regain access to it.

Why does this mean data from 23rd Aug is lost?

I've restored the site database from a backup.

I take backups of the site DB every 24 hours at 1am. This means the DB failure happening just before 1am was at the worst possible time - I lost over 23 hours of data.
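To illustrate the timing, here is a minimal sketch of the arithmetic. The exact failure time is an assumption (the only known detail is "shortly before 1am"), and the function names are hypothetical, not part of the site's code.

```python
from datetime import datetime, timedelta


def last_backup_before(failure: datetime, backup_hour: int = 1) -> datetime:
    """Return the most recent daily backup time at or before `failure`."""
    candidate = failure.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
    if candidate > failure:
        # Today's backup had not run yet, so fall back to yesterday's.
        candidate -= timedelta(days=1)
    return candidate


def data_loss_window(failure: datetime, backup_hour: int = 1) -> timedelta:
    """Return how much data is missing from the newest usable backup."""
    return failure - last_backup_before(failure, backup_hour)


# Assumed failure time: 00:55 on Sat 24th Aug 2019 (illustrative only).
failure = datetime(2019, 8, 24, 0, 55)
window = data_loss_window(failure)
print(last_backup_before(failure))  # 2019-08-23 01:00:00
print(window)                       # 23:55:00
```

With a fixed 24-hour schedule, the worst case is always just under 24 hours. A failure a few minutes after 1am would have lost only those few minutes.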

Almost anything stored on 23rd August will be missing from that backup.

What sort of data from 23rd Aug will be affected?

Changes to projects - if you deleted a project on 23rd August, you'll see it's back again. If you created a project on 23rd August, you will have lost it.

Changes to training data - training data created on 23rd August will be lost

Changes to IBM Watson API keys - if you stored an API key on 23rd August, that will be lost. (The actual API key on IBM Cloud is unaffected - only the copy of the credentials stored in Machine Learning for Kids has been lost.)

Note: Data relating to users is stored separately and so will be unaffected by this.

Why does this mean sound projects are lost?

My first two attempts to restore from the backup were unsuccessful - each time, the restored databases ran for less than an hour before failing in a similar way.

I don't know why this is happening, but I have a suspicion that the soundtraining table getting so large was a contributor. This was something I had identified as a problem earlier this week but my fix wasn't ready.

For my third attempt to restore from the backup, I excluded the soundtraining table from the restore. So far, the database has remained available, which reinforces the idea that it might have been the culprit.
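As a rough illustration of leaving one table out of a restore, here is a minimal sketch. It assumes the backup is a plain-text SQL dump in the format `mysqldump` writes, where each table's section starts with a comment such as "-- Table structure for table `name`". It is not necessarily how the restore was actually done; the function name and the `projects` table in the sample are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set

# Comment markers that mysqldump writes before each table's schema and data.
SECTION = re.compile(r"^-- (?:Table structure for|Dumping data for) table `([^`]+)`")


def skip_tables(lines: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield dump lines, dropping every section that belongs to an excluded table."""
    skipping = False
    for line in lines:
        match = SECTION.match(line)
        if match:
            # A new section starts: decide whether to keep or drop it.
            skipping = match.group(1) in excluded
        if not skipping:
            yield line


# Tiny made-up dump, for illustration only.
sample = [
    "-- Table structure for table `projects`\n",
    "CREATE TABLE `projects` (id INT);\n",
    "-- Dumping data for table `projects`\n",
    "INSERT INTO `projects` VALUES (1);\n",
    "-- Table structure for table `soundtraining`\n",
    "CREATE TABLE `soundtraining` (id INT);\n",
    "-- Dumping data for table `soundtraining`\n",
    "INSERT INTO `soundtraining` VALUES (1);\n",
]
kept = list(skip_tables(sample, {"soundtraining"}))
print("".join(kept))
```

One limitation of this sketch: if the excluded table is the last one in the dump, any trailing statements after it are dropped too. When taking a fresh dump rather than filtering an existing one, the `--ignore-table=db_name.tbl_name` option of `mysqldump` is the simpler route.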

In conjunction with restoring the database, I've deployed the fix I had been working on that reduces the load on the soundtraining table. That should prevent the table getting so large in future.

My intention was to migrate the existing data to the new store when I deployed that fix. After two failed attempts to restore the backup with the sound training data, I've decided to abandon that for now so that I can get the rest of the site working again. This means I've reluctantly had to lose all sound projects created before 24th Aug 2019.

I will continue trying to retrieve the sound project data from the backup and will restore those projects if I can.

dalelane commented 5 years ago

I have not been able to restore the old DB so am now abandoning the attempt. The restored backup is now the main site DB going forwards. :-(

andreeas26 commented 5 years ago

Hi! Training does not work for sound projects. Not even the speech to text extension from Scratch works. Do you think it might be the same issue?

dalelane commented 5 years ago

@andreeas26 When did you create the project?

andreeas26 commented 5 years ago

Today. We've tried with multiple student accounts. When refreshing the train page the model disappears.

dalelane commented 5 years ago

Refreshing the training page needing you to retrain the model is normal and expected. Are there any other issues?

andreeas26 commented 5 years ago

It did not happen with other types of projects. Also the speech to text extension does not work. It does not recognize anything I say.

andreeas26 commented 5 years ago

When I have time I will try again and see if I can give you more details.

dalelane commented 5 years ago

If there is a Scratch project that isn't working, I'd be happy to investigate if you send me a copy of the Scratch project file.