Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License

Move to MySQL #673

Open Woseseltops opened 4 years ago

Woseseltops commented 4 years ago

In this issue I want to investigate what needs to happen for us to move to MySQL, so we can get rid of #441. This is what I could come up with.

Phase 1: preparation

  1. Request the database at C&CZ

Phase 2: migration

  1. Turn off Signbank
  2. Dump the data
  3. Change the settings to use the new database, and link it correctly
  4. Run migrate so the MySQL database has the correct structure
  5. Load the data (a sketch of steps 2-5 follows below)
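For step 3, a minimal sketch of what the settings change could look like, assuming C&CZ hands us the usual host and credentials (every value below is a placeholder), with the dump/load commands for steps 2, 4 and 5 as comments:

```python
# settings.py -- sketch only; all values are placeholders
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'signbank',
        'USER': 'signbank',
        'PASSWORD': '********',
        'HOST': 'mysql.example.ru.nl',   # hypothetical C&CZ host
        'PORT': '3306',
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}
# Steps 2, 4 and 5 then become, roughly:
#   python manage.py dumpdata --natural-foreign --natural-primary \
#       -e contenttypes -e auth.permission > dump.json    # against SQLite
#   python manage.py migrate                              # against MySQL
#   python manage.py loaddata dump.json
```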

Expected problems:

@susanodd and @vanlummelhuizen, I really need your help here :)

susanodd commented 2 weeks ago

https://github.com/Signbank/Global-signbank/issues/1331#issuecomment-2410156648

vanlummelhuizen commented 2 weeks ago

In the light of moving to a database server, there are concerns that many API calls may (b)lock the SQLite database (#1331, #1332). Perhaps, for now, we could do some SQLite optimization as described in https://blog.pecar.me/sqlite-django-config. What do you think, @Woseseltops?
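For reference, this kind of tuning can be applied with Django's connection_created signal; a minimal sketch, assuming the usual recommendations of WAL journal mode plus a busy timeout (the exact pragma values are assumptions, not taken from the post):

```python
# somewhere that is imported at startup, e.g. an app's ready() hook
from django.db.backends.signals import connection_created
from django.dispatch import receiver

@receiver(connection_created)
def tune_sqlite(sender, connection, **kwargs):
    # Only touch SQLite connections; other backends are left alone.
    if connection.vendor == 'sqlite':
        with connection.cursor() as cursor:
            cursor.execute('PRAGMA journal_mode=WAL;')    # readers no longer block the writer
            cursor.execute('PRAGMA synchronous=NORMAL;')  # safe with WAL, much faster
            cursor.execute('PRAGMA busy_timeout=5000;')   # wait 5 s on a lock instead of erroring
```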

Woseseltops commented 2 weeks ago

Sorry for missing your question last year @vanlummelhuizen! This issue has ended up very low on my todo list in 2020, so no progress here. Given that the funding situation for Signbank after 2024 is unclear, it's probably unwise to take up major new projects, so SQLite optimization is probably the better choice indeed; I didn't know something like that was possible!

I tried the suggested optimizations locally by doing some parallel API calls to /dictionary/api_create_gloss/{datasetid}/. Unfortunately, it did not change anything. Just about the same number of calls failed for both setups (with and without optimizations). The ratio of failures increased with the number of parallel calls. The failures happened here: https://github.com/Signbank/Global-signbank/blob/f4b02dcd4289690dfa075255e65ab7c7f8c33d0e/signbank/abstract_machine.py#L323-L327
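For anyone who wants to repeat the experiment, the test looked roughly like this (a sketch: the host, dataset id, payload fields and the lack of authentication are all assumptions, not the exact script used):

```python
import concurrent.futures
import requests

# Hypothetical local instance and dataset id
URL = 'http://localhost:8000/dictionary/api_create_gloss/5/'

def create_gloss(i):
    # Minimal payload; the real field names depend on what the API expects
    response = requests.post(URL, json={'annotation': f'TEST-{i}'})
    return response.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    statuses = list(pool.map(create_gloss, range(50)))

# Count successes vs failures, e.g. {201: 31, 400: 19}
print({code: statuses.count(code) for code in set(statuses)})
```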

susanodd commented 2 weeks ago

Regarding the errors: in the code I wrote, I wanted to include the errors in the JSON for the purpose of displaying them to the user.

The alternative would be to just return the Bad Request status when something fails and not bother to report why. Can we report a transaction failure?

????
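A minimal sketch of the first option, reporting why the request failed alongside the 400 status (the helper name and JSON shape are assumptions):

```python
from django.http import JsonResponse

def bad_request_with_errors(errors):
    # Return the collected error messages so the client can display them,
    # while still signalling failure via the 400 status code.
    return JsonResponse({'errors': errors}, status=400)
```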

susanodd commented 2 weeks ago

> In the light of moving to a database server, there are concerns that many API calls may (b)lock the SQLite database (#1331, #1332). Perhaps, for now, we could do some SQLite optimization as described in https://blog.pecar.me/sqlite-django-config. What do you think, @Woseseltops?
>
> Sorry for missing your question last year @vanlummelhuizen! This issue has ended up very low on my todo list in 2020, so no progress here. Given that the funding situation for Signbank after 2024 is unclear, it's probably unwise to take up major new projects, so SQLite optimization is probably the better choice indeed; I didn't know something like that was possible!
>
> I tried the suggested optimizations locally by doing some parallel API calls to /dictionary/api_create_gloss/{datasetid}/. Unfortunately, it did not change anything. Just about the same number of calls failed for both setups (with and without optimizations). The ratio of failures increased with the number of parallel calls. The failures happened here:
>
> https://github.com/Signbank/Global-signbank/blob/f4b02dcd4289690dfa075255e65ab7c7f8c33d0e/signbank/abstract_machine.py#L323-L327

The part of the try block above that except is a huge headache of updates:

  1. lemma objects are created for each language
  2. a user affiliation object is created
  3. annotation objects are created for each language
  4. sense objects are created, including sense translation objects
  5. a gloss history object is created

It's a mess if that fails. It's also going to cause problems if the server is bombarded with create-gloss commands but has not finished previous commands. The constraints also need to be checked for all of the above. We can't just send a "Bad Request" after some of the commands in the try block are partly done but the database gets locked; everything would have to be rolled back first (see the sketch below).
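A sketch of making the whole sequence all-or-nothing with Django's transaction.atomic, so a lock failure halfway through leaves no half-created gloss behind (the helper functions are placeholders for the five steps above, not the actual Signbank code):

```python
from django.db import transaction

def create_gloss_all_or_nothing(dataset, fields):
    try:
        with transaction.atomic():
            lemmas = create_lemmas_per_language(dataset, fields)      # step 1
            create_user_affiliation(fields)                           # step 2
            gloss = create_annotations_per_language(lemmas, fields)   # step 3
            create_senses_with_translations(gloss, fields)            # step 4
            create_gloss_history(gloss)                               # step 5
    except Exception as exc:
        # Nothing above was committed, so a Bad Request is now safe to send.
        return {'errors': [str(exc)]}
    return {'glossid': gloss.pk}
```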

susanodd commented 2 weeks ago

On rebooting the server, this information from the uWSGI startup log might be useful:

```
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
```

Probably the API requests should also adhere to the "graceful operations" and not bombard the server with 5 per minute (a pacing sketch follows below).
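A client-side pacing sketch along those lines (the URL and the 12-second interval are assumptions; 12 seconds works out to roughly 5 calls per minute):

```python
import time
import requests

URL = 'http://localhost:8000/dictionary/api_create_gloss/5/'  # hypothetical

def paced_create(payloads, interval=12.0):
    # POST one payload at a time and sleep in between, instead of
    # firing all requests at the server in parallel.
    for payload in payloads:
        requests.post(URL, json=payload)
        time.sleep(interval)
```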

susanodd commented 2 weeks ago

> I tried the suggested optimizations locally by doing some parallel API calls to /dictionary/api_create_gloss/{datasetid}/. Unfortunately, it did not change anything. Just about the same number of calls failed for both setups (with and without optimizations). The ratio of failures increased with the number of parallel calls. The failures happened here:
>
> https://github.com/Signbank/Global-signbank/blob/f4b02dcd4289690dfa075255e65ab7c7f8c33d0e/signbank/abstract_machine.py#L323-L327

The code "above" the except is a giant block of necessary operations to create a new gloss.

susanodd commented 2 weeks ago

I thought the primary difference between SQLite and MySQL was that SQLite locks the entire database on a write, whereas MySQL (InnoDB) can lock individual rows.

For gloss creation we can identify which tables are being updated.

susanodd commented 2 weeks ago

Atomicity question:

The fact that gloss creation needs to successfully create numerous objects (as shown above), and that the individual methods are themselves atomic, leads to nested atomicity. Because the nested methods are also atomic, objects get created and then used in the creation of other objects. For example, a Lemma object needs to be saved before the Lemma Translation objects can be created and saved; then the Gloss should be created, because it needs the Lemma object; the Annotation Translation objects need the Gloss object; the Sense Translation objects need ??? Etc. etc. It's an entire spider web of object creation that all needs to succeed, or be rolled back if any one step fails ????

How to do this? Implement a "lock database" operation rather than the "atomic" blocks? Because the "atomic" blocks also refer to operations in other files and other models, it's not clear how "atomic" works across them, nor what happens if it fails. (It might be that the operations are queued up.)
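For what it's worth, nested atomic blocks in Django are well defined: the outermost block opens the actual transaction, inner blocks only create savepoints, and nothing is queued; an exception that escapes the outer block rolls back everything, including work done in the inner blocks. A minimal illustration with placeholder helpers:

```python
from django.db import transaction

def create_web_of_objects(fields):
    with transaction.atomic():            # outermost: opens the real transaction
        lemma = create_lemma(fields)      # placeholder helpers throughout
        with transaction.atomic():        # nested: just a savepoint
            create_lemma_translations(lemma, fields)
        gloss = create_gloss(lemma)
        create_annotation_translations(gloss, fields)
        # If anything above raises (e.g. "database is locked"), the whole
        # outer transaction is rolled back: the lemma, its translations and
        # the gloss all disappear together, and none of it was ever visible
        # to other connections in the meantime.
```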

susanodd commented 4 days ago

I'm investigating the "upload_to" function that gets frozen during video creation. (Looking for an answer to why the API reported that the videos were not uploaded when they were in fact uploaded; it just took a really long time to upload them. #1341)

This is marginally relevant (about the "frozen" part):

https://stackoverflow.com/questions/62379876/django-how-to-debug-a-frozen-save-operation-on-a-queryset-object

> It is very probable that Django is waiting for a response from the database server and it is a configuration problem, not a problem in the Python code where it froze. It is better to check and exclude this possibility before debugging anything. For example it is possible that a table is locked or an updated row is locked by another frozen process, and the timeout in the database for waiting for the end of a lock is long, and also the timeout of Django waiting for the database response is very long or infinite.
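Along those lines, a quick way to rule the database in or out for SQLite is to try to take the write lock with a short timeout (the file name is an assumption; point it at the actual database file):

```python
import sqlite3

conn = sqlite3.connect('signbank.db', timeout=1, isolation_level=None)
try:
    conn.execute('BEGIN IMMEDIATE')   # request SQLite's write (RESERVED) lock
    print('database is writable right now')
    conn.execute('ROLLBACK')
except sqlite3.OperationalError as exc:
    print('database appears to be locked:', exc)
finally:
    conn.close()
```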