Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License

Move to MySQL #673

Open Woseseltops opened 4 years ago

Woseseltops commented 4 years ago

In this issue I want to investigate what needs to happen for us to move to MySQL, so we can get rid of #441. This is what I could come up with.

Phase 1: preparation

  1. Request the database at C&CZ

Phase 2: migration

  1. Turn off Signbank
  2. Dump the data
  3. Change the settings to use the new database, and link it correctly
  4. Run migrate so the MySQL database has the correct structure
  5. Load the data (a sketch of steps 2-5 follows below)
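For step 3, a minimal sketch of what the settings change could look like, assuming C&CZ hands us the usual host and credentials (every value below is a placeholder), with the dump/load commands for steps 2, 4 and 5 as comments:

```python
# settings.py -- sketch only; all values are placeholders
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'signbank',
        'USER': 'signbank',
        'PASSWORD': '********',
        'HOST': 'mysql.example.ru.nl',   # hypothetical C&CZ host
        'PORT': '3306',
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}
# Steps 2, 4 and 5 then become, roughly:
#   python manage.py dumpdata --natural-foreign --natural-primary \
#       -e contenttypes -e auth.permission > dump.json    # against SQLite
#   python manage.py migrate                              # against MySQL
#   python manage.py loaddata dump.json
```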

Expected problems:

@susanodd and @vanlummelhuizen, I really need your help here :)

susanodd commented 2 weeks ago

https://github.com/Signbank/Global-signbank/issues/1331#issuecomment-2410156648

vanlummelhuizen commented 2 weeks ago

In the light of moving to a database server, there are concerns that many API calls may (b)lock the SQLite database (#1331, #1332). Perhaps, for now, we could do some SQLite optimization as described in https://blog.pecar.me/sqlite-django-config. What do you think, @Woseseltops?
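For reference, this kind of tuning can be applied with Django's connection_created signal; a minimal sketch, assuming the usual recommendations of WAL journal mode plus a busy timeout (the exact pragma values are assumptions, not taken from the post):

```python
# somewhere that is imported at startup, e.g. an app's ready() hook
from django.db.backends.signals import connection_created
from django.dispatch import receiver

@receiver(connection_created)
def tune_sqlite(sender, connection, **kwargs):
    # Only touch SQLite connections; other backends are left alone.
    if connection.vendor == 'sqlite':
        with connection.cursor() as cursor:
            cursor.execute('PRAGMA journal_mode=WAL;')    # readers no longer block the writer
            cursor.execute('PRAGMA synchronous=NORMAL;')  # safe with WAL, much faster
            cursor.execute('PRAGMA busy_timeout=5000;')   # wait 5 s on a lock instead of erroring
```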

Woseseltops commented 2 weeks ago

Sorry for missing your question last year @vanlummelhuizen! This issue has ended up very low on my todo list in 2020, so no progress here. Given that the funding situation for Signbank after 2024 is unclear, it's probably unwise to take up major new projects, so SQLite optimization is probably the better choice indeed; I didn't know something like that was possible!

I tried the suggested optimizations locally by doing some parallel API calls to /dictionary/api_create_gloss/{datasetid}/. Unfortunately, it did not change anything. Just about the same number of calls failed for both setups (with and without optimizations). The ratio of failures increased with the number of parallel calls. The failures happened here: https://github.com/Signbank/Global-signbank/blob/f4b02dcd4289690dfa075255e65ab7c7f8c33d0e/signbank/abstract_machine.py#L323-L327
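For anyone who wants to repeat the experiment, the test looked roughly like this (a sketch: the host, dataset id, payload fields and the lack of authentication are all assumptions, not the exact script used):

```python
import concurrent.futures
import requests

# Hypothetical local instance and dataset id
URL = 'http://localhost:8000/dictionary/api_create_gloss/5/'

def create_gloss(i):
    # Minimal payload; the real field names depend on what the API expects
    response = requests.post(URL, json={'annotation': f'TEST-{i}'})
    return response.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    statuses = list(pool.map(create_gloss, range(50)))

# Count successes vs failures, e.g. {201: 31, 400: 19}
print({code: statuses.count(code) for code in set(statuses)})
```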

susanodd commented 2 weeks ago

Regarding the errors: in the code I wrote, I wanted to include the errors in the JSON for the purpose of displaying them to the user.

The alternative would be to just return the Bad Request status when something fails and not bother to report why. Can we report a transaction failure?

????
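A minimal sketch of the first option, reporting why the request failed alongside the 400 status (the helper name and JSON shape are assumptions):

```python
from django.http import JsonResponse

def bad_request_with_errors(errors):
    # Return the collected error messages so the client can display them,
    # while still signalling failure via the 400 status code.
    return JsonResponse({'errors': errors}, status=400)
```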

susanodd commented 2 weeks ago

> In the light of moving to a database server, there are concerns that many API calls may (b)lock the SQLite database (#1331, #1332). Perhaps, for now, we could do some SQLite optimization as described in https://blog.pecar.me/sqlite-django-config. What do you think, @Woseseltops?
>
> Sorry for missing your question last year @vanlummelhuizen! This issue has ended up very low on my todo list in 2020, so no progress here. Given that the funding situation for Signbank after 2024 is unclear, it's probably unwise to take up major new projects, so SQLite optimization is probably the better choice indeed; I didn't know something like that was possible!
>
> I tried the suggested optimizations locally by doing some parallel API calls to /dictionary/api_create_gloss/{datasetid}/. Unfortunately, it did not change anything. Just about the same number of calls failed for both setups (with and without optimizations). The ratio of failures increased with the number of parallel calls. The failures happened here:
>
> https://github.com/Signbank/Global-signbank/blob/f4b02dcd4289690dfa075255e65ab7c7f8c33d0e/signbank/abstract_machine.py#L323-L327

The part of the try block above that except is a huge headache of updates:

  1. lemma objects are created for each language
  2. a user affiliation object is created
  3. annotation objects are created for each language
  4. sense objects are created, including sense translation objects
  5. a gloss history object is created

It's a mess if that fails. It's also going to cause problems if the server is bombarded with create-gloss commands but has not finished previous commands. The constraints also need to be checked for all of the above. We can't just send a "Bad Request" after some of the commands in the try block are partly done but the database gets locked; everything would have to be rolled back first (see the sketch below).
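A sketch of making the whole sequence all-or-nothing with Django's transaction.atomic, so a lock failure halfway through leaves no half-created gloss behind (the helper functions are placeholders for the five steps above, not the actual Signbank code):

```python
from django.db import transaction

def create_gloss_all_or_nothing(dataset, fields):
    try:
        with transaction.atomic():
            lemmas = create_lemmas_per_language(dataset, fields)      # step 1
            create_user_affiliation(fields)                           # step 2
            gloss = create_annotations_per_language(lemmas, fields)   # step 3
            create_senses_with_translations(gloss, fields)            # step 4
            create_gloss_history(gloss)                               # step 5
    except Exception as exc:
        # Nothing above was committed, so a Bad Request is now safe to send.
        return {'errors': [str(exc)]}
    return {'glossid': gloss.pk}
```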

susanodd commented 2 weeks ago

On rebooting the server, this information from the uWSGI startup log might be useful:

```
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
```

Probably the API requests should also adhere to the "graceful operations" and not bombard the server with 5 per minute (a pacing sketch follows below).
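A client-side pacing sketch along those lines (the URL and the 12-second interval are assumptions; 12 seconds works out to roughly 5 calls per minute):

```python
import time
import requests

URL = 'http://localhost:8000/dictionary/api_create_gloss/5/'  # hypothetical

def paced_create(payloads, interval=12.0):
    # POST one payload at a time and sleep in between, instead of
    # firing all requests at the server in parallel.
    for payload in payloads:
        requests.post(URL, json=payload)
        time.sleep(interval)
```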

susanodd commented 2 weeks ago

> I tried the suggested optimizations locally by doing some parallel API calls to /dictionary/api_create_gloss/{datasetid}/. Unfortunately, it did not change anything. Just about the same number of calls failed for both setups (with and without optimizations). The ratio of failures increased with the number of parallel calls. The failures happened here:
>
> https://github.com/Signbank/Global-signbank/blob/f4b02dcd4289690dfa075255e65ab7c7f8c33d0e/signbank/abstract_machine.py#L323-L327

The code "above" the except is a giant block of necessary operations to create a new gloss.

susanodd commented 2 weeks ago

I thought the primary difference between SQLite and MySQL was that SQLite locks the entire database on a write, whereas MySQL (InnoDB) can lock individual rows.

For gloss creation we can identify which tables are being updated.

susanodd commented 2 weeks ago

Atomicity question:

The fact that gloss creation needs to successfully create numerous objects (as shown above), and that the individual methods are themselves atomic, leads to nested atomicity. Because the nested methods are also atomic, objects get created and then used in the creation of other objects. For example, a Lemma object needs to be saved before the Lemma Translation objects can be created and saved; then the Gloss should be created, because it needs the Lemma object; the Annotation Translation objects need the Gloss object; the Sense Translation objects need ??? Etc. etc. It's an entire spider web of object creation that all needs to succeed, or be rolled back if any one step fails ????

How to do this? Implement a "lock database" operation rather than the "atomic" blocks? Because the "atomic" blocks also refer to operations in other files and other models, it's not clear how "atomic" works across them, nor what happens if it fails. (It might be that the operations are queued up.)
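For what it's worth, nested atomic blocks in Django are well defined: the outermost block opens the actual transaction, inner blocks only create savepoints, and nothing is queued; an exception that escapes the outer block rolls back everything, including work done in the inner blocks. A minimal illustration with placeholder helpers:

```python
from django.db import transaction

def create_web_of_objects(fields):
    with transaction.atomic():            # outermost: opens the real transaction
        lemma = create_lemma(fields)      # placeholder helpers throughout
        with transaction.atomic():        # nested: just a savepoint
            create_lemma_translations(lemma, fields)
        gloss = create_gloss(lemma)
        create_annotation_translations(gloss, fields)
        # If anything above raises (e.g. "database is locked"), the whole
        # outer transaction is rolled back: the lemma, its translations and
        # the gloss all disappear together, and none of it was ever visible
        # to other connections in the meantime.
```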

susanodd commented 4 days ago

I'm investigating the "upload_to" function that gets frozen during video creation. (Looking for an answer to why the API reported that the videos were not uploaded when they were in fact uploaded; it just took a really long time to upload them. #1341)

This is marginally relevant (about the "frozen" part):

https://stackoverflow.com/questions/62379876/django-how-to-debug-a-frozen-save-operation-on-a-queryset-object

> It is very probable that Django is waiting for a response from the database server and it is a configuration problem, not a problem in the Python code where it froze. It is better to check and exclude this possibility before debugging anything. For example it is possible that a table is locked or an updated row is locked by another frozen process, and the timeout in the database for waiting for the end of a lock is long, and also the timeout of Django waiting for the database response is very long or infinite.
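Along those lines, a quick way to rule the database in or out for SQLite is to try to take the write lock with a short timeout (the file name is an assumption; point it at the actual database file):

```python
import sqlite3

conn = sqlite3.connect('signbank.db', timeout=1, isolation_level=None)
try:
    conn.execute('BEGIN IMMEDIATE')   # request SQLite's write (RESERVED) lock
    print('database is writable right now')
    conn.execute('ROLLBACK')
except sqlite3.OperationalError as exc:
    print('database appears to be locked:', exc)
finally:
    conn.close()
```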