Running `check` Command Too Soon After Stop Reports Damaged Database

SE2Dev commented 6 months ago

In cases where the check command is run very shortly after the stop command, the main database will be reported as damaged:

Checking the PMS databases
Killed
Check complete.  PMS main database is damaged.
Check complete.  PMS blobs database is OK.

The Killed line is particularly noteworthy as it only seems to show up when this issue is occurring.

Additionally, running the automatic command instead of explicitly running check can result in additional weirdness:

Enter command # -or- command name (4 char min) : automatic

Automatic Check,Repair,Index started.

Checking the PMS databases
Killed
Check complete.  PMS main database is damaged.
Check complete.  PMS blobs database is OK.

Exporting current databases using timestamp: 2024-04-28_18.39.11
Exporting Main DB
Killed
Error 137 from Plex SQLite while exporting com.plexapp.plugins.library.db
Could not successfully export the main database to repair it.  Please try restoring a backup.
Repair failed. Automatic mode cannot continue. Please repair with individual commands

Once automatic has been run and has failed the first time, all subsequent calls to check and automatic trigger the same issue(s), and manual intervention is required to restore the database (replace won't work either):

Enter command # -or- command name (4 char min) : replace

Checking the PMS databases
Killed
Check complete.  PMS main database is damaged.
Check complete.  PMS blobs database is OK.
Checking for a usable backup.
Database backups available are:  2024-04-26 2024-04-23 2024-04-20 2024-04-17
Checking database 2024-04-26
Killed
Checking database 2024-04-23
Killed
Checking database 2024-04-20
Killed
Checking database 2024-04-17
Killed
Error.  No valid matching main and blobs database pairs.  Cannot replace.

ChuckPa commented 6 months ago

Need to know the details of the environment this is running in. (Distro & whether a container, VM or native)

The "Killed" line is because Plex SQLite is being killed. Getting killed will return a failed status code and be interpreted as "Damaged DB" ( $status != 0 )
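As a rough illustration of the point above: by shell convention, a process terminated by a signal is reported with an exit status of 128 plus the signal number, so the "Error 137" seen earlier corresponds to signal 9 (SIGKILL). The sketch below uses a hypothetical helper, `classify_status`, which is not part of DBRepair.sh; it only shows how a killed process could be told apart from an ordinary non-zero result.

```shell
#!/bin/sh
# Hypothetical helper (not from DBRepair.sh): describe an exit status.
# Assumes the usual shell convention that status > 128 means
# "terminated by signal (status - 128)".
classify_status() {
  rc=$1
  if [ "$rc" -eq 0 ]; then
    echo "ok"
  elif [ "$rc" -gt 128 ]; then
    echo "killed by signal $((rc - 128))"
  else
    echo "failed with status $rc"
  fi
}

# 137 is the status reported in the export error earlier in this thread.
classify_status 137
```

A check that reports "killed" rather than "damaged" in this case would make the situation in this issue easier to recognize.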

Everything you're showing me here tells me you can't invoke Plex SQLite.

SE2Dev commented 6 months ago

Oops - sorry, it's the standard Docker image.

Somewhere during the process of writing this post I seem to have caused my database to become actually corrupt, so now I'm working through the process of restoring it.

ChuckPa commented 6 months ago

I'll be around.

There have never been reported failures of Plex SQLite in my script unless there are corresponding failures in PMS.

Please let me know what you find out.

SE2Dev commented 6 months ago

Even after restoring the backup I made (before starting PlexDBRepair.sh the first time), I'm still getting the following error upon restarting the container:

plex  | Error: Unable to set up server: sqlite3_statement_backend::loadOne: attempt to write a readonly database (N4soci10soci_errorE)
plex  | Stopping Plex Media Server.
plex  | kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec

I did double check the permissions for com.plexapp.plugins.library.db and they looked good.

Update: Apparently the permissions for the database files got corrupted somehow while I was messing with DBRepair.sh (or afterwards). After re-applying the permissions that the owning folder had, everything seems to be working again.

ChuckPa commented 6 months ago

good

can you run it normally now?

SE2Dev commented 6 months ago

Yes, the initial issue I mentioned still occurs though. Is it not waiting for the Plex SQLite process to end before returning from the stop command?

Select

  1 - 'stop'      - Stop PMS.
  2 - 'automatic' - Check, Repair/Optimize, and Reindex Database in one step.
  3 - 'check'     - Perform integrity check of database.
  4 - 'vacuum'    - Remove empty space from database without optimizing.
  5 - 'repair'    - Repair/Optimize databases.
  6 - 'reindex'   - Rebuild database database indexes.
  7 - 'start'     - Start PMS

  8 - 'import'    - Import watch history from another database independent of Plex. (risky).
  9 - 'replace'   - Replace current databases with newest usable backup copy (interactive).
 10 - 'show'      - Show logfile.
 11 - 'status'    - Report status of PMS (run-state and databases).
 12 - 'undo'      - Undo last successful command.

 21 - 'prune'     - Prune (remove) old image files (jpeg,jpg,png) from PhotoTranscoder cache.
 42 - 'ignore'    - Ignore duplicate/constraint errors.

 88 - 'update'    - Check for updates.
 99 - 'quit'      - Quit immediately.  Keep all temporary files.
      'exit'      - Exit with cleanup options.

Enter command # -or- command name (4 char min) : stop

Stopping PMS.
Stopped PMS.

Enter command # -or- command name (4 char min) : check

Checking the PMS databases
Killed
Check complete.  PMS main database is damaged.
Check complete.  PMS blobs database is OK.

Enter command # -or- command name (4 char min) : check

Checking the PMS databases
Check complete.  PMS main database is OK.
Check complete.  PMS blobs database is OK.

Enter command # -or- command name (4 char min) :

ChuckPa commented 6 months ago

The test for whether PMS is running or not occurs BEFORE the check starts.

 99 - 'quit'      - Quit immediately.  Keep all temporary files.
      'exit'      - Exit with cleanup options.

Enter command # -or- command name (4 char min) : check

Checking the PMS databases
Killed
Check complete.  PMS main database is damaged.
Check complete.  PMS blobs database is OK.

Select

Something in your environment is still killing it, or restarting because a health check is failing.

Whose image are you using? Which host and distro?

SE2Dev commented 6 months ago

Image: plexinc/pms-docker:latest (currently running PMS 1.40.2.8395)
Host: Synology running Linux kernel 4.4.302+

ChuckPa commented 6 months ago

Are you running it from SSH or from scheduled tasks?

SE2Dev commented 6 months ago

The container itself is run via Docker Compose, but the DBRepair script is being run by attaching to the container shell (so essentially SSH) and installing it into /tmp. When I'm done, I usually re-create the container to remove any lingering files.

ChuckPa commented 6 months ago

I will need to walk through your steps. Never known anyone to try things with Docker attach. Most do a Docker exec

Since HDR became standard on DSM, I don't know who still uses docker on the machines. (cutting out Docker removes more overhead from those limited CPUs)

SE2Dev commented 6 months ago

> Never known anyone to try things with Docker attach. Most do a Docker exec

Sorry, yes - I attached to the container via the Docker extension in VSCode ("Attach Shell"), but it looks like it's literally just running docker exec -it [container id] bash. At which point I would do:

cd /tmp
apt-get update && apt-get install -y wget
wget https://github.com/ChuckPa/PlexDBRepair/releases/download/v1.05.02/DBRepair.sh
chmod +x ./DBRepair.sh
./DBRepair.sh

> Since HDR became standard on DSM, I don't know who still uses docker on the machines. (cutting out Docker removes more overhead from those limited CPUs)

I'm not really sure what you mean by HDR, are you talking about support for HDR media in Plex via the native Plex Media Server package? I haven't run into any major issues running PMS via Docker as long as the correct devices are provided.

ChuckPa commented 6 months ago

"HDR" == HEVC HDR with tone mapping support as native.
I'm also the Plex engineer who supports Synology and QNAP systems

Also, be advised, DBRepair has its own "update" capability. You no longer need to update manually. (88 - UPDATE command)

I'll set up a container and see what happens.

SE2Dev commented 6 months ago

> I'm also the Plex engineer who supports Synology and QNAP systems

That's probably one of the many places I recognize you from.

> Also, be advised, DBRepair has its own "update" capability. You no longer need to update manually. (88 - UPDATE command)

I didn't have the script installed at all before. I had deleted and recreated the container to clear out any temporary files (including DBRepair.sh).

For what it's worth, I did just confirm that this also happens when running the same container on a regular Debian host. You just seem to need a reasonably large database (my blobs and main databases are about 500 MB each).

ChuckPa commented 6 months ago

I just ran this on 1GB databases without issue.

I will do my best to figure out why it's (Plex SQLite) failing for you.

This might be entirely a Plex issue with the latest version (would not be surprised).

SE2Dev commented 6 months ago

I can confirm that this has been happening for at least the last few "released to everyone" versions. I originally used this project to verify the database after one of the recent updates triggered migrations that required several steps to revert, just to make sure that the upgrade had worked properly. (I had seen some spooky console output during the server upgrade and "Optimize Database" run, and wanted to make sure my database wasn't corrupt.)

Edit: I think it was PMS 1.40.0.7775 where I originally noticed this behavior in PlexDBRepair.

ChuckPa commented 6 months ago

1.40.0.7775 is where Engineering released an entirely new database schema. There was a HUGE schema migration time required.

Folks who kept restarting before it finished ended up with tons of problems.

I'm in the process of standing up a 2.6 GB DB and will let it rebuild the entire server from the DB file. If it doesn't fail there then the issue must be in the health check / container config.

Speaking of that, do you impose memory limits in your container ?

SE2Dev commented 6 months ago

> Speaking of that, do you impose memory limits in your container ?

Nope (The Debian system I reproduced the issue on has 64GB of memory)

ChuckPa commented 6 months ago

Not "have", do you impose limits in the container startup? (the container's definition)

How much memory is in the machine?

ChuckPa commented 6 months ago

Just reproduced it.

Digging into Plex's Plex SQLite now.

root@dockerplex:~/Library/Application Support/Plex Media Server/Plug-in Support/Databases# ./DBRepair.sh stop auto start exit

      Plex Media Server Database Repair Utility (Docker)
                       Version v1.05.02

[2024-04-28 20.53.23] Stopping PMS.
[2024-04-28 20.53.26] Stopped PMS.

[2024-04-28 20.53.26] Automatic Check,Repair,Index started.
[2024-04-28 20.53.26] 
[2024-04-28 20.53.26] Checking the PMS databases
Killed
[2024-04-28 20.53.28] Check complete.  PMS main database is damaged.
[2024-04-28 20.53.28] Check complete.  PMS blobs database is OK.
[2024-04-28 20.53.28] 
[2024-04-28 20.53.28] Exporting current databases using timestamp: 2024-04-28_20.53.26
[2024-04-28 20.53.28] Exporting Main DB
^C

SE2Dev commented 6 months ago

> Not "have", do you impose limits in the container startup? (the container's definition)

Like I said, no - there are no limits (of any type) being specified in the docker-compose.yaml.

> How much memory is in the machine?

The Synology machine that I originally saw the issue on has 8GB; the Debian machine that I reproduced it on later has 64GB. Both of them were running PMS via Docker using the same image.
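One way to double-check that no memory limit is in effect is to ask Docker directly. In the sketch below, `describe_mem_limit` is a hypothetical helper and `plex` is a placeholder container name; `docker inspect` reports `HostConfig.Memory` in bytes, with 0 meaning that no limit is set.

```shell
#!/bin/sh
# Hypothetical helper: turn a HostConfig.Memory value (bytes) into text.
describe_mem_limit() {
  case "$1" in
    0) echo "no memory limit" ;;
    ''|*[!0-9]*) echo "unknown" ;;          # empty or non-numeric input
    *) echo "limit: $(( $1 / 1048576 )) MiB" ;;
  esac
}

# Example (requires Docker; "plex" is a placeholder container name):
#   describe_mem_limit "$(docker inspect --format '{{.HostConfig.Memory}}' plex)"
```

A non-zero limit would make an out-of-memory kill a plausible cause of the "Killed" line; a value of 0 points elsewhere.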

ChuckPa commented 6 months ago

I'm using their latest image right now. I think that's where the problem is -- their current executable.

ChuckPa commented 6 months ago

Let's see if this is a workaround and constrains the problem.

DBRepair.sh:

At line 1401, insert:

echo Sleeping for Plex workaround
sleep 10

If my testing is right, they are holding onto something in their new "One process to rule all" code which is flawed.

Ref:

   1393     $StopCommand > /dev/null 2> /dev/null
   1394     Result=$?
   1395     if [ $Result -ne 0 ]; then
   1396       Output   "Cannot send stop command to PMS, error $Result.  Please stop manually."
   1397       WriteLog "Cannot send stop command to PMS, error $Result.  Please stop manually."
   1398       return 1
   1399     fi
   1400 
   1401 echo sleeping 10 seconds
   1402 sleep 10
   1403 
   1404     Count=10
   1405     while IsRunning && [ $Count -gt 0 ]
   1406     do
   1407       sleep 3
   1408       Count=$((Count - 1))
   1409     done
   1410
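The polling loop in the listing above could be factored into a small helper that takes the "still running?" test as an argument. This is only a sketch: `wait_until_stopped` is a hypothetical name, and which test reliably detects the lingering process is still an open question in this thread.

```shell
#!/bin/sh
# Hypothetical helper: poll a "still running?" command until it reports
# false (non-zero exit) or the attempts run out.
# Usage: wait_until_stopped MAX_TRIES DELAY_SECONDS CHECK_COMMAND [ARGS...]
# Returns 0 once CHECK_COMMAND fails (process gone), 1 on timeout.
wait_until_stopped() {
  tries=$1
  delay=$2
  shift 2
  while "$@"; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      return 1
    fi
    sleep "$delay"
  done
  return 0
}
```

For example, `wait_until_stopped 10 3 pgrep -f "Plex Media Server"` would check up to 10 times, 3 seconds apart; the `pgrep` pattern is an assumption about the process name, not something confirmed here.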

SE2Dev commented 6 months ago

That does seem to serve as a workaround for the time being, although it might be worth pointing out that for whatever reason that seems to be line 1512 for me, not 1401. With that added, the extra time seems to be enough for the "Plex SQLite" instance to wrap up whatever it's doing.

ChuckPa commented 6 months ago

I might be off by a few lines .. That's why I gave you the code block to look at.

I am waiting for Plex to tell me what changed and why it's so screwy.

ChuckPa commented 6 months ago

I am putting together something a little more reasonable than that hack of a delay.

How about Manual mode? Something you can invoke OUTSIDE the container if you can specify the paths.

This is a first cut; cleanup is needed.

root@lizum:/home/chuck/git/chuck/PlexDBRepair# ./DBRepair.sh --sqlite /usr/lib/plexmediaserver --databases "/usb/plex/Plug-in Support/Databases" -p
==== Setting DBDIR = "/usb/plex/Plug-in Support/Databases"

      Plex Media Server Database Repair Utility ()
                       Version v1.06.00

      PlexSQLite = '/usr/lib/plexmediaserver/Plex SQLite'
      Databases  = '/usb/plex/Plug-in Support/Databases'

Select

  1 - 'stop'      - (Not available. Stop manually.)
  2 - 'automatic' - Check, Repair/Optimize, and Reindex Database in one step.
  3 - 'check'     - Perform integrity check of database.
  4 - 'vacuum'    - Remove empty space from database without optimizing.
  5 - 'repair'    - Repair/Optimize databases.
  6 - 'reindex'   - Rebuild database database indexes.
  7 - 'start'     - (Not available. Start manually)

  8 - 'import'    - Import watch history from another database independent of Plex. (risky).
  9 - 'replace'   - Replace current databases with newest usable backup copy (interactive).
 10 - 'show'      - Show logfile.
 11 - 'status'    - Report status of PMS (run-state and databases).
 12 - 'undo'      - Undo last successful command.

 21 - 'prune'     - Prune (remove) old image files (jpeg,jpg,png) from PhotoTranscoder cache.
 42 - 'ignore'    - Ignore duplicate/constraint errors.

 88 - 'update'    - Check for updates.
 99 - 'quit'      - Quit immediately.  Keep all temporary files.
      'exit'      - Exit with cleanup options.

Enter command # -or- command name (4 char min) :

ChuckPa commented 6 months ago

Update: This is verified as a PMS bug.

I've chatted with the engineer and submitted a ticket.

I'm closing this for now but will be watching for my PMS ticket to be resolved.

In the interim, add as much sleep after the stop as is needed to satisfy their new "one process for all" mechanism. (The processes used to be distinct but now have been merged into one executable and I think they left a semaphore dangling)
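Instead of a single fixed delay, a wrapper could retry a command only when it was killed. This is a minimal sketch under that assumption, not something from DBRepair.sh: `retry_if_killed` is a hypothetical helper that treats an exit status above 128 as "terminated by a signal" and tries again after a pause.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command while it keeps getting killed.
# Usage: retry_if_killed MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns the exit status of the last attempt.
retry_if_killed() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then rc=0; else rc=$?; fi
    # Stop on success, on an ordinary failure, or when attempts run out.
    if [ "$rc" -le 128 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Called as `retry_if_killed 5 3 some_command` (with `some_command` as a placeholder), this would make up to five attempts, three seconds apart, and still pass through a genuine failure on the first try.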

ChuckPa / PlexDBRepair

Running `check` Command Too Soon After Stop Reports Damaged Database #144