decker-edu / decker

A Markdown-based tool for slide deck creation.
GNU General Public License v3.0

shake database is corrupting itself if several thousand files are tracked #104

Open salbeira opened 8 months ago

salbeira commented 8 months ago

.decker/.shake.database keeps growing and gets larger on every reload (200 MB in one case, where a user had ~15,000 files of external libraries that had to be duplicated many times for embedded H5P elements). It seems like Shake is corrupting its own database by writing bytes with ascending values to the .database file, and the write to the database also triggers a reload.

Originally posted by @salbeira in https://github.com/decker-edu/decker/issues/102#issuecomment-1923806824

A little more detail:

One of our project team members has several H5P directories in one of their decker project directories: one H5P directory for every video they once recorded and embedded in our Moodle.

The H5P Standalone library requires you to take the embedded H5P file, unzip it, and make the extracted contents available to the H5P Standalone script so that it can embed the H5P element in your website. Each of these unzipped H5P files contains a large part of the H5P library, but we could not find a way to deduplicate these contents, as the instructions on the H5P website clearly state that you should reference the extracted files for each and every element on their own.

The extracted H5P files are in the form:

content <- actual content of the H5P element
[LIBRARY FOLDERS] <- several thousand JS and CSS files, mostly duplicates, though SOME are unique to each H5P element
h5p.json <- config file

Thus we have about 15,000 files of H5P JavaScript and CSS code for the 9 videos in the project.

As soon as you trigger a "reload" operation by changing any file in this scuffed project directory, Shake writes something into the .shake.database file. Opening that file raw shows a lot of binary data, some readable strings, and then a ~200-line tail of

[CODE] 0
[CODE] 1
[CODE] 2
[...]
[CODE] H
[CODE] I
[CODE] J
[...]
[CODE] %
[CODE] &

which gets appended over and over and over again to the .database file.

Removing large parts of the duplicate H5P directories removes this issue entirely.

The reload caused by the file change triggers a write to the .decker/.shake.database file, which in turn triggers another reload. Unless you kill the server, this loop keeps going and the .decker/.shake.database file grows to enormous proportions, as every reload appends about 100 kB of junk to the database.
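The loop described above can be illustrated with a toy model. This is a minimal sketch, not decker's watcher code: the names databasePath and simulate are hypothetical, and it assumes that every reload runs a rebuild and that every rebuild rewrites the database inside the watched directory. A fuel counter stands in for killing the server.

```haskell
-- Toy model of the reload feedback loop (hypothetical names; not decker's
-- actual watcher). Every reload rebuilds, and the rebuild rewrites the
-- Shake database inside the watched directory, which queues a new event.

-- Path of the database, written with '/' for simplicity.
databasePath :: FilePath
databasePath = ".decker/.shake.database"

-- | Process queued change events, allowing at most 'fuel' reloads.
-- 'ignored' decides which events are dropped without a reload.
-- Returns the number of reloads that were performed.
simulate :: (FilePath -> Bool) -> Int -> [FilePath] -> Int
simulate _ _ [] = 0
simulate ignored fuel (p : rest)
  | fuel <= 0 = 0
  | ignored p = simulate ignored fuel rest
  -- A reload rebuilds, and the rebuild touches the database,
  -- so a new event is queued behind the remaining ones.
  | otherwise = 1 + simulate ignored (fuel - 1) (rest ++ [databasePath])
```

For a single edit, simulate (const False) 50 ["slides/deck.md"] evaluates to 50, because the loop only stops when the fuel runs out, whereas simulate (== databasePath) 50 ["slides/deck.md"] evaluates to 1: once changes to the database are ignored, one edit causes exactly one reload.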

monofon commented 8 months ago

45ee39ad2fc4fb5a391dc27c703c859de08e3ee9 prevents anything in .decker from being watched.
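One way to keep the watcher from reacting to its own build output is to drop every change whose path contains a .decker directory component. The following is a minimal sketch of such a filter, not the code of that commit: the names components, insideDeckerDir and relevantChanges are hypothetical, and it assumes the watcher hands over changed paths as plain strings. Both '/' and '\\' are treated as separators because the reload log in this thread shows a Windows path.

```haskell
-- | Split a path on both '/' and '\\', so that Windows-style paths such as
-- "C:\\proj\\.decker\\.shake.database" are handled on any host OS.
-- Empty components (from leading or doubled separators) are dropped.
components :: FilePath -> [String]
components path =
  case break isSep path of
    (part, [])       -> keep part
    (part, _ : rest) -> keep part ++ components rest
  where
    isSep c = c == '/' || c == '\\'
    keep p = [p | not (null p)]

-- | True if any component of the path is exactly ".decker", i.e. the
-- change happened in (or to) a .decker directory.
insideDeckerDir :: FilePath -> Bool
insideDeckerDir path = ".decker" `elem` components path

-- | Keep only the changes that should trigger a rebuild and a reload.
relevantChanges :: [FilePath] -> [FilePath]
relevantChanges = filter (not . insideDeckerDir)
```

For example, relevantChanges ["slides/deck.md", ".decker/.shake.database"] evaluates to ["slides/deck.md"]. Matching whole components rather than substrings avoids ignoring unrelated files such as slides/.decker-notes.md.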

15,000 files for one video sounds insane. How many of those are actually loaded by the browser?

Easy solution: do not use H5P.

salbeira commented 8 months ago

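I added a putStrLn to the reload trigger:

  case dod of
    FileChanged time path -> do
      -- Rebuild unless the server runs with the no-rebuild flag; any
      -- exception thrown by the Shake run is swallowed.
      unless (NoRebuildFlag `elem` flags) $
        catchAll
          (runShake context rules)
          (\(SomeException _) -> return ())
      -- Added for debugging: log the change that caused this reload.
      putStrLn $ "# Server: Reload because of: " <> show path
      reloadClients (context ^. server)

and it printed: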

Server: Reload because of: "Modified "C:\\[...]\\2024_01_02_SLE_Vektorgrafiken\\.decker\\.shake.database" 2024-02-02 15:09:40.9239367 UTC False"

It is not 15,000 for one video; it is ~1,500 files per H5P element, and there are 10 of these elements. It seems that every H5P element you download has all the JavaScript and CSS it needs for presentation embedded in its file.

monofon commented 8 months ago

Even after 45ee39ad2fc4fb5a391dc27c703c859de08e3ee9?

salbeira commented 8 months ago

The version they used was the latest release, and the debug output I added for them was based on the current develop branch ...

Oh, that is a commit you JUST added. No, that commit was not included in the version they used. We ended our meeting just about an hour ago.

salbeira commented 8 months ago

Even after 45ee39a?

Hmm, the latest release did not build Windows and Mac binaries?

monofon commented 8 months ago

Patience ...

salbeira commented 8 months ago

It was just reported to me that the fix did not change the issue. The database still grows and grows and also triggers reloads.

This is not an issue for the produced and published slides, as they have no reloader to connect to, but during development it was a problem. They asked me to show you the slides they had the issue with and how they used the H5P Standalone library, so you have an idea why they have this huge chunk of library data in their project:

https://decker.cs.tu-dortmund.de/decks/hylec/vektorgrafiken/folien/1_vektorgrafiken-deck.html#/section

monofon commented 6 months ago

This should definitely be fixed by 0203f2c170eca67d2cae55b6a6e8221520d91920. The database will still grow, but it cannot trigger reloads anymore.

monofon commented 6 months ago

@salbeira can you confirm?