[Bug]: Server crash on downloading episodes with big names

kirillmukhin commented 1 year ago

Describe the issue

The server crashes when attempting to download a podcast episode with a long (Cyrillic) title. From what I understand, JavaScript uses UTF-16 encoding and, after a few tests, I am assuming that the crash happens if the name's size exceeds 256 bytes (I was using an online calculator to determine string sizes):

| RSS | Episode name | String size (bytes) | Crashes? |
| --- | --- | --- | --- |
| redacted | На книжном рынке России наступила полная цензура — хуже, чем в СССР. Ну хотя бы идеологически верные авторы от этого выиграют? | 252 | No |
| redacted | В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонниками, и вот почему растет их влияние | 336 | Yes |
| Калькулятор | Неужели теперь только на Ладе? Разбираемся, как сейчас купить, продать и обновить автомобиль в России 🚙 | 208 | No |
| Калькулятор | «Мам, скинь пять биткоинов на карманные». Криптовалюта постепенно становится нормой. Чем она может помочь сейчас — в условиях санкций и кризиса? | 288 | Yes |
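For anyone who wants to reproduce these measurements without an online calculator, here is a minimal Node.js sketch. The helper name `stringSizes` is made up for illustration and is not part of audiobookshelf. It reports both the UTF-16 size (what the sizes above appear to be) and the UTF-8 size, which is what Node.js normally passes to the file system and what a byte-based file name limit would count.

```javascript
// Hypothetical helper, not part of audiobookshelf: report the size of a string
// in UTF-16 code units and in bytes under two encodings.
function stringSizes(text) {
  return {
    codeUnits: text.length,                         // UTF-16 code units, which is what String.length counts
    utf16Bytes: Buffer.byteLength(text, 'utf16le'), // size when encoded as UTF-16
    utf8Bytes: Buffer.byteLength(text, 'utf8'),     // size when encoded as UTF-8
  };
}

console.log(stringSizes('Что случилось'));
```

The two byte counts can differ noticeably: ASCII characters take one byte in UTF-8 but two in UTF-16, so a threshold observed in UTF-16 bytes will not match the file system's limit exactly.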

Steps to reproduce the issue

Add any of the mentioned podcasts
Download the mentioned episode that doesn't crash the server
Verify that all went smoothly
Download the other episode
Crash

Audiobookshelf version

v2.2.8

How are you running audiobookshelf?

Docker

advplyr commented 1 year ago

Can you see what the error message is during the crash?

I just tested this with the longest one and it is not crashing for me. My guess is that the file system you are using is reaching a max file name length. Abs will crop the filename if it exceeds 240 characters but none of these do.

kirillmukhin commented 1 year ago

Log

Pod log from TrueNAS Scale

```
2022-12-11 11:31:20.516714+00:00[2022-12-11 14:31:20] DEBUG: [DB] Updated user: 1
2022-12-11 11:31:20.730088+00:00[2022-12-11 14:31:20] DEBUG: [podcastUtils] getPodcastFeed for "https://meduza.io/rss2/podcasts/meduza-v-kurse"
2022-12-11 11:31:22.218885+00:00[2022-12-11 14:31:22] DEBUG: [podcastUtils] getPodcastFeed for "https://meduza.io/rss2/podcasts/meduza-v-kurse" success - parsing xml
2022-12-11 11:31:22.923635+00:00[2022-12-11 14:31:22] DEBUG: [podcastUtils] getPodcastFeed for "https://meduza.io/rss2/podcasts/meduza-v-kurse" success - parsing xml
2022-12-11 11:31:31.255763+00:00[2022-12-11 14:31:31] DEBUG: [Watcher] Ignoring directory "/library/podcasts/Что случилось"
2022-12-11 11:31:31.258356+00:00[2022-12-11 14:31:31] DEBUG: [fileUtils] Downloading file to /library/podcasts/Что случилось/В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонниками, и вот почему растет их влияние.mp3
2022-12-11 11:31:31.267194+00:00/server/libs/njodb/index.js:103
2022-12-11 11:31:31.267268+00:00throw error;
2022-12-11 11:31:31.267287+00:00^
2022-12-11 11:31:31.267302+00:002022-12-11T11:31:31.267302263Z
2022-12-11 11:31:31.267316+00:00Error: ENAMETOOLONG: name too long, open '/library/podcasts/Что случилось/В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонниками, и вот почему растет их влияние.mp3'
2022-12-11 11:31:31.267373+00:00Emitted 'error' event on WriteStream instance at:
2022-12-11 11:31:31.267389+00:00at emitErrorNT (node:internal/streams/destroy:157:8)
2022-12-11 11:31:31.267409+00:00at emitErrorCloseNT (node:internal/streams/destroy:122:3)
2022-12-11 11:31:31.267432+00:00at processTicksAndRejections (node:internal/process/task_queues:83:21) {
2022-12-11 11:31:31.267445+00:00errno: -36,
2022-12-11 11:31:31.267458+00:00code: 'ENAMETOOLONG',
2022-12-11 11:31:31.267471+00:00syscall: 'open',
2022-12-11 11:31:31.267491+00:00path: '/library/podcasts/Что случилось/В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонниками, и вот почему растет их влияние.mp3'
2022-12-11 11:31:31.267508+00:00}
2022-12-11 11:31:31.689901+00:00npm notice
2022-12-11 11:31:31.690037+00:00npm notice New major version of npm available! 8.19.2 -> 9.2.0
2022-12-11 11:31:31.690139+00:00npm notice Changelog:
2022-12-11 11:31:31.690270+00:00npm notice Run `npm install -g npm@9.2.0` to update!
2022-12-11 11:31:31.690292+00:00npm notice
```
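The trace shows the `ENAMETOOLONG` error being emitted as an `'error'` event on a `WriteStream`, and in Node.js an `'error'` event with no listener is thrown and ends the process. As a rough sketch of how a failed open could be turned into a handled failure instead of a crash (the helper name `writeFileSafely` is hypothetical and this is not audiobookshelf's actual download code):

```javascript
const fs = require('fs');

// Hypothetical helper, not audiobookshelf's actual download code: write data to
// a file and report stream errors through a Promise instead of letting them throw.
function writeFileSafely(filePath, data) {
  return new Promise((resolve, reject) => {
    const stream = fs.createWriteStream(filePath);
    // Without an 'error' listener, a failed open (for example ENAMETOOLONG)
    // becomes an uncaught exception and takes the whole process down.
    stream.on('error', reject);
    // 'finish' fires once all data has been flushed to the file.
    stream.on('finish', resolve);
    stream.end(data);
  });
}
```

A caller could then `await writeFileSafely(...)` inside a `try`/`catch` and log `err.code`, so a single bad file name fails one download rather than the whole server.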

I got my brain working a bit better and tried to download the episodes and place them manually. That did not work. Trying to get the latest episode with youtube-dl gives: ERROR: unable to open for writing: [Errno 36] File name too long.

The error happens both on the TrueNAS Scale server with a ZFS file system, where audiobookshelf is running, and on a local Linux distribution with an ext4 file system.

KDE Kasts was able to get the episode, as it renames files to seemingly random strings of Latin characters and numbers. The file I got was named e5996abf273afa838bf5cffc1cba4b28, so I tried to rename it to the full original name, which gave the following:

mv e5996abf273afa838bf5cffc1cba4b28 'В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонниками, и вот почему растет их влияние.mp3'
mv: cannot stat 'В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонниками, и вот почему растет их влияние.mp3': File name too long

Then I truncated the Cyrillic string until the error disappeared, which happened here:

mv e5996abf273afa838bf5cffc1cba4b28 'В России есть набирающая популярность и не запрещенная оппозиционная сила — но вам она не понравится. Это Игорь Стрелков со сторонни.mp3'

The original string (with the '.mp3' extension) has 172 individual characters (graphemes) and takes 322 bytes in UTF-8 encoding. The truncated string (with the '.mp3' extension) has 136 individual characters and takes 255 bytes in UTF-8.

As far as I understand, non-Latin characters take more space, so they may cause this problem before the name reaches 240 characters. From Wikipedia:

The first 128 code points (ASCII) need one byte. The next 1,920 code points need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also IPA extensions, Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N'Ko alphabets, as well as Combining Diacritical Marks. Three bytes are needed for the rest of the Basic Multilingual Plane, which contains virtually all code points in common use,[16] including most Chinese, Japanese and Korean characters. Four bytes are needed for code points in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

A "character" can take more than 4 bytes because it is made of more than one code point.

So if people start making podcast episodes with titles made exclusively of emojis, big trouble.
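The point about one visible character spanning several code points can be checked directly. The sketch below is only an illustration: the function name `measure` is made up, and it assumes `Intl.Segmenter` is available (Node.js 16 or newer).

```javascript
// Illustration only: count graphemes, code points, and UTF-8 bytes for a string.
// Assumes Intl.Segmenter is available (Node.js 16 or newer).
function measure(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    graphemes: [...segmenter.segment(text)].length, // user-perceived characters
    codePoints: [...text].length,                   // string iteration yields code points
    utf8Bytes: Buffer.byteLength(text, 'utf8'),
  };
}

// A family emoji: three person emoji joined by two zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
console.log(measure(family));
```

For this emoji the result should be one grapheme, five code points, and 18 UTF-8 bytes (three 4-byte emoji plus two 3-byte joiners), so about 14 such "characters" would already fill a 255-byte file name.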

advplyr commented 1 year ago

I updated the script to keep the filename under 255 bytes. I think this should cover it for a while, even though the Windows file system does allow longer names.
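For readers curious what a byte-based limit can look like, here is a minimal sketch. It is not the code that shipped in audiobookshelf: the function name `truncateFilename` and the default of 255 bytes are assumptions for illustration, and it expects a bare file name without directories.

```javascript
const path = require('path');

// Hypothetical helper, not the actual audiobookshelf fix: shorten a file name so
// that base name plus extension fits in maxBytes of UTF-8, cutting only between
// code points so no character is split in half.
function truncateFilename(filename, maxBytes = 255) {
  if (Buffer.byteLength(filename, 'utf8') <= maxBytes) return filename;

  const ext = path.extname(filename);
  const base = path.basename(filename, ext);
  // Bytes left for the base name once the extension is reserved.
  const budget = maxBytes - Buffer.byteLength(ext, 'utf8');

  let result = '';
  let used = 0;
  for (const char of base) { // iterating a string yields whole code points
    const size = Buffer.byteLength(char, 'utf8');
    if (used + size > budget) break;
    result += char;
    used += size;
  }
  return result + ext;
}

console.log(truncateFilename('я'.repeat(200) + '.mp3').length);
```

Cutting on code points keeps the output valid UTF-8, but it can still split a multi-code-point emoji sequence in the middle; cutting on grapheme boundaries would avoid that at the cost of a little more work.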

advplyr commented 1 year ago

Fixed in v2.2.9
