CastagnaIT / plugin.video.netflix

InputStream based Netflix plugin for Kodi
MIT License
1.91k stars 259 forks source link

Service fails to start if Mysql is enabled (IPC timeout) #1401

Closed citronalco closed 2 years ago

citronalco commented 2 years ago

Netflix add-on version

1.18.9+matrix.1

Operative systems used

OSMC

Kodi version used

Kodi 19 (Matrix)

Description of the bug

If restart Kodi with Mysql enabled in the Netflix plugin, the service does not start (no "Netflix service started" notification), leading to an IPC timeout error when trying to watch some content from Netflix..

As soon as I disable Mysql, the service starts and everything works properly.

As soon as I enable Mysql, Kodi shows a message that the database connection is ok, and I actually see that some data gets written into the tables "profiles" and "shared_app_config". But picking any content from Netflix brings up the IPC timeout. Until I disable Mysql again.

Already tried with IPC-over-HTTP, also timeout. Recreated the database using db_create_mysql.py as reference, no change.

I enabled logging as described in the wiki, but did not get any information about the service - only the IPC timeout.

Steps to reproduce the behavior

No response

Debug log - mandatory

https://paste.kodi.tv/bifiqatupe.kodi

Possible fix

No response

Additional context

No response

Screenshots

No response

CastagnaIT commented 2 years ago

is needed a full Kodi debug log with addon debug enabled, as explained in github readme

citronalco commented 2 years ago

I updated the pastebin link above, now it contains a (nearly) full debug log - I left out curl, otherwise size would be too big for pastebin.

I've uploaded the complete (with curl) logfile here: https://www.geierb.de/~geierb/kodi.log

CastagnaIT commented 2 years ago

I left out curl, otherwise size would be too big for pastebin.

in this case you can turn off "Enable component specific logging" to prevent generate big logs

anyway the log is good but i dont see nothing that suggest a kind of problem to freeze the addon like this could mean that not receive the response from the mysql server perhaps is busy with other addons? im am not so expert with server configurations

citronalco commented 2 years ago

Thanks for looking into it!

I tried to figure out what's going on on myself for some hours (I do know some Python, but have no idea about Kodi's internals). The issue has 100% reproducibility, regardless how long Kodi is running, and the mysql server runs on a real server. So I doubt it is because a busy database.

Is there something I can check? E.g. if something is listing on some ports, or put some prints (or whatever is required to get something written to the log) in some (which?) files to see at which point things start falling apart?

CastagnaIT commented 2 years ago

from your log i can see that you have tried to open "My list" menu:

INFO : [plugin.video.netflix (6)] URL is ParseResult(scheme='plugin', netloc='plugin.video.netflix', path='/directory/video_list_sorted/myList/queue/', params='', query='', fragment='') DEBUG : [plugin.video.netflix (0)] Requesting video list sorted for context name: "mylist", context id: "None"

after that there are two http requests like this:

DEBUG : [plugin.video.netflix (0)] Executing path request: [['mylist', 'az', {'from': 0, 'to': 44}, .......

this because your list have many titles, after that there is another http request, to get the same list (but for other purposes)

DEBUG : [plugin.video.netflix (0)] Requesting "my list" video list as videoid items DEBUG : [plugin.video.netflix (0)] Requesting the full video list for mylist DEBUG : [plugin.video.netflix (0)] Executing path request: [['mylist', 'az', ............

after that nothing else, this last http request come from self.req_mylist_items() of this line: https://github.com/CastagnaIT/plugin.video.netflix/blob/v1.18.9/resources/lib/services/nfsession/directorybuilder/dir_builder.py#L102-L103

when this method (get_video_list_sorted) exit (so return the data to the caller), this data is transferred from "addon service" instance to the addon instance (too long to explain) however seems that the code run dead here, from what i remember (few years ago) though at this point there should not be mysql queries

from linked line you can try add some LOG output around here, you can reuse LOG "import" example:

LOG.debug('Print this {}', my_variable_data) LOG.debug('Output Test')

see examples in other addon files

citronalco commented 2 years ago

Thank you for the debugging hints!

Found something: It's busy creating video items...or better: creating video items is very slow when using the MySQL feature.

get_video_list_sorted calls build_video_listing, which calls _create_video_item for each video, so very often. And _create_video_item does some MySQL operations.

_create_video_item is 10 times slower when using a shared MySQL database (times are with full debug on):

With MySQL on:

DEBUG <general>: [plugin.video.netflix (6)] Execution time measured for this run:
                                                   video_list_sorted             45342 ms
                                                     make_call                     45328 ms

With MySQL off:

DEBUG <general>: [plugin.video.netflix (6)] Execution time measured for this run:
                                                   video_list_sorted              4103 ms
                                                     make_call                      4090 ms

So while increasing the timeout (lib/AddonSignals.py 111) helps to get it running, that's too slow for real life. The database server is not a Raspi1 but a real server with several good performing databases on it.

Looking at MariaDB's log I see that for each video item the Netflix plugin creates a new database connection. Though my Kodi hardware, a Vero4K+, is not the slowest, this seems to bring it down.

Update: Just tried on Debian to sort out potential OSMC weirdness: It's the same. set_watched_status and get_watched_status in resources/lib/database/db_shared.py open a new database connection via handle_connection from db_base_mysql.py, for each item, every time. This also happens when not using Mysql but sqlite. Most likely other functions do the same, at least I haven't seen a single call of handle_connection that did not end in creating a new connection by either mysql.connector.connect or sql.connect

Though creating new connection with sqlite is much less expensive than with a remote Mysql server, using a persistent connection should also lead to a noticeable speedup.

CastagnaIT commented 2 years ago

I am a little surprised because in recent years no one has complained about slow sql access is also possible that you have a good speed but an high latency, your sql connection is over wifi or wired?

when build_video_listing is executed the access to the shared database (==sql) should be from this line of code:

https://github.com/CastagnaIT/plugin.video.netflix/blob/master/resources/lib/kodi/infolabels.py#L290

if i dont forget something its the only query done, so you can try replace that line with: is_watched_user_overrided = None

let me know if there are speed improvements

citronalco commented 2 years ago

I'm using a wired connection.

For easier testing I switched now from the Vero4K+ to my Debian PC. So don't be surprised by the now relatively low numbers, compared to those above, but the differences are still clearly visible. I'm always opening "My List" in Netflix:

Unmodified code, MySQL enabled:

DEBUG <general>: [plugin.video.netflix (1)] Execution time measured for this run:
video_list_sorted              7656 ms
make_call                      7655 ms

New sqlite connections created: ~15 New MySQL connections created: ~92

Unmodified code, MySQL disabled:

DEBUG <general>: [plugin.video.netflix (1)] Execution time measured for this run:
video_list_sorted              2687 ms
make_call                      2685 ms

New sqlite connections created: ~118 New MySQL connections created: 0

Modified code (is_watched_user_overrided = None), MySQL enabled:

DEBUG <general>: [plugin.video.netflix (1)] Execution time measured for this run:
video_list_sorted              6177 ms
make_call                      6175 ms

New sqlite connections created: ~18 New MySQL connections created: ~92

Modified code (is_watched_user_overrided = None), MySQL disabled:

DEBUG <general>: [plugin.video.netflix (1)] Execution time measured for this run:
video_list_sorted               295 ms
make_call                       292 ms

New sqlite connections created: ~120 New MySQL connections created: 0

So is_watched_user_override takes 1-2 seconds, but does not seem to affect the number of database openings.

For counting new connections I added a LOG.debug("New MySQL connection") in the "if" clause, and I added a "else" with LOG.debug("Use existing MySQL connection") in https://github.com/CastagnaIT/plugin.video.netflix/blob/master/resources/lib/database/db_base_mysql.py#L34 Similar for sqlite: https://github.com/CastagnaIT/plugin.video.netflix/blob/master/resources/lib/database/db_base_sqlite.py#L43 I counted the log entries manually, so they are probably not perfectly accurate.

I have not seen a single "Use existing MySQL/sqlite connection" in Kodi's log. I think this is the most important finding. If my methodology is correct.

citronalco commented 2 years ago

Connections get closed in the "finally" blocks in https://github.com/CastagnaIT/plugin.video.netflix/blob/master/resources/lib/database/db_base_mysql.py#L44 and https://github.com/CastagnaIT/plugin.video.netflix/blob/master/resources/lib/database/db_base_sqlite.py#L45. I removed the finally blocks (and the locking of sqlite, as sqlite is thread safe by default), and: Things still work (at least for now), both sqlite and MySQL connections get reused, and that's the speedup (unmodified code):

MySQL enabled:

DEBUG <general>: [plugin.video.netflix (1)] Execution time measured for this run:
video_list_sorted               522 ms
make_call                       520 ms

-> 14 times faster

MySQL disabled:

DEBUG <general>: [plugin.video.netflix (1)] Execution time measured for this run:
video_list_sorted               279 ms
make_call                       278 ms

-> 9 times faster

I don't know if the "finally" blocks and the sqlite locking are intentional, a required workarounds for something, or if it simply slipped in. No clue. But they kill performance.

CastagnaIT commented 2 years ago

its intentional not a workaround, remove close connection leave db connection persistently opened this is not safe thing

especially for sqlite because use lock system, with this addon we can have multiple threads and multiple addon instances that make accesses at same time, this can lead to different problems more easily identifiable with a slow storage, i think in my old addon PRs i have explained something

usually at least for sqlite is good pratice close the connection just after the work is done so: open->do operations->close and also any pending transactions will be automatically completed at close call

so imo i think the best thing is find a way to allow to handle connection at higher level method build_video_listing

i also found that others sql accesses are done by generate_context_menu_items sub methods i was forgotten

citronalco commented 2 years ago

Digging through the PRs: Here's the first one I've found, adding the databases: https://github.com/CastagnaIT/plugin.video.netflix/pull/140 Closing sqlite immediately is already in. I only found one subsequent matching PR that seem to take care of threading issues like trying to close an already closed database (https://github.com/CastagnaIT/plugin.video.netflix/pull/426) or using a database connection that got closed in the meantime (that's the one with the slow devices you mentioned).

According to the docs sqlite is perfectly thread-safe (https://www.sqlite.org/threadsafe.html) by default, and supports multiple simultaneous accesses (https://sqlite.org/faq.html#q5). Some years ago the situation was different, and it was suggested to close the DB connection immediately - but I don't believe anyone's running Kodi with a sqlite3 version from 2016. Locking/unlocking is done by commits, Python retries automatically until the table is unlocked, and auto-commit is already on, so commits don't take long. So I'd say leaving out the close (and the mutex) would be safe - on slow devices, too.

Similar with the Mysql connection: Auto-commit is on, it's thread safe, so I see no point in closing the connection.

I'd say I give it a shot and try.

Your Netflix addon is heavily used here. Thank you!

CastagnaIT commented 2 years ago

kodi embed his sqlite version that may change between kodi clients (LibreElec etc...) and also by kodi build for differents operative systems usually however are fairly recent library versions

but these changes has been introduced not long time ago https://github.com/CastagnaIT/plugin.video.netflix/pull/413 in the meantime, the sqlite versions have not changed much in kodi builds

despite what the documentation says, from the problems users ecountered in the past sqlite threading does not work well at least not in all kodi builds

so the removal of "close" and relative mutex, can be taken in consideration only with extensive testing on all Kodi platforms/builds

for the mysql case, maria db support threading but what about MySQL (community)?

citronalco commented 2 years ago

Mysql is thread safe.

To me it looks like #413 fixes race conditions that can occur if one thread closes the DB connection, and another thread tries to use it while the close() call is still writing to disk (slow devices!), so that the connection already does not work, but "is_connected" is still True. It's also possible that the used libsqlite3 was intentionally compiled without multithreading support, maybe to make it faster/smaller, and killed the database. But I can be wrong.

I created PR #1408 with some code. Let's continue discussion there. The code is only tested for a single day and only on Debian and OSMC (which is basically the same). I will continue testing and update the PR accordingly (either code or comments).

CastagnaIT commented 2 years ago

atm i had not time to verify proposed PR i am currently busy with other work maybe weekend or next week

To me it looks like https://github.com/CastagnaIT/plugin.video.netflix/pull/413 fixes race conditions that can occur if one thread closes the DB connection, and another thread tries to use it while the close() call is still writing to disk (slow devices!),

could be but need to test it also under Windows and android that have different sqlite versions i think one way is make library sync runing in background (check kodi log in realtime) then export manually more new tvshows at same time, in this way two threads should access the db at same time

just a consideration for now, the use of global is a bad pratice there should be a way to move it in to a variable to be set when test db inititalization