clara-j / media_cleaner

Python script to delete watched content on Emby

episode is deleted while favorite #33

Closed: keppo070 closed this issue 2 years ago

keppo070 commented 2 years ago

It looks like when UserA has seen a media file (episode or movie) and UserB has marked it as a favorite but has not seen it, media_cleaner still deletes it.

Episode S01E01:
- UserA = seen
- UserB = unseen, favorite

Only when I mark the episode as seen for UserB does the script 'see' that the episode is a favorite.

What I expect is that the favorite status is evaluated before the seen status if another user has seen the item (hope this makes sense).
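The expected rule can be sketched as a small decision function: any user's favorite protects an item, even if that user has not watched it, and only otherwise does "watched by someone" make it a deletion candidate. This is a hypothetical illustration, not media_cleaner's code; the `Item` class and `should_delete` are made-up names, and every other filter the script applies is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical per-item view: who played it and who marked it favorite."""
    name: str
    played_by: set[str] = field(default_factory=set)
    favorited_by: set[str] = field(default_factory=set)


def should_delete(item: Item) -> bool:
    # Favorites are evaluated first: a favorite from ANY user keeps the item,
    # including a user who has not watched it yet.
    if item.favorited_by:
        return False
    # Otherwise the item becomes a candidate once at least one user watched it.
    return bool(item.played_by)


ep = Item("S01E01", played_by={"UserA"}, favorited_by={"UserB"})
print(should_delete(ep))  # False: UserB's favorite keeps the episode
```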

I have tried with keep_favorites_episode=1 and keep_favorites_episode=2; it does not seem to make a difference. The same happens for movies.

@terrelsa13

terrelsa13 commented 2 years ago

@keppo070

Hmmm... You are correct. This is a side effect of only grabbing the "watched/played" media from each library in an attempt to stay light on resources. I see that doing it this way makes the script behave in a way that is not expected.

The only way for the script to know if a user who has not yet watched an episode has set it as a favorite would be to request the metadata for every item in the library and check if it is a favorite.

This can be done, but at a cost. For people with MASSIVE libraries or a lot of users, the script will take much longer to process every media item and/or every user.
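A minimal offline sketch of why the favorite went unnoticed, assuming favorites are collected from per-user item lists: when each user's list contains only played items, UserB's favorite on an unplayed episode is never fetched; fetching every item finds it but processes more items. `LIBRARY`, `fetch_items` and `favorites_seen` are hypothetical stand-ins for the real server queries.

```python
# Hypothetical library: item -> {user: (played, favorite)}
LIBRARY = {
    "S01E01": {"UserA": (True, False), "UserB": (False, True)},
}


def fetch_items(user, played_only):
    """Offline stand-in for a per-user library query."""
    for name, per_user in LIBRARY.items():
        played, favorite = per_user[user]
        if played_only and not played:
            continue  # a "played only" query never returns this item
        yield name, played, favorite


def favorites_seen(users, played_only):
    """Return (favorites found, number of items processed)."""
    favs, fetched = set(), 0
    for user in users:
        for name, _played, favorite in fetch_items(user, played_only):
            fetched += 1
            if favorite:
                favs.add(name)
    return favs, fetched


users = ["UserA", "UserB"]
print(favorites_seen(users, played_only=True))   # (set(), 1)
print(favorites_seen(users, played_only=False))  # ({'S01E01'}, 2)
```

The processed-item count is the price mentioned above: it grows from "played items per user" to "all items per user".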

I guess I can make this a configuration option. I will try to look into this over the weekend and let you know what I come up with as a solution.

terrelsa13 commented 2 years ago

@keppo070

Try this branch and let me know if it fixes the issue.

If it works I will merge these updates into the master.

keppo070 commented 2 years ago

This branch fixes my issue! It also works when I favourite a complete series, not just a single episode.

Like you said, it comes with a performance price. For me personally, it is not a big issue. I have a small library of approximately 600 media items and 3 users.

| `time` output | media_cleaner.py (request_not_played=1) | media_cleaner.py |
|---|---|---|
| real | 0m16.530s | 0m1.228s |
| user | 0m3.169s | 0m0.378s |
| sys | 0m0.855s | 0m0.047s |

Maybe it is possible to only check for a favourite status in the delete phase of the script?
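This suggestion could look roughly like the two-phase sketch below, written purely hypothetically: cheap "played only" queries first collect deletion candidates, and favorite status is then checked for every monitored user only on those candidates. `find_candidates`, `delete_phase` and the `is_favorite(user, item)` lookup are illustrative names, not media_cleaner functions, and the sketch ignores favorites set on a whole series.

```python
def find_candidates(played_by_user: dict[str, set[str]]) -> set[str]:
    """Phase 1: items watched by at least one user."""
    candidates: set[str] = set()
    for items in played_by_user.values():
        candidates |= items
    return candidates


def delete_phase(candidates, users, is_favorite):
    """Phase 2: look up favorite status only for the candidates.

    is_favorite(user, item) is a hypothetical per-user, per-item lookup.
    """
    to_delete = []
    for item in sorted(candidates):
        if not any(is_favorite(user, item) for user in users):
            to_delete.append(item)
    return to_delete


played = {"UserA": {"S01E01", "S01E02"}, "UserB": set()}
favorites = {("UserB", "S01E01")}
cands = find_candidates(played)
print(delete_phase(cands, ["UserA", "UserB"], lambda u, i: (u, i) in favorites))
# ['S01E02']
```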

terrelsa13 commented 2 years ago

Cool! Always good to see data. Glad it works. The master has been updated.

My library has about 1000 media items and I monitor 3 different users. I saw the runtime increase from ~2.5s to ~4min.

@keppo070 Do you mind updating your last post with the number of items you have in your library and how many users you are monitoring so anyone else reading this has two data points they can use to roughly estimate how long it may take their script to run?
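With the two data points in this thread (about 600 items, 3 users, ~16.5s; about 1000 items, 3 users, ~4min), a rough estimate can be made by scaling a measured runtime by items × users. The linear scaling is an assumption, and the two reports differ a lot per item, so treat the results as a range rather than a prediction. `estimate_runtime` is a hypothetical helper.

```python
def estimate_runtime(items, users, ref_items, ref_users, ref_seconds):
    """Scale a measured runtime linearly by (items x users).

    Assumes runtime is roughly proportional to items * users; hardware,
    storage and network differences are not modelled.
    """
    return ref_seconds * (items * users) / (ref_items * ref_users)


# Estimate for 2000 items and 3 users from each reported data point.
print(f"{estimate_runtime(2000, 3, 600, 3, 16.53):.1f} s")   # 55.1 s
print(f"{estimate_runtime(2000, 3, 1000, 3, 240):.0f} s")    # 480 s
```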

> Maybe it is possible to only check for a favorite status in the delete phase of the script?

Unfortunately the script does not know the favorite status until it queries Emby/Jellyfin for it. With the default configuration, the script requests media in batches of 100 items. It then goes through those 100 media items individually to determine favorite status, library paths, associated genres, etc. Once it is done, it requests the next 100 items and repeats the process until no media items are left.
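The batching described above can be sketched as a paging loop. Emby/Jellyfin item queries accept `StartIndex` and `Limit` paging parameters; here `fake_fetch` is an offline stand-in for such a request, and the item layout (`Id`, `UserData.IsFavorite`) is a simplified assumption, so this is a sketch of the pattern rather than media_cleaner's implementation.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int, int], list], limit: int = 100) -> Iterator[dict]:
    """Yield every item, requesting `limit` items per call until none are left."""
    start = 0
    while True:
        page = fetch_page(start, limit)  # e.g. StartIndex=start, Limit=limit
        if not page:
            break
        yield from page
        if len(page) < limit:
            break  # a short page means this was the last batch
        start += len(page)


# Offline stand-in for the server: 250 fake items with Emby-style UserData.
FAKE_LIBRARY = [{"Id": str(i), "UserData": {"IsFavorite": i % 50 == 0}} for i in range(250)]
calls = []


def fake_fetch(start: int, limit: int) -> list:
    calls.append(start)
    return FAKE_LIBRARY[start:start + limit]


favorites = [it["Id"] for it in iter_items(fake_fetch) if it["UserData"]["IsFavorite"]]
print(len(calls), favorites)  # 3 ['0', '50', '100', '150', '200']
```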

At a 16-17 sec runtime, it is not really a big deal. But you can try increasing or decreasing api_return_limit to see whether performance changes.

```python
#----------------------------------------------------------#
# API return limit; large libraries sometimes cannot return all of the media metadata items in a single API call
#  This is especially true when using the max_age_xyz or return_not_played options; both require every item of the specified media type to send its metadata
#  1-10000 - number of media metadata items the server will return for each API call for media item metadata; ALL queried items will be processed regardless of this value
#  (100 : default)
#----------------------------------------------------------#
api_return_limit=100
```
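As a quick arithmetic check of what api_return_limit changes: it sets how many requests are needed to page through a set of items, not how many items are processed. The `api_calls` helper is hypothetical and counts the minimum number of requests for one item query; the real number depends on how many libraries, media types and users are queried.

```python
import math


def api_calls(total_items: int, api_return_limit: int) -> int:
    """Minimum requests needed to page through total_items."""
    if not 1 <= api_return_limit <= 10000:
        raise ValueError("api_return_limit must be 1-10000")
    # At least one request is always made, even for an empty result.
    return max(1, math.ceil(total_items / api_return_limit))


print(api_calls(600, 100))     # 6
print(api_calls(1000, 100))    # 10
print(api_calls(1000, 10000))  # 1
```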

For anyone reading this in the future: with Emby/Jellyfin running on semi-recent hardware and a wired network connection, do not expect the api_return_limit configuration option to provide a significant performance increase. If Emby/Jellyfin runs on an HDD (instead of an SSD) or is connected over WiFi (instead of wired), the api_return_limit option may provide a performance increase, but I am not able to confirm this with my setup. What matters most is running the script when the server has its lightest load. For me that is every Monday at 0200hrs. You will have to determine your own server's light-load time window.
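To illustrate picking a fixed light-load window, this sketch computes the next Monday 02:00 after a given time. In practice a scheduler such as cron would start the script; `next_run` is a hypothetical helper, not part of media_cleaner, and the weekday and hour should be replaced with your own server's quiet window.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, weekday: int = 0, hour: int = 2) -> datetime:
    """Next `weekday` (Monday=0) at `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # today's slot already passed
    return candidate


print(next_run(datetime(2024, 1, 3, 15, 30)))  # 2024-01-08 02:00:00
```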