clara-j / media_cleaner

Python script to delete watched content on Emby
31 stars 17 forks

[little help] modifying your script to simply move all watched files #13

Closed zilexa closed 3 years ago

zilexa commented 3 years ago

I use MergerFS to pool 2 HDDs with an SSD. Specifically, I use the tiered-caching setup with 2 pools: one that includes a folder on my system SSD plus the 2 HDDs, and a second pool that contains only the (same) HDDs. All data goes to the first pool. A script runs nightly to move files that have not been accessed in the past 30 days to the second pool. Note this pool is a subset of the first pool, and its path is not used in any other way; it only exists so files can be moved off the SSD. Files moved to the second pool are only moved within the array (SSD >> HDDs); they can still be found at the same path on the first pool when you look in a file manager, because the first pool simply shows all data on the HDDs and SSD, regardless of which of those disks it is stored on.

Now I would like to run a nightly script that will move watched files (regardless of when they were watched) to the second pool. This will free up even more space. If I have 4GB episode files (2160p) and watch 10 episodes in a week, I could save 40GB on my SSD. The Media_Cleaner script erases these files after 15 days (to give other family members time to watch them). But before cleaning them, I can safely move them off my SSD already.

The command to move files based on last accessed is simple:

```shell
find "${CACHE}" -type f -atime +${N} -printf '%P\n' | nocache rsync --files-from=- -axHAXWES --progress --preallocate --remove-source-files "${CACHE}/" "${BACKING}/"
```

This moves files from one pool to the other (the 3 variables are given as input when I run the script).

I would now like to modify this command to find all watched files instead. But looking at your script (I am no Python expert), it seems I would need quite a bit of code to get the data from Jellyfin, build a list, and then perform the rsync action on that list. So now I wonder if it is better to start with your script and remove everything :) I don't need # of days or favourite status, and I don't care whether it is a show, season, movie or episode, as the folder structure won't change. So I can just focus on episodes, movies and their watched status (boolean). First thing I notice: you are not asking the media server for the boolean "watched status"; instead you obtain date_last_played. Is this correct?
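(A minimal sketch of how that query could look, assuming the standard emby/jellyfin item-listing endpoint: each item carries a UserData.Played boolean, and Filters=IsPlayed asks the server to return only watched items. The helper name below is illustrative, not part of the script.)

```python
from urllib.parse import urlencode

def played_items_url(server, user_id, api_key):
    # Assumed endpoint: GET /Users/{UserId}/Items on emby/jellyfin.
    # Filters=IsPlayed limits results to watched items; Fields=Path asks
    # for each item's full file path in the response.
    params = {
        "Recursive": "true",                  # walk the whole library
        "IncludeItemTypes": "Movie,Episode",  # skip shows/seasons/etc.
        "Filters": "IsPlayed",                # watched items only
        "Fields": "Path",                     # include the file path
        "api_key": api_key,
    }
    return f"{server}/Users/{user_id}/Items?{urlencode(params)}"
```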

terrelsa13 commented 3 years ago

Hey @zilexa, it seems like this might be doable. Unfortunately, the power supply in my server fried yesterday, so I am not able to verify whether the file path is also part of the data we get for each movie, episode, etc. You could take a look by setting "DEBUG=1" and "remove_files=0" in the config file. It should create a file called media_cleaner.debug in the same folder as the config file.

The media_cleaner.debug file shows the movie/episode data emby (or jelly) sends us to make this script work; we only use what we need. If you look in the debug file, you should see that every movie/episode has a full path name listed. That path could be used to move the movies/episodes instead of deleting them.

I do not remember seeing any emby (or jelly) API calls to move files. But you can run linux commands from a python script, which would allow the delete API call in the delete_item() function to be replaced with an rsync command. Or, as you mentioned, output a list of full paths and have a secondary bash script run rsync with --files-from="the_output_list".
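(The list-file route could look roughly like this; the function name, the list location, and the assumption that each item dict carries a "Path" key — as seen in the debug file — are illustrative.)

```python
# Rough sketch: instead of calling the delete API, append each watched
# item's full path to a file that a secondary bash script can hand to
# rsync via --files-from.
def queue_for_move(item, list_file="/tmp/watched_media.list"):
    path = item.get("Path")  # full path as shown in media_cleaner.debug
    if path:
        with open(list_file, "a", encoding="utf-8") as f:
            f.write(path + "\n")
```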

zilexa commented 3 years ago

Hi, sorry, I have not had time to try this yet.

I thought all I would have to do is run a copy of the script with the config set to 0 days (which hopefully means all watched files, regardless of how long ago they were watched) for all videos/tv/movie files, then replace the Python deletion command with a mv command. But I couldn't locate the actual deletion command in the current script, probably because it is a Python-library command, not an rm system command.

clara-j commented 3 years ago

The script doesn't actually delete the files itself, it does an API call to Emby to perform the deletion.

The area you would want to look at is the function def delete_item(itemID). This is where the API call is built and made.
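(If the API call inside delete_item() were swapped for a local move, a minimal sketch might look like this. The function name, the root arguments, and the use of shutil are assumptions for illustration, not the script's actual code.)

```python
import os
import shutil

# Hypothetical replacement for the delete step: move the file instead,
# rebuilding the series/season folder structure under the destination.
def move_item(item_path, src_root, dest_root):
    rel = os.path.relpath(item_path, src_root)
    dest = os.path.join(dest_root, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(item_path, dest)  # falls back to copy+delete across drives
    return dest
```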

terrelsa13 commented 3 years ago

@zilexa Looks like this might actually be doable using emby/jellyfin API calls. There is an API called "/Library/VirtualFolders/Paths/Update" which allows a media file's path to be changed. What I do not know is whether this actually moves the media file or just changes the path saved in the media file's metadata. It might also require rebuilding the ../series/season/media.file folder structure at the new location. And lastly, the way virtual folders work, it would require setting up two folder paths on your server and then adding both paths to the same emby/jellyfin library - Library > Select TV Library > +TV_Folder1 and +TV_Folder2. TV_Folder1 would be the SSD and TV_Folder2 would be the HDDs.

With that said, I do not see a clean way to implement this in the current script without making user input convoluted. This is likely a separate project.

terrelsa13 commented 3 years ago

@zilexa Looked into this API call a little more. It does not do what I thought it did. It is used to update library paths, not the paths of individual media items.

Being that os is already imported into this script... os.rename() and os.replace() can both be used to move files. Note, however, that both are a filesystem rename under the hood: neither will move files between different drives (a cross-device move raises OSError). shutil.move() falls back to copy-then-delete, so it does work between different drives.

This is the os.replace() syntax: os.replace("path/to/current/file.foo", "path/to/new/destination/for/file.foo") — os.rename() and shutil.move() use the same two arguments.
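(A small helper combining the two — try the cheap atomic rename first, then fall back to shutil.move when source and destination sit on different filesystems. The helper name is illustrative.)

```python
import os
import shutil

def move_any(src, dst):
    """Fast rename when possible, copy+delete when crossing drives."""
    try:
        os.replace(src, dst)   # atomic rename on the same filesystem
    except OSError:            # e.g. EXDEV: cross-device move
        shutil.move(src, dst)  # falls back to copy then delete
```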

zilexa commented 3 years ago

Sorry I was spending time in other rabbit holes concerning my server.

My question was all wrong, actually. What is really the case: I use MergerFS; it pools (to be exact: it creates a union of all folders on) an SSD and 2 hard disks. I use a tiered caching setup: as long as the SSD has enough free space, data goes there. Nightly, any data older than 30 days is moved from the SSD to a secondary MergerFS pool that contains only the same 2 hard disks, not the SSD.

So Jellyfin just sees (mounted volumes in its docker container):

/mnt/pool/TV/Series
/mnt/pool/TV/Movies

Where /mnt/pool is a union of:

/mnt/disks/ssd
/mnt/disks/data1
/mnt/disks/data2

Nightly, I would like to move watched items out of /mnt/disks/ssd/TV/Series (so they end up on the HDD-only pool).

After moving, the files are still part of the pool, their path on the pool is unchanged, just the actual disk they are stored on has changed.

But this is not a path known to Jellyfin, so the move itself cannot be performed via the API; the necessary information, however, can be obtained from it.

To do this, I believe I should 1) generate a list of watched items containing their paths, then 2) strip/replace the path prefix, and 3) perform bash actions to move the files.

For step 1, I believe I can use your script to communicate with the API. But I haven't looked into this much further yet.
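(Step 2 could be sketched like this, using the mount points named above: map the path Jellyfin reports on the union pool to the file's possible physical location on the SSD branch, probing the SSD directly to see whether there is anything to move. The function name is illustrative.)

```python
import os

def ssd_location(pool_path, pool_root="/mnt/pool", ssd_root="/mnt/disks/ssd"):
    # Translate /mnt/pool/... into the equivalent path on the SSD branch
    rel = os.path.relpath(pool_path, pool_root)
    candidate = os.path.join(ssd_root, rel)
    # None means the file already lives on the HDDs: nothing to move
    return candidate if os.path.exists(candidate) else None
```

Step 3 would then mv/rsync each non-None result into the HDD-only pool.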

zilexa commented 3 years ago

I no longer use a disk pool or ssd cache.