Using beets for Video Library Organization?

addisonamiri commented 8 years ago

I was wondering if there was any interest in adding video library support to beets. I really like the workflow of importing my media and playing it with beet play without the overhead of a full-blown media player. I was curious what would be necessary for beets to implement video support. So far, what I think would be required at a minimum is this:

id3tag support for video formats
new data sources for TV shows and movies
a config option for a video directory

I don't really think this is in scope for the beets project and a lot of it won't carry over into video organizing but I can't seem to find a media organizer for videos that doesn't require a server or a gui (Plex, Kodi, Emby, etc) and most of those projects require manual editing of filenames in order for the lookup to succeed.

I was wondering what everyone thought about this functionality being in beets or another project similar to beets. I know I'd find it useful but I'm not too sure if it's worthy of being incorporated into the main project.

Desired Workflow

Ideally this would be the workflow I'm looking to achieve:

Run beet import The\ Princess\ Bride.mp4.
- Maybe a --as-video flag would be needed for an alternate import method.
- Automatic movie detection could be done by looking at the length of the file maybe.
Beets will search IMDB (or another metadata source) based on the name.
Beets will identify the movie as The Princess Bride.
Beets will move/copy the file to the video directory specified in the config.
Beets will rename the file, add all metadata desired, and finish the import.

Then after this is complete beet play Princess Bride would start playing that file with the configured media player.

ab623 commented 8 years ago

Filebot + ACM with some config is what you need

https://www.filebot.net/forums/viewtopic.php?t=215

sampsyo commented 8 years ago

Thanks for starting the discussion! I've always thought expanding to cover other media types would be an interesting direction.

One major roadblock is that many video file types, last I checked, don't have standard metadata fields. And in any case, it's not common to find tagged video files even with non-standard formats, so we'd need to rely much more on filename heuristics. See also #1160.
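
To make "filename heuristics" concrete, here is a minimal sketch that pulls a title and year out of a video filename using only the standard library. The helper name guess_title_year and the regular expression are assumptions made up for illustration; nothing like this ships with beets, and real-world filenames need many more cases.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical pattern (not part of beets): a year from 1900-2099 that is
# preceded by a separator or bracket and followed by one (or the end).
_YEAR = re.compile(r"[\s._(\[-]+((?:19|20)\d{2})(?=[\s._)\]-]|$)")


def guess_title_year(path: str) -> Tuple[str, Optional[int]]:
    """Guess (title, year) from a video filename; year is None if absent."""
    stem = Path(path).stem
    match = _YEAR.search(stem)
    year = int(match.group(1)) if match else None
    # Treat everything before the year as the title.
    raw_title = stem[: match.start()] if match else stem
    # Dots and underscores usually stand in for spaces.
    title = re.sub(r"[._]+", " ", raw_title).strip()
    return title, year
```

The guessed title could then seed a search against whichever metadata source is chosen; anything after the year (resolution, release group) is simply ignored here.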

I can see this going one of two ways:

Adding "media kinds" to beets, so songs and videos coexist in the Items table with a flag to distinguish them.
Start a sister project that uses much of beets' internals but focuses exclusively on videos. Much of the machinery is already reusable—namely, the dbcore module contains the "soul" of beets without knowing anything about music.
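
The first option (a flag that distinguishes media kinds within one table) might look roughly like the sketch below. The schema is an assumption for illustration only; it is not the actual beets items table, and the kind column and helper names are hypothetical.

```python
import sqlite3
from typing import List


def make_library() -> sqlite3.Connection:
    """Create an in-memory table where a `kind` column tags each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " title TEXT,"
        # Existing rows would default to audio, so old data stays valid.
        " kind TEXT NOT NULL DEFAULT 'audio')"
    )
    return conn


def items_of_kind(conn: sqlite3.Connection, kind: str) -> List[str]:
    """Return the titles of all items with the given kind, in insert order."""
    rows = conn.execute(
        "SELECT title FROM items WHERE kind = ? ORDER BY id", (kind,)
    )
    return [row[0] for row in rows]
```

With a default on the flag, existing music-only queries keep working and video rows are only seen by queries that ask for them.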

jackwilsdon commented 8 years ago

Your second point is interesting, @sampsyo. Could we perhaps move dbcore into its own repository and pip package?

sampsyo commented 8 years ago

Yep, that was always the long-term goal with the dbcore refactor. If there's sufficient momentum behind a "beets derivative" for video, that might be the push we need to finish that up.

lhupitr commented 8 years ago

Filebot has everything you are looking for and more - it integrates perfectly for me between flexget and kodi.

1xPdd commented 8 years ago

I can't speak to video, but I just wanted to put in a vote for a beets ebook organizer. There really is not a KISS tool for ebooks. Calibre is, as far as I know, the only Linux software that scrapes and organizes ebooks; however, it's a train wreck, modifying files without warning the user, forcing exactly one organizational scheme, etc. Nothing like the elegance of beets!

cobra2 commented 8 years ago

I would be way more interested in being able to use beets to rename/organize video files and generate nfo files for kodi. Solutions like filebot are great, but anything that requires java to run and frequently breaks on updates does not interest me in the least.

I'd try to integrate https://github.com/guessit-io/guessit into beets. It does a fantastic job of guessing information based on filename and the output would probably make a good starting search on tmdb or tvdb. I see the workflow better suited to a plugin (but I'm not a dev).

Man.... with the inline plugin.... this would be so awesome.

Kraymer commented 8 years ago

Until the day beets manages video files, there is flinck, which I wrote to create symlinks to your movie files and organize them by year, genre, or whatever you like. beets users should not feel disoriented, as it reuses some beets concepts: heavy use of confit to configure the tree hierarchy, buckets to reuse user-created folders, and a Google backend to guess a movie's original name from another country's release name.

But it's not the Swiss Army knife for video files that beets is for audio: no database, no renaming, just symlinking really. I released a new version yesterday and would appreciate some feedback.

Freso commented 8 years ago

@sampsyo I think I'd prefer if video (and eBook?) management was kept in beets rather than split off into separate (even if related) projects. I already do use beets for some (music) video management, so it's not hard for me to envision. Of course, this might make some feel beets has become bloated, so maybe it'd be possible to "modularise" beets to support various media types depending on what's installed? (E.g., I'd imagine I'd like to organise eBooks and audiobooks together, and personal videos, movies/tv series, random YouTube downloads, and music videos together.)

1xPdd commented 8 years ago

@Freso I concur that opting for modularity would be the way to go. The ability to scrape new media types seems like it belongs as a plugin, prodding the community to create plugins they desire. Sadly, I realize, eBooks will probably not be the community's highest priority... Not sure if others are aware, but MediaElch works quite well for video. Perhaps some of the scraping work can be borrowed from that.

sampsyo commented 8 years ago

Well, the upshot is that modularity is a good idea for lots of reasons! Even boring ones like maintainability that have nothing to do with video or ebooks. So I'm all for it, especially if it helps us build tools that feel engineered for different use cases.

darkfeline commented 7 years ago

Somewhat related, but MusicBrainz actually includes video tracks (because some albums include bonus promotional videos and the like). As an initial step toward potential video library support, perhaps MusicBrainz video track support could be added to beets?

https://github.com/beetbox/beets/issues/1210

devhell commented 6 years ago

Oh my, I'd love it if beets could also take care of TV shows and/or movies. Sickbeard, SickGear, SickRage, MediaElch, FileBot etc. are all way too heavy and complex, especially if you just want to point the program to a directory with a tv series and have it rename all episodes appropriately.

khimaros commented 6 years ago

I was thinking of Beets as I experimented with https://github.com/perkeep/perkeep -- the intended use cases and workflows are quite different, but having a central store with a flexible metadata system is something both of these systems share.

Perkeep could serve as an example of how to provide a number of modular "importers" which produce metadata in a single database. @sampsyo -- how modular is the import flow currently, and how hard would it be to extend to arbitrary file types?

sampsyo commented 6 years ago

You're right; there is a certain similarity in philosophies there! I'd be interested to explore this more deeply.

To answer your direct question, the importer pipeline is reasonably reusable, although there is a fair amount of music-specific logic mixed in there: mostly surrounding albums that group together individual tracks.

One thing that is very abstract, however, is the database layer. Take a look at our dbcore package, which does everything having to do with items, their fields, and queries over them. That actually seems like a good point to overlap with Perkeep.

khimaros commented 6 years ago

@sampsyo -- would it be helpful to track this effort as a separate bug? Something like "Modularize the importer and support file types without inline metadata"? Or do you feel this is outside of the scope of what should be supported by the Beets project?

I did take a look at dbcore! If one wanted to create a separate tool for importing, setting metadata, and querying over arbitrary local files, it seems like this would be a great place to start. Do you have strong feelings on whether that is the best route?

sampsyo commented 6 years ago

Sure; a separate thread sounds good! I guess the way I’d put the project is: let’s make the importer module generic and reusable in the same way that dbcore is. The idea would be to factor out the common logic from the music-specific stuff—without breaking beets too much in the process. :smiley:

With that work in place, I can imagine it going one of two ways: either reuse the same components (dbcore + this new importer module) to make a beets-like tool for video, or just extend beets for other media types in place. I don't have a strong feeling about which of those is a better idea, but both seem worth exploring.

khimaros commented 6 years ago

@sampsyo, it looks like the majority of the changes would need to go into beets/library.py or beets/mediafile.py -- LibModel and Library are mostly generic enough and beets/importer.py doesn't seem to know too much about the individual models, but Item and Album are very audio specific.

Video items might overlap enough with the fields in Item that it makes sense to support them in beets/mediafile.py, but generic files like text documents, binaries, source files, etc. wouldn't fit very well.

One approach would be to add distinct model/database types to beets/library.py for file types which don't have the typical music associated metadata. LibModel/FileItem (any file), MediaItem (common media related fields) VideoItem, AudioItem, AudioAlbum, ImageItem, TextItem, etc.

However, the ideal outcome might be to allow defining different media types as plugins so that the end user could choose which sorts of files they want to have in their library.

Naively, I could imagine something like:

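from typing import Dict

import beets.dbcore.types
from beets.plugins import BeetsPlugin


class VideoModelPlugin(BeetsPlugin):
    # supported_format, attributes, and parse are proposed hooks,
    # not methods that BeetsPlugin defines today.
    def supported_format(self, file_path: str, magic: str) -> bool:
        return magic in ['video/mp4']

    def attributes(self) -> Dict[str, beets.dbcore.types.Type]:
        return {'director': beets.dbcore.types.STRING}

    def parse(self, file_path: str) -> Dict[str, str]:
        # _LoadMetadata is a placeholder for a format-specific reader.
        meta = _LoadMetadata(file_path)
        return {'director': meta['director']}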

Does this seem like a reasonable approach?
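
For the dispatch side of this idea, a minimal standalone sketch follows. It assumes a hypothetical ModelPlugin base class and find_plugin helper (neither exists in beets) and uses the standard mimetypes module, which guesses from the file extension only; real content sniffing, as the magic parameter suggests, would need something more.

```python
import mimetypes
from typing import Dict, List, Optional


class ModelPlugin:
    """Hypothetical base class; beets has no such interface today."""

    mime_types: List[str] = []

    def supported_format(self, file_path: str, magic: str) -> bool:
        return magic in self.mime_types

    def parse(self, file_path: str) -> Dict[str, str]:
        raise NotImplementedError


class VideoModelPlugin(ModelPlugin):
    mime_types = ["video/mp4"]

    def parse(self, file_path: str) -> Dict[str, str]:
        # Placeholder: a real plugin would read metadata from the file.
        return {"kind": "video"}


def find_plugin(file_path: str, plugins: List[ModelPlugin]) -> Optional[ModelPlugin]:
    """Return the first plugin that claims the file, or None."""
    # Assumption: the MIME type is guessed from the extension alone.
    magic, _encoding = mimetypes.guess_type(file_path)
    if magic is None:
        return None
    for plugin in plugins:
        if plugin.supported_format(file_path, magic):
            return plugin
    return None
```

Files that no plugin claims fall through to None, which leaves room for the "any file" model mentioned above.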

sampsyo commented 6 years ago

Yeah, that would be cool! I like the idea of model types provided by plugins. An inconvenient piece to deal with will be creating and destroying SQLite tables that back these models. I’d be interested to look into a more detailed design for how that would work.
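
As a starting point for that design, here is a rough sketch of creating and dropping a backing table from a plugin-declared field mapping. The helper names and the mapping of field names to SQL type strings are assumptions for illustration; dbcore's own table setup may differ.

```python
import sqlite3
from typing import Dict, List


def create_model_table(conn: sqlite3.Connection, table: str,
                       fields: Dict[str, str]) -> None:
    """Create a table for a plugin-provided model if it does not exist.

    Identifiers are interpolated into the SQL, so `table` and the field
    names must come from trusted plugin code, never from user input.
    """
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in fields.items())
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY, {columns})'
    )


def drop_model_table(conn: sqlite3.Connection, table: str) -> None:
    """Remove the table when a plugin's model is retired."""
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')


def table_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """List column names; an empty list means the table does not exist."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
```

The harder questions (what to do with existing rows when a plugin is disabled, or when its field list changes) are left open here.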

khimaros commented 6 years ago

Howdy, @sampsyo -- to prove to myself whether a tool like beets is the right one for this job, I threw together a prototype using dbcore for crawling non-music files. I've found that adding items to an on-disk (ext4) database is several orders of magnitude slower than an in-memory one.

For an import with only 847 records:

:memory: 3.2s
test.db: 3m7.1s

Each file has ~10 (non-flexible) attributes. I'm setting them all with a single model.update() call (which, from a quick code perusal, seems to result in an SQL 'UPDATE' query for each attribute). I was initially setting each attribute one per expression which (due to the parenthetical above) seems to have no impact on performance.

Am I using the library incorrectly or is this the expected performance?

sampsyo commented 6 years ago

Wow; awesome! Except for the performance.

I'm not sure what to "expect" for performance, but that's certainly not good—maybe this would be a good lens to use for performance optimization. Would it make sense to do a little profiling? (If so, may I recommend SnakeViz to explore the data?)
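
For anyone who wants to reproduce this kind of profiling, a small sketch using the standard cProfile module is below; SnakeViz reads the stats file it writes. The function names and the output path are placeholders, not part of beets.

```python
import cProfile
import pstats


def profile_to_file(func, out_path, *args, **kwargs):
    """Run func under cProfile and save the stats for later viewing."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    profiler.dump_stats(out_path)
    return result


def print_top_functions(out_path, limit=10):
    """Print the most expensive calls, sorted by cumulative time."""
    stats = pstats.Stats(out_path)
    stats.sort_stats("cumulative").print_stats(limit)
```

Running snakeviz on the saved file (for example, snakeviz import.prof) then opens the interactive view.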

Anachron commented 6 years ago

Was about to open an issue but luckily found out it is already being worked on! I will see if I can help with this. I'm looking forward to cataloguing my movies and series.

khimaros commented 6 years ago

@sampsyo, I took it for a spin in snakeviz. Unsurprisingly, the majority of the time is being spent in sqlite3.Connection.commit.

On the beets side, over 95% of the time is spent in dbcore.Model.add, dbcore.Model.store, and dbcore.Model.__exit__. Baseline runtime was 153 seconds.

Removing an unnecessary store call in the inner loop shaved about 30% from runtime. add already calls store once for each added record. Trimmed runtime down to 109 seconds.

Next improvement was setting the values for the entry at model instantiation time rather than a) instantiating with empty values, b) setting the values by attr or bulk update, and then c) calling add. Down to 76 seconds.

Next area for exploration may be supporting a bulk add with a single sqlite transaction. I'm not sure how much this would impact performance.
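
To illustrate the difference being discussed, the sketch below contrasts committing after every row with wrapping the whole batch in one transaction, using plain sqlite3 rather than dbcore. The files table and function names are assumptions for illustration.

```python
import sqlite3
from typing import Iterable


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn


def insert_commit_each(conn: sqlite3.Connection, paths: Iterable[str]) -> None:
    """One commit per row: each commit must reach the disk before the next."""
    for path in paths:
        conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
        conn.commit()


def insert_one_transaction(conn: sqlite3.Connection, paths: Iterable[str]) -> None:
    """One commit for the whole batch; rolled back if any insert fails."""
    with conn:
        for path in paths:
            conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
```

On an in-memory database the two behave about the same, which matches the numbers above; the gap shows up when each commit has to be flushed to a file on disk.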

To summarize, the overall control flow now looks something like this:

db = ExampleDatabase(db_path)
for file_path in file_path_list:
    # Pass all values at construction time, then add once.
    model = ExampleModel(db, attr1=x, attr2=y, attr3=z, ...)
    model.add()
db._connection.close()

khimaros commented 6 years ago

It's worth noting that each add results in an INSERT with DEFAULT VALUES as well as a subsequent UPDATE to modify the dirty keys, even if all of the values were supplied ahead of time. This may be an area for optimization.
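
In plain sqlite3 terms, the two shapes look roughly like this. This is a sketch of the SQL pattern described, not the dbcore code itself; the items table and function names are made up for the example.

```python
import sqlite3
from typing import Dict


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, path TEXT, title TEXT)")
    return conn


def add_two_step(conn: sqlite3.Connection, values: Dict[str, str]) -> int:
    """INSERT an empty row, then UPDATE it with the dirty fields."""
    row_id = conn.execute("INSERT INTO items DEFAULT VALUES").lastrowid
    assignments = ", ".join(f"{key} = ?" for key in values)
    conn.execute(
        f"UPDATE items SET {assignments} WHERE id = ?",
        (*values.values(), row_id),
    )
    return row_id


def add_one_step(conn: sqlite3.Connection, values: Dict[str, str]) -> int:
    """INSERT the row with its values in a single statement."""
    columns = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cur = conn.execute(
        f"INSERT INTO items ({columns}) VALUES ({marks})",
        tuple(values.values()),
    )
    return cur.lastrowid
```

Both produce the same row; the second simply issues one statement instead of two. Column names are interpolated, so they must come from the model definition rather than user input.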

sampsyo commented 6 years ago

Awesome work here. That's sort of good news that we can blame our very inefficient database usage rather than anything running "in Python"!

Just to help me track this: where is the inner loop that you're referring to? That's in your own client code, right? (Not in beets itself?)

To summarize potential changes from the beets side that you mentioned:

Some form of bulk operations on models that avoids a separate store (and therefore a separate database transaction) on every model creation. I expect this would be a substantial win: even if the actual transactions themselves are pretty fast, the per-transaction overhead is probably a good chunk of the time spent on the model insert cost.
Refactoring add to allow proper initialization with specific values (rather than using two transactions to create and then modify). This might be easier to do than the first thing and would halve the number of transactions, so it might be the right place to start.

khimaros commented 6 years ago

The store within an inner loop was in my own code (conceptually, inside the for loop I demonstrated above). Your summary of improvements sounds right and agree that the latter one should be simpler to implement. I don't see an obvious way to do the first one without changing the dbcore API.

GuilhermeHideki commented 6 years ago

For updating / writing, the first one probably could be solved by introducing the unit of work pattern (the trade-off would be more memory to keep track of the objects). An example exists in SQLAlchemy (Session).

The models would have a reference to the session (which is bad, IMHO) and one would commit after all operations are done (The default way would be always commit the changes, to not break the API, at least initially):

# pseudocode
class Model(object):
  def store(self, mode='now'):
    self.session.add(self)
    if mode == 'now':
      self.session.commit()
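
A runnable version of that pseudocode might look like the following sketch, backed by plain sqlite3. The Session and Model classes here are illustrative stand-ins, not SQLAlchemy's Session or dbcore's Model, and the items table is assumed to exist already.

```python
import sqlite3


class Session:
    """Hypothetical unit of work: queue models, write them in one transaction."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.pending = []

    def add(self, model) -> None:
        # Queue each object at most once.
        if model not in self.pending:
            self.pending.append(model)

    def commit(self) -> None:
        # A single transaction covers every queued model.
        with self.conn:
            for model in self.pending:
                self.conn.execute(
                    "INSERT INTO items (title) VALUES (?)", (model.title,)
                )
        self.pending.clear()


class Model:
    def __init__(self, session: Session, title: str):
        self.session = session
        self.title = title

    def store(self, mode: str = "now") -> None:
        self.session.add(self)
        # Default keeps the current behaviour: write immediately.
        if mode == "now":
            self.session.commit()
```

Callers that want batching pass any mode other than "now" and call session.commit() once at the end.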

"We can solve any problem by introducing an extra level of indirection." hahaha

khimaros commented 6 years ago

@sampsyo, I think the right target to shoot for is that the dbcore overhead should be less than the time it takes to crawl files on the target filesystem. Do you think this is achievable with an SQL data store?

khimaros commented 6 years ago

Here is a really nice article on this topic: https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite -- they are using the C bindings for sqlite, but I suspect many of the lessons could apply for Python.

sampsyo commented 6 years ago

Yeah, that seems like a reasonable goal to at least shoot for. Have you checked, for instance, what the proportion of filesystem to database time is in the optimized version of your current crawler?

khimaros commented 6 years ago

@sampsyo, database time is still over 92% of total runtime. I suspect this is also a significant bottleneck when importing large music libraries.

sampsyo commented 6 years ago

Got it. It does seem like this should be achievable in the limit—the main impediment is figuring out the right abstractions to allow clients to express a high-performance treatment of the database.

khimaros commented 6 years ago

@sampsyo -- thinking a bit more, we can do something relatively uninvasive by providing a bulk_add method on the Database class. Consider the following:

db = ExampleDatabase()
db.bulk_add(
    ExampleModel({...}),
    ExampleModel({...}),
)

Caveat emptor: in order to realize the performance improvements of bulk operations, callers would need to explicitly opt into this use.

I threw together a quick prototype of this and I'm seeing total runtime down to less than 5s, with less than 10% of total time spent in dbcore/sqlite3.

Some notes about the prototype:

Model.add and Database.transaction are modified to accept an optional txn parameter
Model.add passes txn parameter through to its Database.transaction call.
If Database.transaction is called with txn param it is returned immediately.
Database.bulk_add creates a new transaction and passes it to each Model.add call.
This results in a nested call to Transaction.__enter__ as both Database.bulk_add and the transitive Model.add calls use a with on that object.
I'm not sure how this will interact with threads and didn't spend much time looking at tx_stack.

Regardless, I think this is really promising and it's now relatively clear to me that pursuing a single transaction will yield the most significant performance increase.
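
For readers following along, the notes above can be approximated in a standalone sketch like this one. It is not the prototype's code: the classes are simplified stand-ins for dbcore's Database and Transaction, add lives on the database rather than on a model, and, like the prototype, it makes no attempt at thread safety.

```python
import sqlite3


class Transaction:
    """Re-entrant context manager: only the outermost block commits."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.depth = 0

    def __enter__(self):
        if self.depth == 0:
            self.conn.execute("BEGIN")
        self.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self.depth -= 1
        if self.depth == 0:
            self.conn.execute("ROLLBACK" if exc_type else "COMMIT")
        return False  # never swallow exceptions


class Database:
    def __init__(self, path: str = ":memory:"):
        # isolation_level=None: BEGIN/COMMIT are issued explicitly above.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items"
            " (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
        )

    def transaction(self, txn=None):
        # If a transaction is passed in, hand it straight back.
        return txn if txn is not None else Transaction(self.conn)

    def add(self, title, txn=None):
        with self.transaction(txn) as tx:
            cur = tx.conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
            return cur.lastrowid

    def bulk_add(self, titles):
        txn = self.transaction()
        with txn:  # the nested `with` in add() re-enters the same object
            return [self.add(title, txn=txn) for title in titles]
```

A failure partway through bulk_add rolls back the whole batch, since the inner blocks never commit on their own.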

sampsyo commented 6 years ago

Yes, absolutely! A bulk insert would be a great way to do it. You could even imagine letting the bulk_add method accept an iterable (instead of just a list) to let new model objects get generated on the fly without materializing them all in memory.
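
A sketch of the iterable variant, again with plain sqlite3 instead of dbcore: rows come from a generator, so nothing is materialized up front. The files table, bulk_add, and fake_crawl are hypothetical names for this example.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, size INTEGER)")
    return conn


def bulk_add(conn: sqlite3.Connection, rows: Iterable[Tuple[str, int]]) -> None:
    """Insert every row from any iterable inside a single transaction."""
    with conn:
        # executemany pulls rows one at a time, so a generator works.
        conn.executemany("INSERT INTO files (path, size) VALUES (?, ?)", rows)


def fake_crawl(count: int) -> Iterator[Tuple[str, int]]:
    """Stand-in for a filesystem crawler that yields rows lazily."""
    for index in range(count):
        yield (f"/videos/{index}.mp4", index)
```

Usage would be something like bulk_add(conn, fake_crawl(1000)), with a real crawler in place of fake_crawl.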

This sounds awesome. Any chance you can put together a PR for closer review?

jtpavlock commented 4 years ago

I think I'm on the team that sees this as outside the scope of beets, but a fork that deals with videos could be interesting. I think there's also a limited use case for this. As mentioned, there isn't really a good standard for tagging or a whole lot that's worthwhile to tag or update as time goes on. The biggest advantage I suppose would be the database querying, but having an application like beets just to provide a cli query for your videos seems overkill when the common uses for a query could be implemented through other command-line utilities by parsing an organized video directory, or with GUI applications.

What I would recommend for the majority of people is:

Use flexget for automation and organization. Flexget already includes heuristics for determining data for videos (year, name, quality) and you can organize your media folders accordingly.
Use mpv for a lightweight, cli player.

If there's still interest, I think the best route, as suggested, is to create a fork of beets focused on videos.

I'm going to close this since this doesn't seem like something beets should implement, but feel free to continue discussion here or on discourse.

beetbox / beets

Using beets for Video Library Organization? #1935
