benfmiller / audalign

Package for aligning audio files through audio fingerprinting
MIT License

total.wav is a single channel, why not align it in a multichannel fashion? #32

Closed skinkie closed 2 years ago

skinkie commented 2 years ago

Currently total.wav seems to contain the cumulative result. I think it would be much more valuable to have a total file that contains all tracks correctly aligned. It would be even nicer to use a container format that does not require the 'shift' to be encoded into the audio, so that the offset would only be stored as metadata (in an edit decision list fashion).

benfmiller commented 2 years ago

Yes, total.wav contains all tracks averaged into a single channel. Audalign uses pydub to export files, so this kind of feature depends on pydub's capabilities. The write_extension argument in align lets you specify formats other than .wav. Are you suggesting that it also write a total file with each aligned audio file encoded as a separate channel?

I'm not sure what kind of format could be used to encode the shift as metadata. Do you have some examples of what you're thinking of?

skinkie commented 2 years ago

Are you suggesting that it also writes a total file with each aligned audio file being encoded as a separate channel?

When aligning two mono files, my expectation was that I would receive a single "dual-mono" file. At the moment I get three files: channel1, channel2 and total. So I would actually want to choose how the export is done: separate channels, "dubbed" or integrated.

I'm not sure what kind of format could be used to encode the shift as metadata. Do you have some examples of what you're thinking of?

I was looking into whether Matroska is capable of doing this. I am very sure that SMIL is capable of describing it. But nobody implements SMIL in an audio/video editor ;)

benfmiller commented 2 years ago

I'll add a new option to specify if you want the output files to be encoded as channels in a single output file. Seems like a pretty simple addition from this StackOverflow post, so I could probably have it out in a day or two.
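For illustration, here is a rough sketch of the difference between the averaged total and a one-channel-per-track total. It is not audalign's code: it uses only the standard library instead of pydub, the function names are made up for this example, and it assumes 16-bit mono samples given as lists of ints with shifts already converted to frames (for instance `round(shift_seconds * sample_rate)`).

```python
import struct
import wave


def pad_to_timeline(tracks, shifts):
    """Pad each mono track with leading and trailing silence so all tracks
    share one timeline. shifts are non-negative offsets in frames."""
    total = max(len(t) + s for t, s in zip(tracks, shifts))
    return [
        [0] * s + list(t) + [0] * (total - s - len(t))
        for t, s in zip(tracks, shifts)
    ]


def average_to_mono(padded):
    """Average all tracks frame by frame into a single channel."""
    return [sum(frame) // len(frame) for frame in zip(*padded)]


def interleave_channels(padded):
    """Keep every track as its own channel: frame 0 of all channels,
    then frame 1 of all channels, and so on."""
    return [sample for frame in zip(*padded) for sample in frame]


def write_wav(target, samples, n_channels, sample_rate=44100):
    """Write 16-bit PCM samples to a path or a seekable file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

With two mono inputs, `write_wav(path, interleave_channels(padded), n_channels=2)` would give the "dual-mono" file, while `write_wav(path, average_to_mono(padded), n_channels=1)` corresponds to the averaged total. In pydub itself, `AudioSegment.from_mono_audiosegments` combines equal-length mono segments into one multichannel segment.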

Huh, I'm not very familiar with SMIL or mkv's, but that seems nifty! SMIL seems to be mostly a web thing?

skinkie commented 2 years ago

Huh, I'm not very familiar with SMIL or mkv's, but that seems nifty! SMIL seems to be mostly a web thing?

I see SMIL as a very advanced playlist format that does not have to be linear, so the takeaway is: it is not a container, but rather a standardised way to describe how files relate to each other. I am also curious whether MXF could do it, taking the original streams and just placing them at a position on the timeline.
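To make the idea concrete, a minimal sketch of such a description could look like the following. It assumes the shifts are available as a plain mapping from filename to a non-negative offset in seconds (a hypothetical input shape), and `shifts_to_smil` is a name invented for this example. A complete SMIL document would also declare the SMIL namespace and version, which is left out here.

```python
import xml.etree.ElementTree as ET


def shifts_to_smil(shifts):
    """Describe aligned files as parallel SMIL audio elements.

    shifts: hypothetical mapping of filename -> start offset in seconds.
    The audio data itself is untouched; only the offsets are recorded.
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    # Children of <par> play in parallel; each begin is relative to the
    # start of the <par> element.
    par = ET.SubElement(body, "par")
    for filename, shift in shifts.items():
        ET.SubElement(par, "audio", src=filename, begin=f"{shift:.3f}s")
    return ET.tostring(smil, encoding="unicode")


print(shifts_to_smil({"channel1.wav": 0.0, "channel2.wav": 1.5}))
```

The original streams stay as they are, and a player or editor that understands the description would place each one on the timeline at its offset.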

benfmiller commented 2 years ago

That's some good-to-know info! I'm also not very familiar with MXF. The align functions return the total results with the shifts and corresponding match strengths. It seems like you could process the output from the recognitions into one of those container formats. It's not a feature I would plan to support any time soon, but I'd be happy to accept PRs!
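As a rough idea of what that post-processing might look like, here is a sketch that turns shifts into a small edit-list-style JSON sidecar. The input shape (filename to shift in seconds, plus an optional filename to match strength mapping) is an assumption made for this example and may not match the actual structure of the returned results; `build_edit_list` and the JSON field names are likewise made up.

```python
import json


def build_edit_list(shifts, strengths=None):
    """Turn shifts into an edit-decision-list style structure.

    shifts: hypothetical mapping of filename -> shift in seconds
            (may be negative).
    strengths: optional hypothetical mapping of filename -> match strength.
    """
    strengths = strengths or {}
    # Normalise so the earliest file starts at 0 on the shared timeline.
    earliest = min(shifts.values())
    tracks = [
        {
            "source": name,
            "timeline_start_s": round(shift - earliest, 6),
            "match_strength": strengths.get(name),
        }
        for name, shift in shifts.items()
    ]
    tracks.sort(key=lambda entry: entry["timeline_start_s"])
    return {"version": 1, "tracks": tracks}


edit_list = build_edit_list(
    {"channel1.wav": 0.0, "channel2.wav": -1.5},
    strengths={"channel1.wav": 12, "channel2.wav": 9},
)
print(json.dumps(edit_list, indent=2))
```

The same structure could then be mapped onto SMIL, Matroska, or MXF by whoever picks this up.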

The multichannel output is implemented, though!