When aligning audio and video files the default script fails

skinkie commented 2 years ago

As a user I have multiple recording devices: in practice, a camera and two Rode Wireless GO II devices. I would like to align the different recordings. The source data of each device may be assumed to be sequential in nature, but the recordings of different devices may not have been continuous, and thus overlap differently.

What I would like to see is that blocks within the same folder are not correlated with each other, but blocks in different folders are. In addition, given that the input sequence is not a 'bag of files' but a 'sorted list of files', this knowledge should be used in the alignment process: a forward search given the last prior.

At this moment I notice the following error after the fingerprinting process when I try to add my video folder. I have also attempted to make a wav file out of all the videos. The same error occurred.

From the description above, I could take an iterative approach that sequentially aligns single files by their initial fingerprint. My preference would obviously be an unsupervised method.

VID_20220409_115501.mp4: Finding Matches...  Aligning matches
Traceback (most recent call last):
  File "/mnt/storage/home/skinkie/Sources/audalign/run_align.py", line 284, in <module>
    main(args=args)
  File "/mnt/storage/home/skinkie/Sources/audalign/run_align.py", line 196, in main
    results = ad.align(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/__init__.py", line 36, in wrapper_decorator
    results = func(*args, **kwargs)
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/__init__.py", line 91, in align
    return aligner._align(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 48, in _align
    files_shifts = calc_final_alignments(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 169, in calc_final_alignments
    files_shifts = find_matches_not_in_file_shifts(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 290, in find_matches_not_in_file_shifts
    nmatch_wt_most[main_name][audalign.Audalign.OFFSET_SECS] = None
AttributeError: module 'audalign' has no attribute 'Audalign'. Did you mean: 'datalign'?

benfmiller commented 2 years ago

Thanks for the suggestion! This error seems to be an oversight in the last refactoring. The current set of test audio files doesn't have a case where find_matches_not_in_file_shifts gets called because it's a bit complex. That function is for the case where the most matched file matches with files that don't match with the most matched file. I can fix that bug and do a new release in the next couple of days.

The current align methods treat all input audio files as a bag of files. The list of filenames as input is a way to only choose certain audio files or specify files in multiple locations. Are you suggesting that I add a new align_sequentials method where each folder or file is one set of sequential files with no overlap? You could even specify lists of files that are ordered like results = align_sequentials([["one_file.mp3"], ["second/one.mp3", "second/two.mp3"]]) ?

Or could I get more clarity on what this feature would look like?

Audalign currently supports the situation where file1 only overlaps with file2, which only overlaps with file1 and file3, but file3 doesn't overlap with file1. The situation where file3 overlaps with file4, which doesn't overlap with file1 or file2 is not currently supported. Is this another feature that you are asking for?
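
The chained case described above (file1 overlaps file2, file2 overlaps file3) can be illustrated with a small sketch. This is not audalign's implementation: the function name `propagate_offsets` and the input format are assumptions made for illustration, and the pairwise shifts are taken as already known. Unlike the behaviour described above, this sketch follows chains of any length, so a file4 reachable only through file3 would also receive an offset.

```python
from collections import deque


def propagate_offsets(pair_offsets, reference):
    """Chain pairwise shifts into offsets relative to one reference file.

    pair_offsets maps (x, y) to the number of seconds by which y starts
    after x (hypothetical input format, not audalign's result structure).
    Files that cannot be reached from the reference are left out.
    """
    # Build an undirected graph; walking an edge backwards negates the shift.
    graph = {}
    for (x, y), shift in pair_offsets.items():
        graph.setdefault(x, []).append((y, shift))
        graph.setdefault(y, []).append((x, -shift))

    offsets = {reference: 0.0}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        for neighbour, shift in graph.get(current, []):
            if neighbour not in offsets:
                offsets[neighbour] = offsets[current] + shift
                queue.append(neighbour)
    return offsets
```

With shifts for (file1, file2) and (file2, file3), file3 receives an offset relative to file1 even though the two never overlap directly. A file with no path to the reference simply does not appear in the result.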

skinkie commented 2 years ago

Thanks for the suggestion! This error seems to be an oversight in the last refactoring. The current set of test audio files doesn't have a case where find_matches_not_in_file_shifts gets called because it's a bit complex. That function is for the case where the most matched file matches with files that don't match with the most matched file. I can fix that bug and do a new release in the next couple of days.

Great.

The current align methods treat all input audio files as a bag of files. The list of filenames as input is a way to only choose certain audio files or specify files in multiple locations. Are you suggesting that I add a new align_sequentials method where each folder or file is one set of sequential files with no overlap? You could even specify lists of files that are ordered like results = align_sequentials([["one_file.mp3"], ["second/one.mp3", "second/two.mp3"]]) ?

Or could I get more clarity on what this feature would look like?

OK, to be very exhaustive.

The current Rode Wireless GO II has two microphones that are recording, and unless someone explicitly places a marker (with a button on the receiver), the offset is equal to the time the devices are powered on. I have not seen anything in the original software that retains the timecode or start time of the recording (I asked Rode for that feature). The second problem is that the devices create files with the same filename (or one or two numbers apart). This is a snafu. You can't just place them in a single folder, and from the filename you don't know which device a file came from...

So in my case the two recording devices produce files that in general overlap. So the first file on device A will typically correspond to the first file on device B. Now I guess the fingerprinting process is cheap enough to estimate that via a bag-of-recordings approach. But I guess a heuristic approach could take the file sequence into account. So if the first file in the directory matches, let's not match the second file with the first again, and only try to match it in the forward direction. I do understand that a pattern like the one below should be supported, or at least warned about.

AA CC
 BBB

To me the above would be 'phase one', because I already know that these files overlap. After this step I would like to have the results fingerprinted and aligned with the video files. The video files are typically broken into pieces, so there is no need for scene detection. It may be expected that a sequential recording does not have internal gaps. I obviously understand that this causes drift due to different clock crystals etc.
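
The forward-only search described above can be sketched roughly as follows. This is a minimal illustration, not part of audalign: `forward_match` is a hypothetical name, and the `overlaps` callable is a stand-in for whatever fingerprint comparison decides that two files share audio. Both input lists are assumed to be in recording order.

```python
def forward_match(seq_a, seq_b, overlaps):
    """Pair files from two ordered devices without searching backwards.

    seq_a, seq_b: file names in recording order, one list per device.
    overlaps: callable(a, b) -> bool, a placeholder for a fingerprint match.
    """
    pairs = []
    start = 0  # earliest index in seq_b that is still worth checking
    for a in seq_a:
        matched = False
        for j in range(start, len(seq_b)):
            if overlaps(a, seq_b[j]):
                pairs.append((a, seq_b[j]))
                matched = True
                # seq_b[j] may also overlap the next file in seq_a,
                # so the next search starts here rather than at j + 1.
                start = j
            elif matched:
                # Files are sequential: once the overlaps stop, later
                # files in seq_b cannot overlap this file either.
                break
    return pairs
```

For the "AA CC" over "BBB" pattern shown earlier, the single file BBB is paired with both AA and CC, because the search start is kept at the last matching file instead of moving past it.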

Via a GitHub issue here I noticed another thing called "ground truth creation". In the most ideal alignment software there would be some sort of "time" recovery. I provide a closed world where a sequence of events is happening. The events would actually be [empty1][event1][empty2][event2] and so on. The events may be grouped by recording device, which provides a sequence of events and may report something like a timecode, but only one thing can be stated: the input files are sequential. For all the other tracks the alignment should be done in relation to each other. For a single perspective (a graph without forks) this looks doable to me. But the case where a multicam recording starts in a shared view, diverges, and ends up in the same place again... that feels like a more difficult challenge, since that would suggest that alignment may happen between tracks, but is still continuous in time.

audio 1
VIDEO 1

    cc
    CC
  /    \
aa     ee
AA     EE
---------
BB     FF
bb     ff
  \    /
    DD
    dd

VIDEO 2
audio 2

Audalign currently supports the situation where file1 only overlaps with file2, which only overlaps with file1 and file3, but file3 doesn't overlap with file1. The situation where file3 overlaps with file4, which doesn't overlap with file1 or file2 is not currently supported. Is this another feature that you are asking for?

Maybe :-) But this is like the advanced of advanced case where you would place everything on a graph. For the simple multicam recordings with independent audio, that should be aligned, forks are not really relevant.

benfmiller commented 2 years ago

I think I'm understanding a little better, but I'm still not quite getting the whole picture.

For just the audio files, are you suggesting the heuristic approach as an optimization or are there improper alignments resulting from the bag of files approach?

The heuristic approach would be: Given a folder with audio files sorted by filename (which are ordered by recording start time), align file 1 with 2, 3, 4 ..., then align 2 with 3, 4 ..., then 3 with 4 and accumulate the previous alignments?
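
The comparison order in the question above (1 with 2, 3, 4 ..., then 2 with 3, 4 ..., then 3 with 4) can be enumerated with the standard library. This is only a sketch of the ordering; `forward_pairs` is a hypothetical helper, and the actual alignment of each pair is left out.

```python
from itertools import combinations


def forward_pairs(files):
    """List every (earlier, later) pair of files, sorted by filename.

    Assumes filename order equals recording order, as in the question.
    """
    ordered = sorted(files)
    # combinations() keeps input order, so each file is only ever
    # paired with files that come after it.
    return list(combinations(ordered, 2))
```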

The alignment from 'Phase one' could result in a single file output in the multichannel format from issue #32, which would then be aligned with the video files? Or could you include a snapshot of what the timeline would look like?

audio device 1 = a
audio device 2 = b
video device 1 = v

aaaa.  aaaaaa.  
  bbbbbb.  bbbbbb
gives output file "c"

Then
ccccccccccccccccccc
   vvvvv.     vvvvvvvvv

Audalign will currently fail to properly align

aaaa.   aaaaa.  aaaa.  aaaa
    bbbbb.   bbbbb. bbbb

It would only find at most

aaaa. aaaaa
    bbbbb.  bbbb

The multicam graph case certainly seems a bit more difficult!

skinkie commented 2 years ago

I think I'm understanding a little better, but I'm still not quite getting the whole picture.

For just the audio files, are you suggesting the heuristic approach as an optimization or are there improper alignments resulting from the bag of files approach?

The improper alignment does occur if I throw everything in one folder with a single video file. And obviously the error message that I reported here occurred when I threw everything in one folder. ;)

The heuristic approach would be: Given a folder with audio files sorted by filename (which are ordered by recording start time), align file 1 with 2, 3, 4 ..., then align 2 with 3, 4 ..., then 3 with 4 and accumulate the previous alignments?

I think the accumulate part is the tricky bit, because I think you should view what you have aligned as independent "groups" of data. The problem with accumulating is that if you have a video segment that extends beyond the initial audio recording aa, it cannot be matched if EEE is longer than aa/bb but does overlap with ccc/ddd.

bb ddd
aa ccc
 EEE

The alignment from 'Phase one' could result in a single file output in the multichannel format from issue #32, which would then be aligned with the video files? Or could you include a snapshot of what the timeline would look like?

I think this is the most trivial editing case, with a single scene but independently recorded.

audio 1: bb ddddddd  hhhh
audio 2: aa ccc fff ggg
video 1:  EEE  JJJJJJJ
video 2:  KKKKKKKKK

I would heuristically approach it as: Phase 1

part 1: aa+bb
part 2: ccc+ddddddd+fff
part 3: ggg+hhhh
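
The Phase 1 grouping above can be sketched as finding connected groups among files that are known to overlap pairwise. This is an illustration under assumptions, not audalign code: `group_parts` is a hypothetical name, and the list of overlapping pairs is taken as given (for example from fingerprint matches between the two audio devices).

```python
def group_parts(files, overlapping_pairs):
    """Group files into independent parts using a small union-find.

    files: all audio file names, in timeline order.
    overlapping_pairs: (x, y) pairs that were found to overlap.
    """
    parent = {name: name for name in files}

    def find(name):
        # Follow parent links to the group's root, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in overlapping_pairs:
        parent[find(x)] = find(y)

    groups = {}
    for name in files:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())
```

Given the pairs aa/bb, ccc/ddddddd, ddddddd/fff and ggg/hhhh, the result is the three parts listed above. Each part can then be treated as its own segment when the video files are matched in Phase 2.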

Phase 2 is, I think, the hard part. So E_1 would overlap with part 1, whereas E_3 would overlap with part 2. J would constrain parts 2 and 3, and video 2 would not introduce an extra constraint.

There are several heuristic approaches which may give an initial timeline, with the bag of input being the ultimate challenge.

audio device 1 = a
audio device 2 = b
video device 1 = v

aaaa.  aaaaaa.  
  bbbbbb.  bbbbbb
gives output file "c"

Then
ccccccccccccccccccc
   vvvvv.     vvvvvvvvv

Your ccc example will only work if you can establish that the combined file is one continuous recording (part 2 in my example).

The multicam graph case certainly seems a bit more difficult!

You could also see it as a way to escape a constraint that the software can't find. Hence: "I know how these distinct parts are combined, but I don't have a clue how they relate, therefore I output two segments, how they join you can figure that out..."

skinkie commented 2 years ago

From another life :-)

https://navitia-io.medium.com/what-are-time-tables-5b110c5ee3f4

benfmiller commented 2 years ago

Interesting read!

Audalign was originally developed for aligning forensic audio files. Typically there would be one main audio event or several events, that were captured in all or most of the recordings. The audio files are all recognized against each other so that the file with the strongest matches can act as a sort of source of truth. From there, the step of aligning files that didn't match with the source of truth file is kind of superfluous because they don't match strongly enough to add much value to the total alignment.

This multiphase alignment process you're describing does make a lot more sense for the type of audio scenario you're dealing with. When I was refactoring the recognizers into separate recognizer objects a few months ago, I didn't also refactor the aligner into a separate object. I think this type of alignment process would work very well if we were to separate the alignment techniques into separate objects. I could put this feature on the backlog, though! Probably get to it in a few months. I'm always open to PRs, though!

The default script should be working now, though! Could you verify that the new version in the main branch is working before I publish a release to PyPI?

skinkie commented 2 years ago

Starting with requirements.txt, I think you can easily see where you could improve :)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
label-studio 1.4.1.post1 requires python-dateutil==2.8.1, but you have python-dateutil 2.8.2 which is incompatible.
label-studio-converter 0.0.39 requires Pillow==9.0.0, but you have pillow 9.1.0 which is incompatible.
azure-core 1.23.0 requires typing-extensions>=4.0.1, but you have typing-extensions 3.7.4.3 which is incompatible.
audalign 1.0.1 requires attrs==20.3.0, but you have attrs 21.4.0 which is incompatible.
audalign 1.0.1 requires certifi==2020.12.5, but you have certifi 2021.10.8 which is incompatible.
audalign 1.0.1 requires cffi==1.14.4, but you have cffi 1.15.0 which is incompatible.
audalign 1.0.1 requires cycler==0.10.0, but you have cycler 0.11.0 which is incompatible.
audalign 1.0.1 requires decorator==4.4.2, but you have decorator 5.1.1 which is incompatible.
audalign 1.0.1 requires idna==2.10, but you have idna 3.3 which is incompatible.
audalign 1.0.1 requires imageio==2.9.0, but you have imageio 2.18.0 which is incompatible.
audalign 1.0.1 requires joblib==1.0.0, but you have joblib 1.1.0 which is incompatible.
audalign 1.0.1 requires kiwisolver==1.3.1, but you have kiwisolver 1.4.2 which is incompatible.
audalign 1.0.1 requires matplotlib==3.3.3, but you have matplotlib 3.5.1 which is incompatible.
audalign 1.0.1 requires networkx==2.5, but you have networkx 2.8 which is incompatible.
audalign 1.0.1 requires noisereduce==1.1.0, but you have noisereduce 2.0.0 which is incompatible.
audalign 1.0.1 requires packaging==20.8, but you have packaging 21.3 which is incompatible.
audalign 1.0.1 requires pluggy==0.13.1, but you have pluggy 1.0.0 which is incompatible.
audalign 1.0.1 requires pooch==1.3.0, but you have pooch 1.6.0 which is incompatible.
audalign 1.0.1 requires py==1.10.0, but you have py 1.11.0 which is incompatible.
audalign 1.0.1 requires pycparser==2.20, but you have pycparser 2.21 which is incompatible.
audalign 1.0.1 requires pydub==0.24.1, but you have pydub 0.25.1 which is incompatible.
audalign 1.0.1 requires pyparsing==2.4.7, but you have pyparsing 3.0.8 which is incompatible.
audalign 1.0.1 requires pytest==6.2.0, but you have pytest 7.1.2 which is incompatible.
audalign 1.0.1 requires pytest-xdist==2.2.1, but you have pytest-xdist 2.5.0 which is incompatible.
audalign 1.0.1 requires python-dateutil==2.8.1, but you have python-dateutil 2.8.2 which is incompatible.
audalign 1.0.1 requires PyWavelets==1.1.1, but you have pywavelets 1.3.0 which is incompatible.
audalign 1.0.1 requires requests==2.25.0, but you have requests 2.27.1 which is incompatible.
audalign 1.0.1 requires six==1.15.0, but you have six 1.16.0 which is incompatible.
audalign 1.0.1 requires threadpoolctl==2.1.0, but you have threadpoolctl 3.1.0 which is incompatible.
audalign 1.0.1 requires tifffile==2020.12.8, but you have tifffile 2022.4.26 which is incompatible.
audalign 1.0.1 requires tqdm==4.54.1, but you have tqdm 4.64.0 which is incompatible.
audalign 1.0.1 requires urllib3==1.26.5, but you have urllib3 1.26.9 which is incompatible.
virtualenv 20.14.1 requires filelock<4,>=3.2, but you have filelock 3.0.12 which is incompatible.
nltk 3.6.7 requires regex>=2021.8.3, but you have regex 2020.11.13 which is incompatible.
httplib2 0.19.1 requires pyparsing<3,>=2.4.2, but you have pyparsing 3.0.8 which is incompatible.

benfmiller commented 2 years ago

The new version will be 1.1.0. To test out the changes, you will have to pull down the recent merge in main from GitHub. Create a virtual environment with python -m venv venv and activate it with source ./venv/bin/activate. This gives you environment separation between projects or versions. Then run pip install -r requirements.txt to install the updated requirements into the current virtual environment.

skinkie commented 2 years ago

@benfmiller my point is that you are still including older versions of packages.

benfmiller commented 2 years ago

@skinkie I'm unsure what you're trying to accomplish here. Yes. The old version uses older versions of dependencies. No. The new version does not use older versions of dependencies.

Version 1.1.0 is not published yet. Are you trying to say that the new dependencies are still outdated? They are the latest versions of the dependencies that still support Python 3.8.

skinkie commented 2 years ago

@skinkie I'm unsure what you're trying to accomplish here. Yes. The old version uses older versions of dependencies. No. The new version does not use older versions of dependencies.

I spotted what went wrong on my side after your comment regarding older versions. The conflicts pip reported were for my currently installed version of audalign. Issue resolved.

I cannot yet tell you if it works, because I have ended up in this mess (again).

Traceback (most recent call last):
  File "/mnt/storage/home/skinkie/Sources/audalign/run_align.py", line 301, in <module>
    main(args=args)
  File "/mnt/storage/home/skinkie/Sources/audalign/run_align.py", line 171, in main
    import audalign as ad
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/__init__.py", line 21, in <module>
    import audalign.align as aligner
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 8, in <module>
    import audalign.filehandler as filehandler
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/filehandler.py", line 8, in <module>
    import noisereduce
  File "/home/skinkie/.local/lib/python3.10/site-packages/noisereduce/__init__.py", line 1, in <module>
    from noisereduce.noisereduce import reduce_noise
  File "/home/skinkie/.local/lib/python3.10/site-packages/noisereduce/noisereduce.py", line 3, in <module>
    import librosa
  File "/home/skinkie/.local/lib/python3.10/site-packages/librosa/__init__.py", line 211, in <module>
    from . import decompose
  File "/home/skinkie/.local/lib/python3.10/site-packages/librosa/decompose.py", line 19, in <module>
    import sklearn.decomposition
  File "/usr/lib/python3.10/site-packages/sklearn/__init__.py", line 82, in <module>
    from .base import clone
  File "/usr/lib/python3.10/site-packages/sklearn/base.py", line 17, in <module>
    from .utils import _IS_32BIT
  File "/usr/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 23, in <module>
    from .murmurhash import murmurhash3_32
  File "sklearn/utils/murmurhash.pyx", line 1, in init sklearn.utils.murmurhash
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
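
A ValueError like this usually means a compiled extension was built against a different NumPy version than the one currently installed. The traceback above also mixes packages from ~/.local and /usr/lib. A quick way to see which versions are installed without importing the packages (importing sklearn is what fails here) is sketched below. The helper name `installed_versions` and the package list are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}.

    Reads package metadata only, so it works even when importing the
    package itself raises an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    names = ["numpy", "scikit-learn", "librosa", "noisereduce"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```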

benfmiller commented 2 years ago

Awesome!

I've never seen this type of issue. You've resolved it before? Did you say you're running this on Gentoo? Should I go ahead and do a release to PyPI?

skinkie commented 2 years ago

I've never seen this type of issue. You've resolved it before?

Yes, it was actually the first issue I opened here. But this time it is more difficult to resolve since I threw away LLVM-11, which is at this moment the maximum installable version for llvmlite.

Did you say you're running this on Gentoo? Should I go ahead and do a release to PyPI?

I'll try to check audalign on Ubuntu today.

benfmiller commented 2 years ago

v1.1.0 was published several weeks ago, which should fix this issue. I'll close this issue if there aren't any more updates in a week.

benfmiller / audalign

When aligning audio and video files the default script fails #31