Hi @pestrela,
Thanks for reporting this. I wasn't aware of it, since for my audio files I didn't encounter the issue yet!
Is the issue intermittent depending on the files, or for you does this issue occur for every file? If it's intermittent depending on the file, it could be tricky to fix, but first of all let's see if I can reproduce it.
If you could attach a zip archive containing an audio file for which you've observed the issue, it might help to speed up the investigation. From there I can try and reproduce it by analysing in Traktor, and then using the converter and checking the result in Rekordbox.
Thanks!
Hi, many thanks for this project. I just tested the 0.2.0 release, which produced a valid XML file.
However, it suffers from the "cues are shifted in time" issue that all translators face when going to/from RB/Traktor. The result looked the same as the example below (taken from CrossDJ):
The root cause is different definitions of the 00:00:00 time point: https://www.youtube.com/watch?v=Vl4nbvYmiP4
AFAIK only these 2x tools are able to fix this issue:
Could you please consider addressing this issue? I can provide demo mp3s if you don't see this issue in your mp3s. Thanks!
Hi Alza, This issue only happens for specific MP3s. Below an example, I can provide a lot more later: https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1
I've tested many converters - they all suffer from the same issue for this example. Exceptions:
I've now analysed by hand 67 different files and found an almost perfect pattern.
If the file was encoded by LAME "3.99" or "3.99.5", the simple conversion produces shifted cues; the exception is "3.99r". Same story for "3.98", except "3.98r" or "3.98 " (with a trailing space).
For the other LAME versions / encoders, no shifted cues were seen. Note: "unk" means the tag was empty/not present.
Please see the table below for my results so far:
python code:
import pandas as pd
from io import StringIO

# 'a' holds the tab-separated (version, shift) results pasted above
df1 = pd.read_csv(StringIO(a), sep="\t", names=['version', 'shift']).dropna()
df1['version'] = df1['version'].str.replace(" ", "_")
print("number of entries: %d" % len(df1))
df2 = pd.crosstab(index=df1["version"], columns=df1["shift"]).sort_values(["bad", "good"], ascending=False)
df2
To analyse the encoder of the files, I've used MediaInfo: https://mediaarea.net/en/MediaInfo. To customize the output: Preferences / Custom / Edit / Audio / %Encoded_Library%
What do you think?
Extended the analysis to 300 files, analysed manually. Of these 300, I've subjectively found that 11% have shifted cues.
For LAME 3.99 files, all of them result in shifted cues. LAME 3.99.5 is now mixed, with 60% wrong predictions. Everything else, including 3.99r etc., results in only 2% false positives.
code and data: https://github.com/pestrela/music_scripts/blob/master/lame_shifted_cues.py
Rekordbuddy is able to correct this issue in a single go. Well done! In their own words: "Rekord Buddy deals with 5 different issues related to cue timings, and one that we are aware of but haven’t found enough data to compose a decent fix for." https://forums.next.audio/t/traktor-rekordbox-cues-shifted-in-time/415
Hi @pestrela,
Ok, I've started to look into this now, and I have some interesting results!
First, I actually had some LAME 3.99 & LAME 3.98 encoded files already, so I tried to reproduce the issue with those. In this case, I found that the cue shifting did not occur with any of the 3.99 and 3.98 files I tried.
Second, I tried to reproduce the issue with the file you provided:
https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1
In this case, I found that the cue shifting did occur, but notably when I checked the encoder metadata for this particular file:
ffprobe -v verbose <file>
It was not LAME, but Lavf, a.k.a. libavformat (and the related libavcodec). I believe this encoder string indicates that FFmpeg was used to encode the file. Internally, libavcodec uses libmp3lame for mp3 encoding, but for this file it seems the version used is not present in the file metadata; it just states Lavf.
Based on this, I then tried to reproduce the issue with Lavf and Lavc xx.xx encoded files. In this case, I found that the cue shifting issue did occur for the vast majority of files with these encoder values (although not all of them; there was at least one exception).
Conclusion: my findings do support the encoder version hypothesis to some extent; however, I found that a different encoder is the culprit: Lavf and/or Lavc.
Next steps: our findings are different, so we need to clarify the situation there first before I can proceed.
Assuming we can account for this, I would then try and work out what the shift value(s) are (in seconds), and whether it's constant or not etc.
Let me know what you think!
I've now sent you privately a link to an upload of 35x files that have a clear shift. I've also changed my analysis scripts to use the latest ffprobe 4.1.
Note: "good" files could actually be bad files with a very small shift. When I used RECU it sometimes reported marginal (but present) shifts.
yet another program to guess the encoder: http://www.rarewares.org/rrw/encspot.php
which is a wrapper around this lib: http://mp3guessenc.sourceforge.net/
Found this program in a list of mp3 tools collected by Pulse@Pioneer (mp3 information / mp3 error checkers): https://forums.pioneerdj.com/hc/en-us/articles/204681699-MP3-Tools-More
Hi @pestrela,
Thanks for these. What's your thinking here: is this regarding a method of detecting the encoder for files that don't have an encoder tag (or where the encoder tag is empty)? I'll call these files "unknown files".
I assume this is your focus, since unknown files are the biggest proportion of files in your dataset of 300 (although LAME files are a close second), and the proportion with the biggest number of shifted cues?
However, it's worth noting that although this proportion has the biggest number of shifted cues, it's not the proportion with the biggest percentage of shifted cues - that goes to Lavf/Lavc:
Category | Total | Number Shifted | % Shifted |
---|---|---|---|
Lavf/Lavc (all versions?) | 13 | 8 | 62% |
Unknown | 143 | 22 | 15% |
LAME (all versions?) | 122 | 5 | 4% |
Based on the above, I am thinking that the % numbers are the most helpful indicator for determining what to do next. Although the number of Lavf/Lavc files in your dataset is comparatively small, the percentage result for those does correlate somewhat with my findings.
My current thinking for a solution is to implement a "blacklist lookup table", which would map source + target + encoder (string regex) -> shift (seconds).
For example (shift values are just made up):
Source | Target | Encoder | Shift |
---|---|---|---|
Traktor | Rekordbox | Lavc57.80 | 0.135 |
Traktor | Rekordbox | Lavc* | 0.143 |
Traktor | Rekordbox | Lavf* | 0.143 |
Traktor | Rekordbox | LAME3.99 | 0.128 |
I am assuming that for a given conversion (source -> target, encoder), the shift is a fixed value (this could be verified using a random sample of files for each encoder).
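For illustration, a minimal Python sketch of such a lookup table (the rule entries reuse the made-up shift values from the example table above; all names are hypothetical, not part of the converter yet):

import re

# hypothetical rule table: (source, target, encoder regex, shift in seconds)
OFFSET_TABLE = [
    ("traktor", "rekordbox", r"Lavc57\.80.*", 0.135),
    ("traktor", "rekordbox", r"Lavc.*", 0.143),
    ("traktor", "rekordbox", r"Lavf.*", 0.143),
    ("traktor", "rekordbox", r"LAME3\.99", 0.128),
]

def lookup_shift(source, target, encoder):
    # first matching rule wins, so more specific regexes go first
    for src, tgt, pattern, shift in OFFSET_TABLE:
        if src == source and tgt == target and re.fullmatch(pattern, encoder or ""):
            return shift
    return 0.0  # unknown files: no correction applied

print(lookup_shift("traktor", "rekordbox", "Lavc57.80"))  # -> 0.135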
Of course, this solution doesn't consider unknown files. Some options for those:
There could also be a command-line option to override whether they are included or not.
For unknown files, mp3guessenc might be helpful to determine the encoder (I've used it before), but unfortunately there doesn't seem to be a build/version for Mac OSX, which is a show-stopper in any case.
What do you think?
Today I tried the following experiment: identify the precise sample of the 0:0:0 point of DJ software.
Method: I played MP3 files in the DJ software while recording, putting the play position at negative values beforehand. Then I aligned the recordings on the first downbeat, and normalized on the first 16-bit samples that are greater than zero (mod).
A description of the test procedure, inputs and all outputs are in this zip: https://www.dropbox.com/s/pgpnrw4sl3xv2tp/DAW%20shifted%20cues.zip?dl=0 DAW shifted cues.txt
Results:
Example:
Maybe found a hint in the Rekordbox release notes. This mentions an issue with LAME gapless encoding, and claims the 44.1kHz shift to be a constant 26ms.
https://rekordbox.com/en/support/releasenote.php
What's new in rekordbox Ver.2.0.2 ● Fixed an issue with beat grid inaccuracy created with v1.6.0/v1.6.2/v2.0.0/v2.0.1.
Ver.1.6.2 (2012.08.21) What's new in rekordbox Ver.1.6.2 ... ●Improved the accuracy of beat grid information analyzed by rekordbox. ●Added a function to fix the misaligned BeatGrid and cue points in mp3 files which (i) have been encoded by LAME encoder with the gapless setting and (ii) have been analyzed and adjusted by rekordbox before version 1.5.3. (As of version 1.5.4, rekordbox has disabled gapless playback of LAME-encoded mp3 files.) ...
Ver.1.5.4 (2012.07.03) About rekordbox Version 1.5.4 Version 1.5.4 is only for MEP-4000 and new rekordbox users. Version 1.5.4 disables gapless playback for MP3 files encoded with the LAME encoder on players such as the CDJ-2000. Disabling gapless playback for MP3 files encoded with the LAME encoder in Version 1.5.4 will shift existing beat grids, loops or cue points of mp3 files encoded with the LAME encoder that have been analysed and adjusted with an older version of rekordbox. The offset value depends on the sampling frequency of the file: 24ms (in the case of 48kHz), 26ms (in the case of 44.1 kHz). However, it does not alter the audio play back on the CDJ's just visually inside rekordbox, therefore you do not need to reanalyse your tracks and redefine the beat grids, loops or cue points. Pioneer will provide a tool to automatically adjust the beat grids, loops or cue point data in a future update. We recommend that you wait. Thank you for your understanding.
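As a cross-check (my own arithmetic, not from the release notes): one MPEG-1 Layer III frame is 1152 samples, which matches Pioneer's offsets exactly:

# one MP3 frame = 1152 samples; Pioneer's 24ms/26ms offsets are one frame
for rate in (44100, 48000):
    print("%d Hz -> %.1f ms" % (rate, 1000.0 * 1152 / rate))
# 44100 Hz -> 26.1 ms
# 48000 Hz -> 24.0 ms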
Gapless encoding is detectable using byte $AF of the full lame mp3 info tag: https://wiki.hydrogenaud.io/index.php?title=Gapless_playback#Format_support http://gabriel.mp3-tech.org/mp3infotag.html
eyeD3 -P lameinfo displays --nogap: https://eyed3.readthedocs.io/en/latest/_modules/eyed3/mp3/headers.html
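For reference, a rough Python sketch of the same check via eyeD3's API (assuming, per the linked source, that the parsed Xing/LAME info frame is exposed as a lame_tag dict with a "nogap" entry; this is an assumption, not confirmed against all eyeD3 versions):

import eyed3

def has_nogap(path):
    # assumption: eyed3 parses the LAME info tag into a lame_tag dict,
    # where 'nogap' lists 'before'/'after' when the gapless flags are set
    f = eyed3.load(path)
    lame = getattr(f.info, "lame_tag", None) if f else None
    return bool(lame and lame.get("nogap"))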
However, this doesn't match my current dataset:
$ for DIR in {bad,good} ; do
    echo -n "$DIR "
    ( for FILE in "$DIR"/*.mp3 ; do
        eyeD3 -P lameinfo "$FILE" 2>/dev/null | grep -a -c nogap
      done ) | awk '{A=A+$1} END{ print NR-A, A }'
  done
what | no nogap | has nogap |
---|---|---|
bad | 29 | 0 |
good | 237 | 25 |
Hi @pestrela,
Thanks for continuing the investigation.
The CUE Shift amount is different for every mp3.
Just to clarify, you're saying that the shift is different, even for files with the same encoder? This is contrary to the hypothesis in the video above, cited as root cause: https://www.youtube.com/watch?v=Vl4nbvYmiP4
Rekordbox adds variable amounts of data when playing mp3
Just to clarify, you're saying that the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file? Although it's small, i.e. 2ms as you say. I wonder if this is just a related, but separate Rekordbox peculiarity that can be ignored (since it's only 2ms).
Re: gapless encoding, my conclusion based on the results in your other comment, is that it's not related, it's just a coincidence due to the similar values 24/26ms vs 29ms.
the shift is different, even for files with the same encoder
This comment was because sample1 vs sample2, which have the same encoder, would have different offsets according to the above method. The issue is that I now see the above method (find the first non-zero byte after play) doesn't seem to predict the correct offset shift that we need to apply.
the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file?
Yes. This is yet another sign that this method is not reliable enough.
Moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection. I expect a lot of outliers from the different beat-grid algorithms, but I expect that most >5ms offsets will cluster, somehow correlated with mp3guessenc / ffprobe / eyeD3.
RECU is a tool that takes 2 files:
It then matches the files on the first beat, computes the offset, and applies that offset to all cues of the converted XML. The current RECU requires the first beat to be marked; below is some code to avoid this:
def bpm_period(bpm):
    # duration of one beat, in seconds
    return 60.0 / bpm

def find_min_beat(bpm, cue):
    # reduce a cue position to its phase within one beat period,
    # so it no longer matters which beat of the grid was marked
    period = bpm_period(bpm)
    beats = int(cue / period)
    return cue - beats * period

def find_offset(bpm1, cue1, bpm2, cue2):
    # grid offset between two analyses of the same file
    return find_min_beat(bpm1, cue1) - find_min_beat(bpm2, cue2)
Hi @pestrela,
moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection. I expect a lot of outliers from the different beat-grid algorithms, but I expect that some >5ms offsets will cluster somehow correlated with mp3guessenc / ffprobe / eyeD3.
Ok I can see how this would be useful. Then we can cross-reference the shifts with other info e.g. encoder, to see if there's a pattern?
RECU is a tool that takes 2 files:
converted RBox XML, as converted by DJCU/DJDC original RBox XML, as analysed by rekordbox
it then matches files on the first beat, computes the offset, and applies such offset to all cues of the converted XML
Ok so the process could be:
1. Convert collection.nml to rekordbox.xml (whole collection converted)
2. Analyse the whole collection in Rekordbox and export rekordbox-2.xml
3. Join rekordbox.xml and rekordbox-2.xml, calculate the offset based on the earliest tempo position for each track, and output csv data
4. Run ffprobe (for encoder etc.) (or, parse the tag in step 3 to get the encoder, avoiding the need for this step)

One issue I can see with the above process is that step 2 could take a long time for a large collection? For example, my collection is ~10,000 tracks...
The current RECU requires the first beat to be marked, below some code to avoid this
I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?
Ok I can see how this would be useful.
I see 2x different use cases for this effort:
Ok so the process could be: ...
Indeed, this is how RECU works
is that step 2. could take a long time, for a large collection?
We can match the files exactly by filenames. In my python tool I'm matching my 7000-track collection exactly using AUDIO_ID, and this is quite fast: in Python, string keys are hashed and matched using hash tables.
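For illustration, a minimal sketch of that kind of hash-based matching with pandas (the column names are hypothetical; AUDIO_ID is the key from the Traktor NML):

import pandas as pd

# toy frames standing in for the two parsed collections
tk = pd.DataFrame({"AUDIO_ID": ["a1", "a2"], "tk_inizio": [1.95724, 0.512]})
rb = pd.DataFrame({"AUDIO_ID": ["a1", "a2"], "rb_inizio": [0.024, 0.540]})

# hash-based inner join: only tracks present in both collections survive
joined = tk.merge(rb, on="AUDIO_ID")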
I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?
We should take the closest Rbox Inizio to TK, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU that required the exact same beat to be marked.
An example will make this clear:
converted XML:
<TEMPO Inizio="1.95724" Bpm="126.000000" Metro="4/4" Battito="1"/>
original XML:
<TEMPO Inizio="0.024" Bpm="126.00" Metro="4/4" Battito="3"/>
<TEMPO Inizio="6.215" Bpm="126.00" Metro="4/4" Battito="4"/>
<TEMPO Inizio="164.787" Bpm="126.00" Metro="4/4" Battito="1"/>
<TEMPO Inizio="343.359" Bpm="126.00" Metro="4/4" Battito="4"/>
find_offset(126.00000, 1.95724, 126.00,0.024)
0.028478095238095434
Hi @pestrela,
Minimum: provide statistics of the offsets, optionally trigger mp3guessenc etc
Agreed, I'm currently working on a separate mini-app to get the offset based on two Rekordbox files. I'll post the results when I have them.
Optional: Serve as a post-correction tool, just like RECU, if no definitive encoder patterns arise from #1
Ok let's see what the stats tell us first. It would be good to avoid the post-correction like RECU, because even if we have a method that avoids manually marking the first beat (tempo inizio), users will still be required to do a full analysis in Rekordbox which isn't ideal.
is that step 2. could take a long time, for a large collection?
We can match the files exactly by filenames. In my python tool I'm matching my 7000-track collection exactly using AUDIO_ID, and this is quite fast: in Python, string keys are hashed and matched using hash tables.
I was referring to the analysis time in Rekordbox when starting from an empty collection and adding the music folders, in order to export the rekordbox-2.xml. Actually I just left my laptop analyzing for a while, and it's finished now!
I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?
We should take the closest Rbox Inizio to TK, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU that required the exact same beat to be marked. An example will make this clear:
converted XML:
<TEMPO Inizio="1.95724" Bpm="126.000000" Metro="4/4" Battito="1"/>
original XML:
<TEMPO Inizio="0.024" Bpm="126.00" Metro="4/4" Battito="3"/>
<TEMPO Inizio="6.215" Bpm="126.00" Metro="4/4" Battito="4"/>
<TEMPO Inizio="164.787" Bpm="126.00" Metro="4/4" Battito="1"/>
<TEMPO Inizio="343.359" Bpm="126.00" Metro="4/4" Battito="4"/>
find_offset(126.00000, 1.95724, 126.00, 0.024)
0.028478095238095434
I'm not 100% sure about the correctness of find_offset, after trying a few examples, but I'll use it as-is for now and let's see what the stats look like.
Agreed that a RECU-like step that depends on Rekordbox analysis is slow and cumbersome. Hopefully we will catch the LAME pattern and correct it in a single go.
Regarding slowness: the Rekordbox analysis is always required; it happens anyway when the user imports the converted XML.
Trying to guess which decoding library the DJ software uses:
$ strings Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll
some interesting comments from library maintainers:
https://sourceforge.net/p/lame/mailman/message/27315501/ as maintainer of the mpg123 decoding engine, I can tell you what works: Simply encode your files with lame and decode them with mpg123/libmpg123, with gapless decoding enabled. Lame stores all the necessary information by default and libmpg123 just omits the leading/trailing junk. I tested this with encode/decode roundtrips ... if you don't get the exactly same sample count that you had in the intial WAV, you found a bug in either lame or mpg123 and it should be fixed.
https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/ If an mp3 file starts with a Xing/LAME information frame, they are feeding that frame to the mp3 decoder rather than filtering it out, resulting in an unnecessary 1152 samples of silence at the start of the decoded audio.
In a really interesting development, some users started seeing this issue when upgrading TP2 collections to TP3. Mentioned patterns were locked files ON and multi-processing OFF.
It would be very useful to replicate this issue using traktor alone.
TP3 release dates:
- 3.0.0 — 2018-10-18
- 3.0.1 — 2018-11-01
- 3.0.2 — 2018-12-06
- 2018-10-26: https://support.native-instruments.com/hc/en-us/community/posts/360002416977-beat-grid-proble-with-traktor-pro-3-en-us-
- 2018-11-17: https://support.native-instruments.com/hc/en-us/community/posts/360002619578-Whole-libraries-grid-markers-have-changed-since-upgrading-to-Traktor-pro-3-en-us-
- 2018-12-13: https://www.reddit.com/r/DJs/comments/a5x76e/upgraded_to_traktor_pro_3_now_my_beatgrids_are/
- 2019-01-14: https://www.native-instruments.com/forum/threads/traktor-3-moved-a-bit-grids-vs-t2-old-grids-for-sync.345007/
Hi @pestrela,
Thanks for the updates. I was thinking to include the Traktor and Rekordbox version numbers in the analysis, since the decoders used might change between versions, affecting the results.
I've completed my initial analysis using the offset algorithm above, comparing Traktor and Rekordbox data. The code I wrote to produce the data is in a new project here: https://github.com/digital-dj-tools/dj-data-offset-analysis
The ETL happens in two steps:
1. /dev/notebook-1-ffprobe.clj gets ffprobe data for the data set, and saves it to sample-ffprobe-df.edn
2. /dev/notebook-2-offset-encoder.clj loads Traktor data from a collection.nml file, loads Rekordbox data from a rekordbox.xml file that was exported from Rekordbox, joins them, adds the offset values, joins that to the ffprobe data, calculates the stats and outputs csv data.

Please see the sheet here, for the raw offset data, the calculated stats and the included candlestick chart: https://docs.google.com/spreadsheets/d/1uTBJSNc7zB2dN05LMkMORbxP4HxN7wc15MtoYAH6Qv0/edit?usp=sharing
Points of interest:
Please let me know your thoughts and opinions on these results.
Thanks,
Alex.
Hi, thanks for this new tool, and for analyzing 1/5 of your collection.
Further below is the same data as CDFs, broken down by encoder version.
script: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py
I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce. As an example, this is the difference for the reference track in both MP3 and WAV formats (generated by winamp v5.666 build 3516). In this particular example the WAV difference is just 2.2ms. Also interesting that traktor sees an extra 38ms, and RB an extra 12ms, between MP3 and WAV.
I'm currently travelling; I'll analyse my collection and the hand-tagged dataset (good shift / bad shift) later.
Hi @pestrela,
I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce. As an example, this is the difference for the reference track in both MP3 and WAV formats (generated by winamp v5.666 build 3516). In this particular example the WAV difference is just 2.2ms. Also interesting that traktor sees an extra 38ms, and RB an extra 12ms, between MP3 and WAV.
I have a few questions and thoughts on this:
I am thinking that offset issues for other formats e.g. WAV (or FLAC possibly) ought to be treated as a separate issue? Although it may be related, I am just concerned that opening the investigation to other formats might slow us down narrowing down and resolving the issue for MP3. Having said that, I was actually vaguely aware of an offset issue some time ago (for Traktor alone) between FLAC and MP3, since I had converted a lot of files from FLAC to MP3 after I had previously analysed the FLAC files and then used relocate in Traktor to point at the MP3 files. Ultimately though, I am thinking to consider offsets between different formats as a separate (but possibly related issue), and perhaps even an expected issue due to the natural differences between formats. There is also AAC to consider, I haven't even looked at that!
I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?
Also, a few other updates:
Just to let you know I am planning to update the Google Sheet stats soon, after analysing the rest of my Traktor collection.
Based on the results so far, do you agree there are any "definitive encoder patterns" yet? As in, are we closer to a solution, using the encoder? For example, the results for LAME and LAVC mostly correspond with the examples I observed visually (but I didn't try many files). If so, I am wondering if it's the right time to implement and test a solution:
Or, do you think there is no definitive pattern yet, and we need to investigate further? Perhaps run the analysis code against your collection and compare the results?
Let me know!
Thanks,
Alex.
I've now made CDFs for your whole 8335-file collection, zoomed at both 50ms and 500ms. Source code: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py
some comments:
- Or, do you think there is no definitive pattern yet, and we need to investigate further?
I think we need to investigate further. Even for AV, the 28 ms shift is still not representative of the whole AV dataset - the values are all over the place, in particular negative.
Even worse, the latest Traktor updates make us a moving target:
I am thinking that offset issues for other formats e.g. WAV (or FLAC possibly) ought to be treated as a separate issue?
This is only an effort to reduce the variance of the MP3 graphs. I believe that we are being hurt by the difference between TK and RB beat detection algorithms.
To rule that out, I'm assuming that WAV is perfect, so any difference there between TK and RB would be pure difference on the beat algorithm. If we find this, per file, we could remove that noise from the MP3 difference we experience on the graphs
I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?
Sure. These values are the ms offset of the first beat according to RB and TK, per file format.
In other words, I hope that we can see very large differences in WAV between RB and TK, and use those (per file) to reduce the noise of the graphs.
Hi @pestrela,
I think we need to investigate further. Even for AV, the 28 ms shift is still not representative of the whole AV dataset - the values are all over the place, in particular negative.
Regarding the negative values, is it possible they're an effect of the algorithm being used to calculate the offset? For example, if the algorithm calculates the first beat incorrectly, the sign of the offset might be wrong even though the absolute value is correct.. Should we therefore just be interested in the absolute value of the offsets? Just a thought.
Even worse, the latest Traktor updates make us a moving target:
We've seen comments on the forums about moved cues for 3.0.0. I've now started seeing files that Rekordbuddy no longer converts correctly - and I remember this was not the case with Traktor 2.11 for these specific files.
I was assuming that the DJ app version would just be another variable in the analysis, just like the DJ app itself, since the decoder libraries might get changed between versions etc. This is unfortunate but I don't think we can do anything about it? In practice, it means that generated offset data is only valid for the versions of the DJ apps that produced the data. So in the case of the dataset I've created, this is Traktor Pro 2.11.3 17 and Rekordbox 5.4.2.
Correct me if wrong, but based on the above we'd have to extend the analysis into multiple versions and then compare the offset results between versions. Obviously this would get very painful, having to have multiple versions to hand etc. Or, perhaps we could agree to only be interested in the latest stable, non-beta, released versions, to reduce the complexity? This is a small issue for me since I've not upgraded to Traktor Pro 3 yet!
I've now sent these to Rbuddy support, but it will wait until the 2.1 release for Windows.
Cool, although it's a shame Rbuddy is closed-source; it prevents the pooling of our investigation resources somewhat (unless they were prepared to reveal their approach).
I am thinking that offset issues for other formats e.g. WAV (or FLAC possibly) ought to be treated as a separate issue?
This is only an effort to reduce the variance of the MP3 graphs. I believe that we are being hurt by the difference between TK and RB beat detection algorithms.
But my understanding of the offset algorithm, is that it's supposed to eliminate the beat detection differences when calculating the offset? Since naturally the beat detection will be different between DJ apps - not just the inizio, but also the bpm value (slightly).
To rule that out, I'm assuming that WAV is perfect, so any difference there between TK and RB would be pure difference on the beat algorithm. If we find this, per file, we could remove that noise from the MP3 difference we experience on the graphs
I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?
Sure. These values are the ms offset of the first beat according to RB and TK, per file format.
In WAV, they almost match - they are only 2ms apart, around the 12ms point. In MP3, they don't match at all: one sees the beat at 52ms, the other at 24ms. This makes the 28ms difference we would like to correct.
In this particular example, a 2ms difference is built-in; so the real correction value would be 28.4 - 2.2 = 26.2ms
In other words, I hope that we can see very large differences in WAV between RB and TK, and use those (per file) to reduce the noise of the graphs.
Overall I'm not sure how to proceed at this point... suggestions welcome! 🙂
In this particular example, 2ms difference is built-in; so the real correction value would be 28.4-2.2 = 26.2ms
In other words, I hope that we can see very large differences in WAV between RB and TK, and use those (per file) to reduce the noise of the graphs.
I've now compared the offset differences for both the mp3 and wav cases using an eye-balled dataset (has shift / no shift after conversion). Note that this is missing the encoder as given by "dj-data-offset-analysis".
In this python processor, for every file, the WAV-seen difference is removed from the MP3-seen difference: https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
Analysis: for both datasets, this adjustment improves all results towards zero, and reduces the long tail a lot.
This is good news for the hypothesis that the encoder is mainly responsible for the observed shifts. In short, the TK and RB algorithms see a different first beat - so some of the offsets we've seen are just due to the different algorithms. It also means that tools like RECU will degrade the accuracy for some songs (as expected; RB is inferior to TK at beat detection).
The bad news is that ~20% of the adjusted offsets are still non-zero - which means other factors are present.
Regarding the difference between the datasets:
To be done: Tag the files using dj-data-offset-analysis.clj, and confirm if all "bad" files are AV
Hi @pestrela,
I just wanted to check my understanding of your logic and process:
Both TK/RB use analysis to produce a beat grid for each file, but due to their different algorithms, the "inizio" (first beat) and "tempo" (bpm) can be slightly different.
Since we are trying to analyse offsets between TK/RB for patterns relating to MP3 encoders, the TK/RB grid differences are producing "noise" in the encoder stats.
If we assume that TK/RB grid analysis differences affect MP3 and WAV equally, then taking the difference of offsets between MP3 and WAV should exclude this "noise" from the encoder stats.
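A one-line formulation of that adjustment (my sketch; the worked numbers are the 28.4ms/2.2ms example from earlier in the thread):

def adjusted_offset(offset_mp3, offset_wav):
    # assumes TK/RB grid-analysis noise is format-independent, so the
    # per-file WAV offset can be subtracted out (all values in seconds)
    return offset_mp3 - offset_wav

print(round(adjusted_offset(0.0284, 0.0022), 4))  # -> 0.0262, i.e. 26.2ms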
So, for example if I wanted to replicate your results (and also produce encoder stats for the offset differences), I could follow this process:
1. Calculate offsets for a dataset of mp3 files, with n files per encoder (not all files, since we need to decode to wav, consider storage!)
   - (sample the ffprobe.edn dataset, write n-files-per-encoder.edn)
   - (load n-files-per-encoder.edn, calc offsets)
   - (write file-name, encoder and offset to n-files-per-encoder-offsets.edn)
2. Calculate offsets for the files in the mp3 dataset, decoded to wav
   - (load n-files-per-encoder.edn, exec with ffmpeg and write wav files to output dir, write n-files-per-encoder-decoded.edn)
   - (load n-files-per-encoder-decoded.edn, calc offsets)
   - (write file-name, encoder and offset to n-files-per-encoder-decoded-offsets.edn)
3. Calculate the offset difference between each pair of mp3 and wav files, and then calculate the encoder stats on these differences.
   - (load n-files-per-encoder-offsets.edn and n-files-per-encoder-decoded-offsets.edn, join on file names (less extension), calc offset differences)

Hi Alex, you understood my proposed method. My python code does the three pairs of inner joins to calculate all these differences. Because this was an experiment, I pre-decoded the WAVs, put them in a parallel folder structure, and added them all to a single collection.
The inputs of my script are at line 284 and following. These are: a) mp3+wav files analysed in TK and converted to RB format (using DJDC); b) mp3+wav files only analysed by RB.
https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
PS: initially I tried to use dj-data-offset-analysis, but the inner joins had issues because the WAVs have ID3 fields like albums etc. As I still had to do the step #3 inner join myself, in the end I coded everything in python to do all the joins + CDF graphs. I'm still planning to finish this analysis, taking the input instead from notebook-1-ffprobe.clj, to get the ffprobe information into the mix.
I've now updated my python joiner with the encoder information produced by notebook1.clj. For the AV encoder (bottom graph), the 28ms shift is very sharp - but only when ignoring the beat algorithm differences. If we depend on the RB and TK algorithms, it seems worse than reality. This is great news!
https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
For LAME, the previous comments remain. Some specific versions like "3.92" and "3.98" had an apparent shift, which resulted in exactly zero after WAV adjustment. This is not shown because there are too few examples.
Next step is to automate the MP3+WAV analysis, as proposed above, and re-confirm that AV always produces a sharp offset around 28ms. Hopefully in this process we will find a pattern that categorizes LAME files as well!
Some news. Another user has confirmed our finding that rekordbuddy beta 2.1 no longer corrects all cues correctly.
https://forums.next.audio/t/grid-issue-serato-rekordbox/908/10 I took a look at @Simonjok’s test files and it looks like they are indeed cases not currently handled correctly by my cue marker code.
This is what we've found before:
I've now started seeing files that Rekordbuddy no longer converts correctly - and I remember this was not the case for traktor 2.11 for these specific files.
The RB developer will check these cases after 2.1 is released
If anybody is watching this thread, please manifest yourself :)
This is a summary of the story so far, simplified as much as possible.
As explained above, the evidence for libAV is very strong using automatic methods with the WAV corrections. To be fully sure, we should apply it to manually-beatgridded files as well.
First step is to get all libAV files together from our collections. Below is a one-liner that copies such files to a central folder:
mkdir all_libav
# list all mp3 files whose ffprobe output mentions a Lav* encoder
find . -iname "*.mp3" | tr '\n' '\0' | xargs -0 -n1 -- ffprobe 2>&1 | egrep -i "encode|Input.*from" | grep -B1 -i "Lav" | grep -i mp3 | cut -b21- | sed 's/..$//' | sed 's/^.//' > libav.txt
# copy every listed file into the central folder
cat libav.txt | xargs -d '\n' -n 1 cp --target-directory=all_libav
here we go again :)
Still no manual beatgridding involved, but processing only AV files automatically now shows:
As such, I currently recommend that this project always fix offsets in any AV file.
updated analysis code: https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
Another idea for later:
Regarding LAVC vs LAVF:
This explains why we've seen problems in all LAVC files, but only in half of the LAVF files.
Sources:
https://trac.ffmpeg.org/wiki/Encode/MP3 This page describes how to use the external libmp3lame encoding library within ffmpeg to create MP3 audio files (ffmpeg has no native MP3 encoder).
https://trac.ffmpeg.org/wiki/Using%20libav* libavcodec provides a decoding and encoding API, and all the supported codecs. libavformat provides a demuxing and muxing API, and all the supported muxers and de-muxers.
https://hydrogenaud.io/index.php?PHPSESSID=n8l4jet917mt5kdv4j8nj0nt94&topic=116062.msg957881#msg957881 Somehow google encodes their files with lame 3.99.5 (not 3.99r) and outcome is :lavf
In this process I've made a tool that queries 3x programs to see the mp3 encoding. As predicted before, the tools do not agree with each other:
$ check_encoder.sh *.mp3 -c
ffprobe,mp3guessenc,mediainfo,file
Lavc57.48,LAME3.99.5,LAME3.99
Lavf,LAME3.99.5,LAME3.99.5
Code and options: https://github.com/pestrela/music_scripts/blob/master/offsets/check_encoder.sh
$ check_encoder.sh -h
output format:
-f full output
-s short output
-1 one-line output (default)
-c csv output
sub-tools:
--ffmpeg|--ffprobe ONLY run ffprobe
--mp3guessenc ONLY run mp3guessenc
--mediainfo ONLY run mediainfo
We've already seen that RB uses mpg123, and that Traktor uses the FhG decoder:
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll
$ strings Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
Their FAQ and release history match very well on the topic of gapless decoding: the library added this feature, it had a bug, and later it was fixed.
https://www.mpg123.de/faq.shtml Q: mpg123 does only play the intro jingle of my web radio! A: This might be collateral damage from a feature of mpg123 -- the gapless playback using sample-exact length and padding information provided at the beginning of an MP3 stream. This should be fixed in upcoming 1.14 release, please test the beta version!
https://www.mpg123.de/cgi-bin/news.cgi 2012-02-29 Thomas: Beta time! There is a beta of the next mpg123 release, 1.14 --- please get it, test it and report any issues! with a Xing/Info header that contains gapless info (which doesn't really fit the conglomerate...
2010-02-27 thomas: mpg123 1.10.1 - the most wanted maintenance release Fixes for gapless decoding: Correctly skip padding larger than one MPEG frame (strange, but occurs). Bug 2950218 (proper gapless cuts for seeking near the end).
this matches the same story on the RB side:
https://rekordbox.com/en/support/releasenote.php Ver.1.6.2 (2012.08.21) ●Added a function to fix the misaligned BeatGrid and cue points in mp3 files which (i) have been encoded by LAME encoder with the gapless setting and (ii) have been analyzed and adjusted by rekordbox before version 1.5.3. (As of version 1.5.4, rekordbox has disabled gapless playback of LAME-encoded mp3 files.)
Ver.1.5.4 (2012.07.03) Version 1.5.4 disables gapless playback for MP3 files encoded with the LAME encoder on players such > as the CDJ-2000.
About the mysterious FhG library of Traktor:
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll
$ strings Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
This is the mp3 surround decoder library of Fraunhofer IIS, the inventor of the first mp3 decoder: https://web.archive.org/web/20060514162705/http://www.iis.fraunhofer.de:80/amm/download/mp3surround/index.html
These web pages are gone; but this great page has a copy of the CLI encoder (v1.5) and decoder (v1.4) http://www.rarewares.org/rrw/fhgmp3s.php
Decoding our reference mp3 into WAV using both the FhG and mpg123 libraries, we immediately see our usual offset on the beats:
We will use the total length of the WAV as a proxy:
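A minimal sketch of that proxy check (assumptions: mpg123 is on the PATH and decodes to WAV with -w; the FhG CLI decoder from rarewares would be invoked analogously, its flags not shown here):

import subprocess
import wave

def decode_mpg123(mp3_path, wav_path):
    # mpg123 -w decodes the mp3 into a WAV file
    subprocess.run(["mpg123", "-q", "-w", wav_path, mp3_path], check=True)

def wav_length(wav_path):
    # total decoded length in samples, plus the sample rate
    with wave.open(wav_path) as w:
        return w.getnframes(), w.getframerate()

def length_diff_ms(wav_a, wav_b):
    # positive result: wav_a decoded longer than wav_b
    (n_a, rate), (n_b, _) = wav_length(wav_a), wav_length(wav_b)
    return 1000.0 * (n_a - n_b) / rate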
I've now made a new automated analysis that compares: 1) old stuff: the MP3 DJ beatgrid offsets (pre-corrected with the WAV adjustment); 2) new stuff: the difference between the FhG and mpg123 decoders.
Note that this method no longer depends on encoder detection. This is great news - we can now correct any file! Also note that this analysis was limited to 30 files, a mixture of "good" and "bad" files from previous analysis. This is not representative of whole collections!
https://github.com/pestrela/music_scripts/tree/master/offsets/fhg
Results: this is the previous finding of this thread: using the WAV correction, we see constant offsets across the collections.
This is the new finding: comparing the decoders, we get either a constant zero or a constant ~28ms offset.
Using the new method we correct the offsets for half of the test dataset - without depending on the encoder info.
Latest code is in: https://github.com/pestrela/music_scripts/tree/master/offsets/fhg
Summary of steps:
Hi @pestrela,
Many thanks for these updates, I think this is great progress 👍 🙂
On LAVC vs LAVF
Ok I understand the differences here.
re: "This explains why we've seen problems in all LAVC files, but only in half of the LAVF files." This implies to me that I could confidently implement a fix for LAVC files. LAVF files wouldn't be touched. This is ok with me, I think it's better to deliver a partial fix as and when new facts emerge?
I must admit some personal bias here in addition to the above reasoning for a partial fix, because LAVC files are a big proportion of my collection (I transcode from FLAC using ffmpeg, hence LAVC).
On the requirement to know the encoder
In one comment you mention the encoder info isn't needed anymore, but in another comment, in the process you described, the encoder info looks like it's still needed (e.g. check_encoder.sh)?
In any case, am I correct in thinking I'll need to determine the encoder at runtime in the converter, to match the encoder-grouped stats (generated either by your code, or mine if I write it), since only some files need a correction? It still depends on the encoder used for the given file? I understand it's the different decoders used by Traktor and Rekordbox that causes the offset to manifest, but it's the encoder used that determines whether the offset will manifest (or not) for any given file?
I'm a bit confused, sorry!
On determining the encoder (assuming that's still needed)
Assuming we still need to know the encoder for every file, my understanding is that unlike the other tools, ffprobe only reports what metadata is saved with the file. Since this data is likely machine-generated when the file is created, I can't imagine that the value would be incorrect? This is why I was preferring ffprobe to determine the encoder.
I was basically ignoring the other tools and only using ffprobe, since:
a) mp3guessenc is not available on Mac and so it wouldn't be usable at runtime in the converter, and in any case I'm not sure I trust the result it gives, since it's clearly not using the file metadata.
b) mediainfo also seems not to use the file metadata (based on the example output in your comment), so the same applies there.
Or is my logic flawed?
Thanks!
I've refactored my code to make the analysis steps clearer: what they are, what they depend on, and what the findings are. https://github.com/pestrela/music_scripts/blob/master/offsets/fhg/fhg%20analysis.py
Hi @pestrela
Thanks for the update!
Just a couple of clarifications:
On "manual tag information"/"manually tagged" in step 1
What do you mean by this phrase? Is it some value in the id3 tag (comment?) to indicate these files have been visually inspected for "no shift" or "bad shift"?
On "has_shift" vs "no_shift" in steps 3,4
Let me test my understanding:
has_shift = those files within your set of ~300 that are known by visual inspection to have a shift TK->RB?
no_shift = those files within your set of ~300 that are known by visual inspection not to have a shift TK->RB?
Presumably there's a roughly equal proportion of both in the 300?
Also, on the charts where it states "good_shift", I presume you mean "no_shift"? (blue line)
If my understanding is correct, then I am surprised by your results in step 3 for has_shift (orange line), where you say "everything is corrected perfectly, except for 1x FN"
This is because for my files, for every file I tried using fhg-vs-mpg-nogapless that is known visually to have a shift (just a few, done manually), the WAV lengths were the same (so that's 100% false negative for me, vs your result?). n.b. these files are all LAVC encoded.
Note: I used mpg123 version 1.23.8 on WSL Debian Stretch. (i.e. slightly older than the version you're using, but I'm assuming my different results are not due to that)
One more thing, re: step 3 no_shift (blue line):
for the other 50% of these, we do the WRONG thing - we make an offset that was not real (False positive)
I'd be interested to know if there is an encoder pattern for this 50%, for example if the encoder is always within a given set in this case (and the encoders in this set are not found in the other 50%).
If we can find a pattern there, then the algorithm might be length difference + encoder list check etc.; a sketch follows below.
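A hypothetical sketch of that combined rule (the prefix set and the 1ms threshold are illustrative, not confirmed by the data yet):

# encoder prefixes suspected from the stats so far (illustrative)
SUSPECT_PREFIXES = ("Lavc", "Lavf")

def needs_correction(duration_diff_ms, encoder):
    # a non-zero decoder length difference is the primary signal
    if abs(duration_diff_ms) > 1.0:
        return True
    # fall back to the encoder list for files the length check misses
    return (encoder or "").startswith(SUSPECT_PREFIXES)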
I've now updated the above post with a clear description of how the dataset was manually tagged, and how I define FPs and FNs. I've also added the input datafiles for you to be able to run it as a notebook in e.g. mybinder.org: https://github.com/pestrela/music_scripts/tree/master/offsets/fhg
Regarding encoders: my files have a variety of random encoders, rather than focusing on "home-made" LAVC - they were not encoded by me.
In particular I'm troubled that 3x different tools give me 3x different answers to "what is the encoder of this file?". Because of this, my focus has been on encoder-independent methods. However, the current dataframe has the 3x encoder guesses readily available for analysis.
This is because for my files, for every file I tried using fhg-vs-mpg-nogapless that is known visually to have a shift (just a few, done manually), the WAV lengths were the same (so that's 100% false negative for me, vs your result?). n.b. these files are all LAVC encoded.
This is quite surprising indeed. In this post, my LAVC files from the "bad dataset" all had the 28ms shift. To confirm: exactly these files have 28ms decoded-sample differences as well.
Note: I used mpg123 version 1.23.8 on WSL Debian Stretch. (i.e. slightly older than the version you're using, but I'm assuming my different results are not due to that)
I'm using mpg123 1.25.10 from WSL Ubuntu. The really bad stuff was around v1.10-v1.14, so this is a reasonable assumption.
I believe I answered your questions in the updated post and this post. Please tell me if this is not the case.
I believe I answered your questions in the updated post and this post. Please tell me if this is not the case.
Correct 🙂
Here is a proposed solution using the fhg/mpg123 decoding method, based on current latest available analysis results.
// = parallel, || = sequential
1. offset option false? just do the conversion using current method.
2. otherwise [offset option true]
3. load cached durations
4. for each file //
   4.1. cached durations for location?
          use cache
        else
          (decode fhg || get duration) // (decode mpg123 || get duration) || save location & durations to cache
   4.2. diff durations to get offset
   4.3. offset non-zero?
          append output with offset [handle true positive case... unfortunately this will also catch some false positives]
        else [offset is zero]
          file is vbr?
            append output with offset (what value? 26ms?) [handle false negative case]
          else [file is cbr]
            append output without offset [handle true negative case]
   4.4. append report (location, durations, offset, decision)
   4.5. delete temp decoded wav files
n.b. the above pseudo-code does not yet account for false positive cases
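To make steps 4.2-4.3 concrete, a minimal Python sketch of the decision (the 1ms threshold and the 26ms fallback are open assumptions from the discussion, and the false positive case is likewise not yet handled):

def decide_offset(duration_diff_ms, is_vbr):
    # non-zero decoder length difference: apply it as the cue offset
    # (true positives, but this will also catch some false positives)
    if abs(duration_diff_ms) > 1.0:
        return duration_diff_ms / 1000.0
    if is_vbr:
        # zero difference but VBR: suspected false negative, apply fallback
        return 0.026  # value still open: 26ms?
    # zero difference and CBR: true negative, no correction
    return 0.0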
UPDATE SEP 2019 - FINDINGS SUMMARY:
LINKS:
ALGORITHM: (updated: 16 Sep 2019)
EXAMPLE: