Hi @pestrela,
Thanks for reporting this. I wasn't aware of it, since for my audio files I didn't encounter the issue yet!
Is the issue intermittent depending on the files, or for you does this issue occur for every file? If it's intermittent depending on the file, it could be tricky to fix, but first of all let's see if I can reproduce it.
If you could attach a zip archive containing an audio file for which you've observed the issue, it might help to speed up the investigation. From there I can try and reproduce it by analysing in Traktor, and then using the converter and checking the result in Rekordbox.
Thanks!
Hi, many thanks for this project. I just tested the 0.2.0 release, which produced a valid XML file.
However, it suffers from the "cues are shifted in time" issue that all translators face when going to/from RB/Traktor. The result looked the same as the example below (taken from CrossDJ):
The root cause is different definitions of the 00:00:00 time point: https://www.youtube.com/watch?v=Vl4nbvYmiP4
AFAIK only these 2x tools are able to fix this issue:
Could you please consider addressing this issue? I can provide demo mp3s if you don't see this issue in your mp3s. Thanks!
Hi Alza, This issue only happens for specific MP3s. Below an example, I can provide a lot more later: https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1
I've tested many converters - they all suffer from the same issue for this example. Exceptions:
I've now analysed by hand 67 different files and found an almost perfect pattern.
If the file was encoded by LAME "3.99" or "3.99.5", the simple conversion produces shifted cues; the exception is "3.99r". Same story for "3.98", except "3.98r" or "3.98 " (with a trailing space).
For the other LAME versions / encoders, no shifted cues were seen. Note: "unk" means the tag was empty/not present.
Please see the table below for my results so far:
python code:
import pandas as pd
from io import StringIO

# 'a' holds the tab-separated (version, shift) results pasted above
df1 = pd.read_csv(StringIO(a), sep="\t", names=['version', 'shift']).dropna()
df1['version'] = df1['version'].str.replace(" ", "_")
print("number of entries: %d" % len(df1))
df2 = pd.crosstab(index=df1["version"], columns=df1["shift"]).sort_values(["bad", "good"], ascending=False)
df2
To analyse the encoder of the files, I've used MediaInfo: https://mediaarea.net/en/MediaInfo. To customize the output: Preferences / Custom / Edit / Audio / %Encoded_Library%
What do you think?
Extended the analysis to 300 files, analysed manually. Of these 300, I've subjectively found that 11% have shifted cues.
For LAME 3.99 files, all of them result in shifted cues. LAME 3.99.5 is now mixed, with 60% wrong predictions. Everything else, including 3.99r etc., results in only 2% false positives.
code and data: https://github.com/pestrela/music_scripts/blob/master/lame_shifted_cues.py
Rekordbuddy is able to correct this issue in a single go. Well done! In their own words: "Rekord Buddy deals with 5 different issues related to cue timings, and one that we are aware of but haven’t found enough data to compose a decent fix for." https://forums.next.audio/t/traktor-rekordbox-cues-shifted-in-time/415
Hi @pestrela,
Ok, I've started to look into this now, and I have some interesting results!
First, I actually had some LAME 3.99 & LAME 3.98 encoded files already, so I tried to reproduce the issue with those. In this case, I found that the cue shifting did not occur with any of the 3.99 and 3.98 files I tried.
Second, I tried to reproduce the issue with the file you provided:
https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1
In this case, I found that the cue shifting did occur, but notably when I checked the encoder metadata for this particular file:
ffprobe -v verbose <file>
It was not LAME, but Lavf, a.k.a. libavformat (and the related libavcodec). I believe this encoder string indicates that FFmpeg was used to encode the file. Internally, libavcodec uses libmp3lame for mp3 encoding, but for this file it seems the version used is not present in the file metadata; it just states Lavf.
Based on this, I then tried to reproduce the issue with Lavf and Lavc xx.xx encoded files. In this case, I found that the cue shifting issue did occur for the vast majority of files with these encoder values (although not all of them; there was at least one exception).
Conclusion: my findings do support the encoder version hypothesis to some extent; however, I found that a different encoder is the culprit: Lavf and/or Lavc.
Next steps: our findings are different, so we need to clarify the situation there first before I can proceed.
Assuming we can account for this, I would then try and work out what the shift value(s) are (in seconds), and whether it's constant or not etc.
Let me know what you think!
I've now sent you privately a link to an upload of 35x files that have a clear shift. I've also changed my analysis scripts to use the latest ffprobe 4.1.
Note: "good" files could actually be bad files with a very small shift. When I used RECU it sometimes reported marginal (but present) shifts.
yet another program to guess the encoder: http://www.rarewares.org/rrw/encspot.php
which is a wrapper around this lib: http://mp3guessenc.sourceforge.net/
Found this program in a list of mp3 tools collected by Pulse@Pioneer (mp3 information / mp3 error checkers): https://forums.pioneerdj.com/hc/en-us/articles/204681699-MP3-Tools-More
Hi @pestrela,
Thanks for these. What's your thinking here: is this regarding a method of detecting the encoder for files that don't have an encoder tag (or where the encoder tag is empty)? I'll call these files "unknown files".
I assume this is your focus, since unknown files are the biggest proportion of files in your dataset of 300 (although LAME files are a close second), and the proportion with the biggest number of shifted cues?
However, it's worth noting that although this proportion has the biggest number of shifted cues, it's not the proportion with the biggest percentage of shifted cues - that goes to Lavf/Lavc:
Category | Total | Number Shifted | % Shifted |
---|---|---|---|
Lavf/Lavc (all versions?) | 13 | 8 | 62% |
Unknown | 143 | 22 | 15% |
LAME (all versions?) | 122 | 5 | 4% |
Based on the above, I am thinking that the % numbers are the most helpful indicator for determining what to do next. Although the number of Lavf/Lavc files in your dataset is comparatively small, the percentage result for those does correlate somewhat with my findings.
My current thinking for a solution is to implement a "blacklist lookup table", which would map source + target + encoder (string regex) -> shift (seconds).
For example (shift values are just made up):
Source | Target | Encoder | Shift |
---|---|---|---|
Traktor | Rekordbox | Lavc57.80 | 0.135 |
Traktor | Rekordbox | Lavc* | 0.143 |
Traktor | Rekordbox | Lavf* | 0.143 |
Traktor | Rekordbox | LAME3.99 | 0.128 |
I am assuming that for a given conversion (source -> target, encoder), the shift is a fixed value (this could be verified using a random sample of files for each encoder).
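For illustration, a minimal Python sketch of such a lookup table (the rule entries reuse the made-up shift values from the example table above; all names are hypothetical, not part of the converter yet):

import re

# hypothetical rule table: (source, target, encoder regex, shift in seconds)
OFFSET_TABLE = [
    ("traktor", "rekordbox", r"Lavc57\.80.*", 0.135),
    ("traktor", "rekordbox", r"Lavc.*", 0.143),
    ("traktor", "rekordbox", r"Lavf.*", 0.143),
    ("traktor", "rekordbox", r"LAME3\.99", 0.128),
]

def lookup_shift(source, target, encoder):
    # first matching rule wins, so more specific regexes go first
    for src, tgt, pattern, shift in OFFSET_TABLE:
        if src == source and tgt == target and re.fullmatch(pattern, encoder or ""):
            return shift
    return 0.0  # unknown files: no correction applied

print(lookup_shift("traktor", "rekordbox", "Lavc57.80"))  # -> 0.135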
Of course, this solution doesn't consider unknown files. Some options for those:
There could also be a command-line option to override whether they are included or not.
For unknown files, mp3guessenc might be helpful to determine the encoder (I've used it before), but unfortunately there doesn't seem to be a build/version for Mac OSX, which is a show-stopper in any case.
What do you think?
Today I tried the following experiment: identify the precise sample of the 0:0:0 point of DJ software.
Method: I played MP3 files in the DJ software while recording, putting the play position at negative values beforehand. Then I aligned the recordings on the first downbeat, and normalized on the first 16-bit samples that are greater than zero (mod).
A description of the test procedure, inputs and all outputs are in this zip: https://www.dropbox.com/s/pgpnrw4sl3xv2tp/DAW%20shifted%20cues.zip?dl=0 DAW shifted cues.txt
Results:
Example:
Maybe found a hint in the Rekordbox release notes. This mentions an issue with LAME gapless encoding, and claims the 44.1kHz shift to be a constant 26ms.
https://rekordbox.com/en/support/releasenote.php
What's new in rekordbox Ver.2.0.2 ● Fixed an issue with beat grid inaccuracy created with v1.6.0/v1.6.2/v2.0.0/v2.0.1.
Ver.1.6.2 (2012.08.21) What's new in rekordbox Ver.1.6.2 ... ●Improved the accuracy of beat grid information analyzed by rekordbox. ●Added a function to fix the misaligned BeatGrid and cue points in mp3 files which (i) have been encoded by LAME encoder with the gapless setting and (ii) have been analyzed and adjusted by rekordbox before version 1.5.3. (As of version 1.5.4, rekordbox has disabled gapless playback of LAME-encoded mp3 files.) ...
Ver.1.5.4 (2012.07.03) About rekordbox Version 1.5.4 Version 1.5.4 is only for MEP-4000 and new rekordbox users. Version 1.5.4 disables gapless playback for MP3 files encoded with the LAME encoder on players such as the CDJ-2000. Disabling gapless playback for MP3 files encoded with the LAME encoder in Version 1.5.4 will shift existing beat grids, loops or cue points of mp3 files encoded with the LAME encoder that have been analysed and adjusted with an older version of rekordbox. The offset value depends on the sampling frequency of the file: 24ms (in the case of 48kHz), 26ms (in the case of 44.1 kHz). However, it does not alter the audio play back on the CDJ's just visually inside rekordbox, therefore you do not need to reanalyse your tracks and redefine the beat grids, loops or cue points. Pioneer will provide a tool to automatically adjust the beat grids, loops or cue point data in a future update. We recommend that you wait. Thank you for your understanding.
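As a cross-check (my own arithmetic, not from the release notes): one MPEG-1 Layer III frame is 1152 samples, which matches Pioneer's offsets exactly:

# one MP3 frame = 1152 samples; Pioneer's 24ms/26ms offsets are one frame
for rate in (44100, 48000):
    print("%d Hz -> %.1f ms" % (rate, 1000.0 * 1152 / rate))
# 44100 Hz -> 26.1 ms
# 48000 Hz -> 24.0 ms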
Gapless encoding is detectable using byte $AF of the full lame mp3 info tag: https://wiki.hydrogenaud.io/index.php?title=Gapless_playback#Format_support http://gabriel.mp3-tech.org/mp3infotag.html
eyeD3 -P lameinfo displays --nogap: https://eyed3.readthedocs.io/en/latest/_modules/eyed3/mp3/headers.html
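For reference, a rough Python sketch of the same check via eyeD3's API (assuming, per the linked source, that the parsed Xing/LAME info frame is exposed as a lame_tag dict with a "nogap" entry; this is an assumption, not confirmed against all eyeD3 versions):

import eyed3

def has_nogap(path):
    # assumption: eyed3 parses the LAME info tag into a lame_tag dict,
    # where 'nogap' lists 'before'/'after' when the gapless flags are set
    f = eyed3.load(path)
    lame = getattr(f.info, "lame_tag", None) if f else None
    return bool(lame and lame.get("nogap"))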
However, this doesn't match my current dataset:
$ for DIR in {bad,good} ; do
    echo -n "$DIR "
    ( for FILE in "$DIR"/*.mp3 ; do
        eyeD3 -P lameinfo "$FILE" 2>/dev/null | grep -a -c nogap
      done ) | awk '{A=A+$1} END{ print NR-A, A }'
  done
what | no nogap | has nogap |
---|---|---|
bad | 29 | 0 |
good | 237 | 25 |
Hi @pestrela,
Thanks for continuing the investigation.
The CUE Shift amount is different for every mp3.
Just to clarify, you're saying that the shift is different, even for files with the same encoder? This is contrary to the hypothesis in the video above, cited as root cause: https://www.youtube.com/watch?v=Vl4nbvYmiP4
Rekordbox adds variable amounts of data when playing mp3
Just to clarify, you're saying that the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file? Although it's small, i.e. 2ms as you say. I wonder if this is just a related, but separate Rekordbox peculiarity that can be ignored (since it's only 2ms).
Re: gapless encoding, my conclusion based on the results in your other comment, is that it's not related, it's just a coincidence due to the similar values 24/26ms vs 29ms.
the shift is different, even for files with the same encoder
This comment was because sample1 vs sample2, which have the same encoder, would have different offsets according to the above method. The issue is that I now see the above method (find the first non-zero byte after play) doesn't seem to predict the correct offset shift that we need to apply.
the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file?
Yes. This is yet another sign that this method is not reliable enough.
Moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection. I expect a lot of outliers from the different beat-grid algorithms, but I expect that most >5ms offsets will cluster, somehow correlated with mp3guessenc / ffprobe / eyeD3.
RECU is a tool that takes 2 files:
It then matches the files on the first beat, computes the offset, and applies that offset to all cues of the converted XML. The current RECU requires the first beat to be marked; below is some code to avoid this:
def bpm_period(bpm):
    # duration of one beat, in seconds
    return 60.0 / bpm

def find_min_beat(bpm, cue):
    # reduce a cue position to its phase within one beat period,
    # so it no longer matters which beat of the grid was marked
    period = bpm_period(bpm)
    beats = int(cue / period)
    return cue - beats * period

def find_offset(bpm1, cue1, bpm2, cue2):
    # grid offset between two analyses of the same file
    return find_min_beat(bpm1, cue1) - find_min_beat(bpm2, cue2)
Hi @pestrela,
moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection. I expect a lot of outliers from the different beat-grid algorithms, but I expect that some >5ms offsets will cluster somehow correlated with mp3guessenc / ffprobe / eyeD3.
Ok I can see how this would be useful. Then we can cross-reference the shifts with other info e.g. encoder, to see if there's a pattern?
RECU is a tool that takes 2 files:
converted RBox XML, as converted by DJCU/DJDC original RBox XML, as analysed by rekordbox
it then matches files on the first beat, computes the offset, and applies such offset to all cues of the converted XML
Ok so the process could be:
1. Convert collection.nml to rekordbox.xml (whole collection converted)
2. Analyse the whole collection in Rekordbox and export rekordbox-2.xml
3. Join rekordbox.xml and rekordbox-2.xml, calculate the offset based on the earliest tempo position for each track, and output csv data
4. Run ffprobe (for encoder etc.) (or, parse the tag in step 3 to get the encoder, avoiding the need for this step)

One issue I can see with the above process is that step 2 could take a long time for a large collection? For example, my collection is ~10,000 tracks...
The current RECU requires the first beat to be marked, below some code to avoid this
I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?
Ok I can see how this would be useful.
I see 2x different use cases for this effort:
Ok so the process could be: ...
Indeed, this is how RECU works
is that step 2. could take a long time, for a large collection?
We can match the files exactly by filenames. In my python tool I'm matching my 7000-track collection exactly using AUDIO_ID, and this is quite fast: in Python, string keys are hashed and matched using hash tables.
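For illustration, a minimal sketch of that kind of hash-based matching with pandas (the column names are hypothetical; AUDIO_ID is the key from the Traktor NML):

import pandas as pd

# toy frames standing in for the two parsed collections
tk = pd.DataFrame({"AUDIO_ID": ["a1", "a2"], "tk_inizio": [1.95724, 0.512]})
rb = pd.DataFrame({"AUDIO_ID": ["a1", "a2"], "rb_inizio": [0.024, 0.540]})

# hash-based inner join: only tracks present in both collections survive
joined = tk.merge(rb, on="AUDIO_ID")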
I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?
We should take the closest Rbox Inizio to TK, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU that required the exact same beat to be marked.
An example will make this clear:
converted XML:
<TEMPO Inizio="1.95724" Bpm="126.000000" Metro="4/4" Battito="1"/>
original XML:
<TEMPO Inizio="0.024" Bpm="126.00" Metro="4/4" Battito="3"/>
<TEMPO Inizio="6.215" Bpm="126.00" Metro="4/4" Battito="4"/>
<TEMPO Inizio="164.787" Bpm="126.00" Metro="4/4" Battito="1"/>
<TEMPO Inizio="343.359" Bpm="126.00" Metro="4/4" Battito="4"/>
find_offset(126.00000, 1.95724, 126.00,0.024)
0.028478095238095434
Hi @pestrela,
Minimum: provide statistics of the offsets, optionally trigger mp3guessenc etc
Agreed, I'm currently working on a separate mini-app to get the offset based on two Rekordbox files. I'll post the results when I have them.
Optional: Serve as a post-correction tool, just like RECU, if no definitive encoder patterns arise from #1
Ok let's see what the stats tell us first. It would be good to avoid the post-correction like RECU, because even if we have a method that avoids manually marking the first beat (tempo inizio), users will still be required to do a full analysis in Rekordbox which isn't ideal.
is that step 2. could take a long time, for a large collection?
We can match the files exactly by filenames. In my python tool I'm matching my 7000-track collection exactly using AUDIO_ID, and this is quite fast: in Python, string keys are hashed and matched using hash tables.
I was referring to the analysis time in Rekordbox when starting from an empty collection and adding the music folders, in order to export the rekordbox-2.xml. Actually I just left my laptop analyzing for a while, and it's finished now!
I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?
We should take the closest Rbox Inizio to TK, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU that required the exact same beat to be marked. An example will make this clear:
converted XML:
<TEMPO Inizio="1.95724" Bpm="126.000000" Metro="4/4" Battito="1"/>
original XML:
<TEMPO Inizio="0.024" Bpm="126.00" Metro="4/4" Battito="3"/>
<TEMPO Inizio="6.215" Bpm="126.00" Metro="4/4" Battito="4"/>
<TEMPO Inizio="164.787" Bpm="126.00" Metro="4/4" Battito="1"/>
<TEMPO Inizio="343.359" Bpm="126.00" Metro="4/4" Battito="4"/>
find_offset(126.00000, 1.95724, 126.00, 0.024)
0.028478095238095434
I'm not 100% sure about the correctness of find_offset, after trying a few examples, but I'll use it as-is for now and let's see what the stats look like.
Agreed that a RECU-like step that depends on Rekordbox analysis is slow and cumbersome. Hopefully we will catch the LAME pattern and correct it in a single go.
Regarding slowness: the Rekordbox analysis is always required; it happens anyway when the user imports the converted XML.
Trying to guess which decoding library the DJ software uses:
$ strings Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll
some interesting comments from library maintainers:
https://sourceforge.net/p/lame/mailman/message/27315501/ as maintainer of the mpg123 decoding engine, I can tell you what works: Simply encode your files with lame and decode them with mpg123/libmpg123, with gapless decoding enabled. Lame stores all the necessary information by default and libmpg123 just omits the leading/trailing junk. I tested this with encode/decode roundtrips ... if you don't get the exactly same sample count that you had in the intial WAV, you found a bug in either lame or mpg123 and it should be fixed.
https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/ If an mp3 file starts with a Xing/LAME information frame, they are feeding that frame to the mp3 decoder rather than filtering it out, resulting in an unnecessary 1152 samples of silence at the start of the decoded audio.
In a really interesting development, some users started seeing this issue when upgrading TP2 collections to TP3. Mentioned patterns were locked files ON and multi-processing OFF.
It would be very useful to replicate this issue using traktor alone.
TP3 release dates:
- 3.0.0 — 2018-10-18
- 3.0.1 — 2018-11-01
- 3.0.2 — 2018-12-06
- 2018-10-26: https://support.native-instruments.com/hc/en-us/community/posts/360002416977-beat-grid-proble-with-traktor-pro-3-en-us-
- 2018-11-17: https://support.native-instruments.com/hc/en-us/community/posts/360002619578-Whole-libraries-grid-markers-have-changed-since-upgrading-to-Traktor-pro-3-en-us-
- 2018-12-13: https://www.reddit.com/r/DJs/comments/a5x76e/upgraded_to_traktor_pro_3_now_my_beatgrids_are/
- 2019-01-14: https://www.native-instruments.com/forum/threads/traktor-3-moved-a-bit-grids-vs-t2-old-grids-for-sync.345007/
Hi @pestrela,
Thanks for the updates. I was thinking to include the Traktor and Rekordbox version numbers in the analysis, since the decoders used might change between versions, affecting the results.
I've completed my initial analysis using the offset algorithm above, comparing Traktor and Rekordbox data. The code I wrote to produce the data is in a new project here: https://github.com/digital-dj-tools/dj-data-offset-analysis
The ETL happens in two steps:
1. /dev/notebook-1-ffprobe.clj gets ffprobe data for the data set, and saves it to sample-ffprobe-df.edn
2. /dev/notebook-2-offset-encoder.clj loads Traktor data from a collection.nml file, loads Rekordbox data from a rekordbox.xml file that was exported from Rekordbox, joins them, adds the offset values, joins that to the ffprobe data, calculates the stats and outputs csv data.

Please see the sheet here, for the raw offset data, the calculated stats and the included candlestick chart: https://docs.google.com/spreadsheets/d/1uTBJSNc7zB2dN05LMkMORbxP4HxN7wc15MtoYAH6Qv0/edit?usp=sharing
Points of interest:
Please let me know your thoughts and opinions on these results.
Thanks,
Alex.
Hi, thanks for this new tool, and for analyzing 1/5 of your collection.
Further below is the same data as CDFs, broken down by encoder version.
script: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py
I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce. As an example, this is the difference for the reference track in both MP3 and WAV formats (generated by winamp v5.666 build 3516). In this particular example the WAV difference is just 2.2ms. Also interesting that traktor sees an extra 38ms, and RB an extra 12ms, between MP3 and WAV.
I'm currently travelling; I'll analyse my collection and the hand-tagged dataset (good shift / bad shift) later.
Hi @pestrela,
I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce. As an example, this is the difference for the reference track in both MP3 and WAV formats (generated by winamp v5.666 build 3516). In this particular example the WAV difference is just 2.2ms. Also interesting that traktor sees an extra 38ms, and RB an extra 12ms, between MP3 and WAV.
I have a few questions and thoughts on this:
I am thinking that offset issues for other formats e.g. WAV (or FLAC possibly) ought to be treated as a separate issue? Although it may be related, I am just concerned that opening the investigation to other formats might slow us down narrowing down and resolving the issue for MP3. Having said that, I was actually vaguely aware of an offset issue some time ago (for Traktor alone) between FLAC and MP3, since I had converted a lot of files from FLAC to MP3 after I had previously analysed the FLAC files and then used relocate in Traktor to point at the MP3 files. Ultimately though, I am thinking to consider offsets between different formats as a separate (but possibly related issue), and perhaps even an expected issue due to the natural differences between formats. There is also AAC to consider, I haven't even looked at that!
I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?
Also, a few other updates:
Just to let you know I am planning to update the Google Sheet stats soon, after analysing the rest of my Traktor collection.
Based on the results so far, do you agree there are any "definitive encoder patterns" yet? As in, are we closer to a solution, using the encoder? For example, the results for LAME and LAVC mostly correspond with the examples I observed visually (but I didn't try many files). If so, I am wondering if it's the right time to implement and test a solution:
Or, do you think there is no definitive pattern yet, and we need to investigate further? Perhaps run the analysis code against your collection and compare the results?
Let me know!
Thanks,
Alex.
I've now made CDFs for your whole 8335-file collection, zoomed at both 50ms and 500ms. Source code: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py
some comments:
- Or, do you think there is no definitive pattern yet, and we need to investigate further?
I think we need to investigate further. Even for AV, the 28 ms shift is still not representative of the whole AV dataset - the values are all over the place, in particular negative.
Even worse, the latest Traktor updates make us a moving target:
I am thinking that offset issues for other formats e.g. WAV (or FLAC possibly) ought to be treated as a separate issue?
This is only an effort to reduce the variance of the MP3 graphs. I believe that we are being hurt by the difference between TK and RB beat detection algorithms.
To rule that out, I'm assuming that WAV is perfect, so any difference there between TK and RB would be pure difference on the beat algorithm. If we find this, per file, we could remove that noise from the MP3 difference we experience on the graphs
I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?
Sure. These values are the ms offset of the first beat according to RB and TK, per file format.
In other words, I hope that we can see very large differences in WAV between RB and TK, and use those (per file) to reduce the noise of the graphs.
Hi @pestrela,
I think we need to investigate further. Even for AV, the 28 ms shift is still not representative of the whole AV dataset - the values are all over the place, in particular negative.
Regarding the negative values, is it possible they're an effect of the algorithm being used to calculate the offset? For example, if the algorithm calculates the first beat incorrectly, the sign of the offset might be wrong even though the absolute value is correct.. Should we therefore just be interested in the absolute value of the offsets? Just a thought.
Even worse, the latest Traktor updates make us a moving target:
We've seen comments on the forums about moved cues for 3.0.0. I've now started seeing files that Rekordbuddy no longer converts correctly - and I remember this was not the case with Traktor 2.11 for these specific files.
I was assuming that the DJ app version would just be another variable in the analysis, just like the DJ app itself, since the decoder libraries might get changed between versions etc. This is unfortunate but I don't think we can do anything about it? In practice, it means that generated offset data is only valid for the versions of the DJ apps that produced the data. So in the case of the dataset I've created, this is Traktor Pro 2.11.3 17 and Rekordbox 5.4.2.
Correct me if wrong, but based on the above we'd have to extend the analysis into multiple versions and then compare the offset results between versions. Obviously this would get very painful, having to have multiple versions to hand etc. Or, perhaps we could agree to only be interested in the latest stable, non-beta, released versions, to reduce the complexity? This is a small issue for me since I've not upgraded to Traktor Pro 3 yet!
I've now sent these to Rbuddy support, but it will wait until the 2.1 release for Windows.
Cool, although it's a shame Rbuddy is closed-source; it prevents the pooling of our investigation resources somewhat (unless they were prepared to reveal their approach).
I am thinking that offset issues for other formats e.g. WAV (or FLAC possibly) ought to be treated as a separate issue?
This is only an effort to reduce the variance of the MP3 graphs. I believe that we are being hurt by the difference between TK and RB beat detection algorithms.
But my understanding of the offset algorithm, is that it's supposed to eliminate the beat detection differences when calculating the offset? Since naturally the beat detection will be different between DJ apps - not just the inizio, but also the bpm value (slightly).
To rule that out, I'm assuming that WAV is perfect, so any difference there between TK and RB would be pure difference on the beat algorithm. If we find this, per file, we could remove that noise from the MP3 difference we experience on the graphs
I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?
Sure. These values are the ms offset of the first beat according to RB and TK, per file format.
In WAV, they almost match - they are only 2ms apart, around the 12ms point. In MP3, they don't match at all: one sees the beat at 52ms, the other at 24ms. This makes the 28ms difference we would like to correct.
In this particular example, a 2ms difference is built-in; so the real correction value would be 28.4 - 2.2 = 26.2ms
In other words, I hope that we can see very large differences in WAV between RB and TK, and use those (per file) to reduce the noise of the graphs.
Overall I'm not sure how to proceed at this point... suggestions welcome! 🙂
In this particular example, 2ms difference is built-in; so the real correction value would be 28.4-2.2 = 26.2ms
In other words, I hope that we can see very large differences in WAV between RB and TK, and use those (per file) to reduce the noise of the graphs.
I've now compared the offset differences for both the mp3 and wav cases using an eye-balled dataset (has shift / no shift after conversion). Note that this is missing the encoder as given by "dj-data-offset-analysis".
In this python processor, for every file, the WAV-seen difference is removed from the MP3-seen difference: https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
Analysis: for both datasets, this adjustment improves all results towards zero, and reduces the long tail a lot.
This is good news for the hypothesis that the encoder is mainly responsible for the observed shifts. In short, the TK and RB algorithms see a different first beat - so some of the offsets we've seen are just due to the different algorithms. It also means that tools like RECU will degrade the accuracy for some songs (as expected; RB is inferior to TK at beat detection).
The bad news is that ~20% of the adjusted offsets are still non-zero - which means other factors are present.
Regarding the difference between the datasets:
To be done: Tag the files using dj-data-offset-analysis.clj, and confirm if all "bad" files are AV
Hi @pestrela,
I just wanted to check my understanding of your logic and process:
Both TK/RB use analysis to produce a beat grid for each file, but due to their different algorithms, the "inizio" (first beat) and "tempo" (bpm) can be slightly different.
Since we are trying to analyse offsets between TK/RB for patterns relating to MP3 encoders, the TK/RB grid differences are producing "noise" in the encoder stats.
If we assume that TK/RB grid analysis differences affect MP3 and WAV equally, then taking the difference of offsets between MP3 and WAV should exclude this "noise" from the encoder stats.
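A one-line formulation of that adjustment (my sketch; the worked numbers are the 28.4ms/2.2ms example from earlier in the thread):

def adjusted_offset(offset_mp3, offset_wav):
    # assumes TK/RB grid-analysis noise is format-independent, so the
    # per-file WAV offset can be subtracted out (all values in seconds)
    return offset_mp3 - offset_wav

print(round(adjusted_offset(0.0284, 0.0022), 4))  # -> 0.0262, i.e. 26.2ms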
So, for example if I wanted to replicate your results (and also produce encoder stats for the offset differences), I could follow this process:
1. Calculate offsets for a dataset of mp3 files, with n files per encoder (not all files, since we need to decode to wav, consider storage!)
   - (sample the ffprobe.edn dataset, write n-files-per-encoder.edn)
   - (load n-files-per-encoder.edn, calc offsets)
   - (write file-name, encoder and offset to n-files-per-encoder-offsets.edn)
2. Calculate offsets for the files in the mp3 dataset, decoded to wav
   - (load n-files-per-encoder.edn, exec with ffmpeg and write wav files to output dir, write n-files-per-encoder-decoded.edn)
   - (load n-files-per-encoder-decoded.edn, calc offsets)
   - (write file-name, encoder and offset to n-files-per-encoder-decoded-offsets.edn)
3. Calculate the offset difference between each pair of mp3 and wav files, and then calculate the encoder stats on these differences.
   - (load n-files-per-encoder-offsets.edn and n-files-per-encoder-decoded-offsets.edn, join on file names (less extension), calc offset differences)

Hi Alex, you understood my proposed method. My python code does the three pairs of inner joins to calculate all these differences. Because this was an experiment, I pre-decoded the WAVs, put them in a parallel folder structure, and added them all to a single collection.
The inputs of my script are at line 284 and following. These are: a) mp3+wav files analysed in TK and converted to RB format (using DJDC); b) mp3+wav files only analysed by RB.
https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
PS: initially I tried to use dj-data-offset-analysis, but the inner joins had issues because the WAVs have ID3 fields like albums etc. As I still had to do the step #3 inner join myself, in the end I coded everything in python to do all the joins + CDF graphs. I'm still planning to finish this analysis, taking the input instead from notebook-1-ffprobe.clj, to get the ffprobe information into the mix.
I've now updated my python joiner with the encoder information produced by notebook1.clj. For the AV encoder (bottom graph), the 28ms shift is very sharp - but only when ignoring the beat algorithm differences. If we depend on the RB and TK algorithms, it seems worse than reality. This is great news!
https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
For LAME, the previous comments remain. Some specific versions like "3.92" and "3.98" had an apparent shift, which resulted in exactly zero after WAV adjustment. This is not shown because there are too few examples.
Next step is to automate the MP3+WAV analysis, as proposed above, and re-confirm that AV always produces a sharp offset around 28ms. Hopefully in this process we will find a pattern that categorizes LAME files as well!
Some news. Another user has confirmed our finding that rekordbuddy beta 2.1 no longer corrects all cues correctly.
https://forums.next.audio/t/grid-issue-serato-rekordbox/908/10 I took a look at @Simonjok’s test files and it looks like they are indeed cases not currently handled correctly by my cue marker code.
This is what we've found before:
I've now started seeing files that Rekordbuddy no longer converts correctly - and I remember this was not the case for traktor 2.11 for these specific files.
The RB developer will check these cases after 2.1 is released
If anybody is watching this thread, please manifest yourself :)
This is a summary of the story so far, simplified as much as possible.
As explained above, the evidence for libAV is very strong using automatic methods with the WAV corrections. To be fully sure, we should apply it to manually-beatgridded files as well.
First step is to get all libAV files together from our collections. Below is a one-liner that copies such files to a central folder:
mkdir all_libav
# list all mp3 files whose ffprobe output mentions a Lav* encoder
find . -iname "*.mp3" | tr '\n' '\0' | xargs -0 -n1 -- ffprobe 2>&1 | egrep -i "encode|Input.*from" | grep -B1 -i "Lav" | grep -i mp3 | cut -b21- | sed 's/..$//' | sed 's/^.//' > libav.txt
# copy every listed file into the central folder
cat libav.txt | xargs -d '\n' -n 1 cp --target-directory=all_libav
here we go again :)
Still no manual beatgridding involved, but processing only AV files automatically now shows:
As such, I currently recommend that this project always fix offsets in any AV file.
updated analysis code: https://github.com/pestrela/music_scripts/blob/master/offsets/mp3%20vs%20wav%20processor.py
Another idea for later:
Regarding LAVC vs LAVF:
This explains why we've seen problems in all LAVC files, but only in half of the LAVF files.
Sources:
https://trac.ffmpeg.org/wiki/Encode/MP3 This page describes how to use the external libmp3lame encoding library within ffmpeg to create MP3 audio files (ffmpeg has no native MP3 encoder).
https://trac.ffmpeg.org/wiki/Using%20libav* libavcodec provides a decoding and encoding API, and all the supported codecs. libavformat provides a demuxing and muxing API, and all the supported muxers and de-muxers.
https://hydrogenaud.io/index.php?PHPSESSID=n8l4jet917mt5kdv4j8nj0nt94&topic=116062.msg957881#msg957881 Somehow google encodes their files with lame 3.99.5 (not 3.99r) and outcome is :lavf
In this process I've made a tool that queries 3x programs to see the mp3 encoding. As predicted before, the tools do not agree with each other:
$ check_encoder.sh *.mp3 -c
ffprobe,mp3guessenc,mediainfo,file
Lavc57.48,LAME3.99.5,LAME3.99
Lavf,LAME3.99.5,LAME3.99.5
Code and options: https://github.com/pestrela/music_scripts/blob/master/offsets/check_encoder.sh
$ check_encoder.sh -h
output format:
-f full output
-s short output
-1 one-line output (default)
-c csv output
sub-tools:
--ffmpeg|--ffprobe ONLY run ffprobe
--mp3guessenc ONLY run mp3guessenc
--mediainfo ONLY run mediainfo
We've already seen that RB uses mpg123, and that Traktor uses the FhG decoder:
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll
$ strings Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
Their FAQ and release history match very well on the topic of gapless decoding: the library added this feature, it had a bug, and later it was fixed.
https://www.mpg123.de/faq.shtml Q: mpg123 does only play the intro jingle of my web radio! A: This might be collateral damage from a feature of mpg123 -- the gapless playback using sample-exact length and padding information provided at the beginning of an MP3 stream. This should be fixed in upcoming 1.14 release, please test the beta version!
https://www.mpg123.de/cgi-bin/news.cgi 2012-02-29 Thomas: Beta time! There is a beta of the next mpg123 release, 1.14 --- please get it, test it and report any issues! with a Xing/Info header that contains gapless info (which doesn't really fit the conglomerate...
2010-02-27 thomas: mpg123 1.10.1 - the most wanted maintenance release Fixes for gapless decoding: Correctly skip padding larger than one MPEG frame (strange, but occurs). Bug 2950218 (proper gapless cuts for seeking near the end).
this matches the same story on the RB side:
https://rekordbox.com/en/support/releasenote.php Ver.1.6.2 (2012.08.21) ●Added a function to fix the misaligned BeatGrid and cue points in mp3 files which (i) have been encoded by LAME encoder with the gapless setting and (ii) have been analyzed and adjusted by rekordbox before version 1.5.3. (As of version 1.5.4, rekordbox has disabled gapless playback of LAME-encoded mp3 files.)
Ver.1.5.4 (2012.07.03) Version 1.5.4 disables gapless playback for MP3 files encoded with the LAME encoder on players such > as the CDJ-2000.
About the mysterious FhG library of Traktor:
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll
$ strings Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
This is the mp3 surround decoder library of Fraunhofer IIS, the inventor of the first mp3 decoder: https://web.archive.org/web/20060514162705/http://www.iis.fraunhofer.de:80/amm/download/mp3surround/index.html
These web pages are gone; but this great page has a copy of the CLI encoder (v1.5) and decoder (v1.4) http://www.rarewares.org/rrw/fhgmp3s.php
Decoding our reference mp3 into WAV using both the FhG and mpg123 libraries, we immediately see our usual offset on the beats:
We will use the total length of the WAV as a proxy:
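A minimal sketch of that proxy check (assumptions: mpg123 is on the PATH and decodes to WAV with -w; the FhG CLI decoder from rarewares would be invoked analogously, its flags not shown here):

import subprocess
import wave

def decode_mpg123(mp3_path, wav_path):
    # mpg123 -w decodes the mp3 into a WAV file
    subprocess.run(["mpg123", "-q", "-w", wav_path, mp3_path], check=True)

def wav_length(wav_path):
    # total decoded length in samples, plus the sample rate
    with wave.open(wav_path) as w:
        return w.getnframes(), w.getframerate()

def length_diff_ms(wav_a, wav_b):
    # positive result: wav_a decoded longer than wav_b
    (n_a, rate), (n_b, _) = wav_length(wav_a), wav_length(wav_b)
    return 1000.0 * (n_a - n_b) / rate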
I've now made a new automated analysis that compares: 1) old stuff: the MP3 DJ beatgrid offsets (pre-corrected with the WAV adjustment); 2) new stuff: the difference between the FhG and mpg123 decoders.
Note that this method no longer depends on encoder detection. This is great news - we can now correct any file! Also note that this analysis was limited to 30 files, a mixture of "good" and "bad" files from previous analysis. This is not representative of whole collections!
https://github.com/pestrela/music_scripts/tree/master/offsets/fhg
Results: this is the previous finding of this thread: using the WAV correction, we see constant offsets across the collections.
This is the new finding: comparing the decoders, we get either a constant zero or a constant ~28ms offset.
Using the new method we correct the offsets for half of the test dataset - without depending on the encoder info.
Latest code is in: https://github.com/pestrela/music_scripts/tree/master/offsets/fhg
Summary of steps:
Hi @pestrela,
Many thanks for these updates, I think this is great progress 👍 🙂
On LAVC vs LAVF
Ok I understand the differences here.
re: "This explains why we've seen problems in all LAVC files, but only in half of the LAVF files." This implies to me that I could confidently implement a fix for LAVC files. LAVF files wouldn't be touched. This is ok with me, I think it's better to deliver a partial fix as and when new facts emerge?
I must admit some personal bias here in addition to the above reasoning for a partial fix, because LAVC files are a big proportion of my collection (I transcode from FLAC using ffmpeg, hence LAVC).
On the requirement to know the encoder
In one comment you mention the encoder info isn't needed anymore, but in another comment, in the process you described, the encoder info looks like it's still needed (e.g. check_encoder.sh)?
In any case, am I correct in thinking I'll need to determine the encoder at runtime in the converter, to match the encoder-grouped stats (generated either by your code, or mine if I write it), since only some files need a correction? It still depends on the encoder used for the given file? I understand it's the different decoders used by Traktor and Rekordbox that causes the offset to manifest, but it's the encoder used that determines whether the offset will manifest (or not) for any given file?
I'm a bit confused, sorry!
On determining the encoder (assuming that's still needed)
Assuming we still need to know the encoder for every file, my understanding is that unlike the other tools, ffprobe only reports what metadata is saved with the file. Since this data is likely machine-generated when the file is created, I can't imagine that the value would be incorrect? This is why I was preferring ffprobe to determine the encoder.
I was basically ignoring the other tools and only using ffprobe, since:
a) mp3guessenc is not available on Mac and so it wouldn't be usable at runtime in the converter, and in any case I'm not sure I trust the result it gives, since it's clearly not using the file metadata.
b) mediainfo also seems not to use the file metadata (based on the example output in your comment), so the same applies there.
Or is my logic flawed?
Thanks!
I've refactored my code to make the analysis steps clearer: what they are, what they depend on, and what the findings are. https://github.com/pestrela/music_scripts/blob/master/offsets/fhg/fhg%20analysis.py
Hi @pestrela
Thanks for the update!
Just a couple of clarifications:
On "manual tag information"/"manually tagged" in step 1
What do you mean by this phrase? Is it some value in the id3 tag (comment?) to indicate these files have been visually inspected for "no shift" or "bad shift"?
On "has_shift" vs "no_shift" in steps 3,4
Let me test my understanding:
has_shift = those files within your set of ~300 that are known by visual inspection to have a shift TK->RB?
no_shift = those files within your set of ~300 that are known by visual inspection not to have a shift TK->RB?
Presumably there's a roughly equal proportion of both in the 300?
Also, on the charts where it states "good_shift", I presume you mean "no_shift"? (blue line)
If my understanding is correct, then I am surprised by your results in step 3 for has_shift (orange line), where you say "everything is corrected perfectly, except for 1x FN"
This is because for my files, for every file I tried using fhg-vs-mpg-nogapless that is known visually to have a shift (just a few, done manually), the WAV lengths were the same (so that's 100% false negative for me, vs your result?). n.b. these files are all LAVC encoded.
Note: I used mpg123 version 1.23.8 on WSL Debian Stretch. (i.e. slightly older than the version you're using, but I'm assuming my different results are not due to that)
One more thing, re: step 3 no_shift (blue line):
for the other 50% of these, we do the WRONG thing - we make an offset that was not real (False positive)
I'd be interested to know if there is an encoder pattern for this 50%, for example if the encoder is always within a given set in this case (and the encoders in this set are not found in the other 50%).
If we can find a pattern there, then the algorithm might be length difference + encoder list check etc.; a sketch follows below.
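A hypothetical sketch of that combined rule (the prefix set and the 1ms threshold are illustrative, not confirmed by the data yet):

# encoder prefixes suspected from the stats so far (illustrative)
SUSPECT_PREFIXES = ("Lavc", "Lavf")

def needs_correction(duration_diff_ms, encoder):
    # a non-zero decoder length difference is the primary signal
    if abs(duration_diff_ms) > 1.0:
        return True
    # fall back to the encoder list for files the length check misses
    return (encoder or "").startswith(SUSPECT_PREFIXES)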
I've now updated the above post with a clear description of how the dataset was manually tagged, and how I define FPs and FNs. I've also added the input datafiles for you to be able to run it as a notebook in e.g. mybinder.org: https://github.com/pestrela/music_scripts/tree/master/offsets/fhg
Regarding encoders: my files have a variety of random encoders, rather than focusing on "home-made" LAVC - they were not encoded by me.
In particular I'm troubled that 3x different tools give me 3x different answers to "what is the encoder of this file?". Because of this, my focus has been on encoder-independent methods. However, the current dataframe has the 3x encoder guesses readily available for analysis.
This is because for my files, for every file I tried using fhg-vs-mpg-nogapless that is known visually to have a shift (just a few, done manually), the WAV lengths were the same (so that's 100% false negative for me, vs your result?). n.b. these files are all LAVC encoded.
This is quite surprising indeed. In this post, my LAVC files from the "bad dataset" all had the 28ms shift. To confirm: exactly these files have 28ms decoded-sample differences as well.
Note: I used mpg123 version 1.23.8 on WSL Debian Stretch. (i.e. slightly older than the version you're using, but I'm assuming my different results are not due to that)
I'm using mpg123 1.25.10 from WSL Ubuntu. The really bad stuff was around v1.10-v1.14, so this is a reasonable assumption.
I believe I answered your questions in the updated post and this post. Please tell me if this is not the case.
I believe I answered your questions in the updated post and this post. Please tell me if this is not the case.
Correct 🙂
Here is a proposed solution using the fhg/mpg123 decoding method, based on current latest available analysis results.
// = parallel, || = sequential
1. offset option false? just do the conversion using current method.
2. otherwise [offset option true]
3. load cached durations
4. for each file //
   4.1. cached durations for location?
          use cache
        else
          (decode fhg || get duration) // (decode mpg123 || get duration) || save location & durations to cache
   4.2. diff durations to get offset
   4.3. offset non-zero?
          append output with offset [handle true positive case... unfortunately this will also catch some false positives]
        else [offset is zero]
          file is vbr?
            append output with offset (what value? 26ms?) [handle false negative case]
          else [file is cbr]
            append output without offset [handle true negative case]
   4.4. append report (location, durations, offset, decision)
   4.5. delete temp decoded wav files
n.b. the above pseudo-code does not yet account for false positive cases
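To make steps 4.2-4.3 concrete, a minimal Python sketch of the decision (the 1ms threshold and the 26ms fallback are open assumptions from the discussion, and the false positive case is likewise not yet handled):

def decide_offset(duration_diff_ms, is_vbr):
    # non-zero decoder length difference: apply it as the cue offset
    # (true positives, but this will also catch some false positives)
    if abs(duration_diff_ms) > 1.0:
        return duration_diff_ms / 1000.0
    if is_vbr:
        # zero difference but VBR: suspected false negative, apply fallback
        return 0.026  # value still open: 26ms?
    # zero difference and CBR: true negative, no correction
    return 0.0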
UPDATE SEP 2019 - FINDINGS SUMMARY:
LINKS:
ALGORITHM: (updated: 16 Sep 2019)
EXAMPLE: