Closed by alanorth 1 year ago
Another one:
$ curator tag -s audio -t language --only-macrolanguages The\ Dark\ Valley\ \(2014\).mkv
┌───┬────────────────────────────┬────────┬─────────┬───┬─────────┐
│ # │ Name │ Stream │ Old tag │ → │ New tag │
├───┼────────────────────────────┼────────┼─────────┼───┼─────────┤
│ 1 │ The Dark Valley (2014).mkv │ 1 │ ger │ → │ eng │
└───┴────────────────────────────┴────────┴─────────┴───┴─────────┘
Continue? [y/N]
This one is definitely German audio, and lots of it.
I'll add the usual verbosity CLI flags, -v and --verbose, to display the probabilities.
And specifically for tagging, I think there should be a clear cut-off, e.g. below X probability do not suggest updating tags.
Additionally, I think that we should let users customize the number of samples taken for the analysis.
Right now the hardcoded number is 10: https://github.com/AlexAltea/curator/blob/master/curator/stream.py#L72
But some users might prefer higher accuracy in exchange for slower processing.
The first part is done: if you enable debug logging with `--log=DEBUG`, you should see something like:
curator tag --log=DEBUG -s audio -t language '.\temp\Airplane! (1980) [English].mp4'
2023-01-25 14:06:23,666 | INFO | Processing 1 input media files
2023-01-25 14:06:23,724 | DEBUG | Detecting audio language in stream #1 of media: "Airplane! (1980) [English].mp4"
2023-01-25 14:06:26,258 | DEBUG | Sample #00: {'en': '0.2963', 'la': '0.2408', 'nn': '0.1822', 'cy': '0.0706', 'zh': '0.0384'}
2023-01-25 14:06:26,983 | DEBUG | Sample #01: {'en': '0.9895', 'nn': '0.0015', 'fr': '0.0010', 'la': '0.0009', 'ja': '0.0007'}
2023-01-25 14:06:27,760 | DEBUG | Sample #02: {'en': '0.9461', 'pt': '0.0067', 'ru': '0.0054', 'ko': '0.0054', 'de': '0.0053'}
2023-01-25 14:06:28,518 | DEBUG | Sample #03: {'en': '0.9845', 'pt': '0.0017', 'fr': '0.0015', 'nn': '0.0012', 'ja': '0.0011'}
2023-01-25 14:06:29,274 | DEBUG | Sample #04: {'en': '0.9880', 'fr': '0.0014', 'pt': '0.0010', 'zh': '0.0010', 'de': '0.0008'}
2023-01-25 14:06:30,095 | DEBUG | Sample #05: {'en': '0.9767', 'nn': '0.0024', 'la': '0.0024', 'fr': '0.0022', 'de': '0.0017'}
2023-01-25 14:06:30,908 | DEBUG | Sample #06: {'en': '0.9290', 'nn': '0.0090', 'de': '0.0085', 'la': '0.0064', 'ja': '0.0058'}
2023-01-25 14:06:31,770 | DEBUG | Sample #07: {'en': '0.9892', 'nn': '0.0032', 'ja': '0.0008', 'fr': '0.0006', 'pt': '0.0006'}
2023-01-25 14:06:32,632 | DEBUG | Sample #08: {'en': '0.9903', 'la': '0.0018', 'es': '0.0007', 'ja': '0.0007', 'fr': '0.0007'}
2023-01-25 14:06:33,511 | DEBUG | Sample #09: {'en': '0.9184', 'ja': '0.0101', 'nn': '0.0082', 'ru': '0.0065', 'zh': '0.0061'}
[...]
Can you share the results for the movie whose language gets misrecognized?
Now you can also customize the number of samples via --max-audio-samples.
I've gotten fairly good results even with --max-audio-samples=5, so I'm quite interested in your case with "The Dark Valley (2014).mkv".
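For comparing runs without reading the log by eye, the `Sample #NN` debug lines can be parsed into numbers. This is only a sketch that assumes the line format shown in the excerpt above; `parse_samples` and `SAMPLE_RE` are hypothetical names and not part of curator.

```python
import ast
import re

# Matches the "Sample #NN: {...}" part of a debug line (format assumed from the log above).
SAMPLE_RE = re.compile(r"Sample #(\d+): (\{.*\})")

def parse_samples(log_text):
    """Yield (index, {lang: probability}) for each 'Sample #NN' debug line."""
    for line in log_text.splitlines():
        match = SAMPLE_RE.search(line)
        if match:
            # The probabilities are logged as strings, so convert them to floats.
            raw = ast.literal_eval(match.group(2))
            yield int(match.group(1)), {lang: float(p) for lang, p in raw.items()}

log = """\
2023-01-25 14:06:26,258 | DEBUG | Sample #00: {'en': '0.2963', 'la': '0.2408', 'nn': '0.1822', 'cy': '0.0706', 'zh': '0.0384'}
2023-01-25 14:06:26,983 | DEBUG | Sample #01: {'en': '0.9895', 'nn': '0.0015', 'fr': '0.0010', 'la': '0.0009', 'ja': '0.0007'}
"""

for index, probs in parse_samples(log):
    best = max(probs, key=probs.get)  # language with the highest probability
    print(index, best, probs[best])
```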
Debug mode is sweet! Here is the first one, where the soundtrack has no dialog:
$ curator tag -s audio -t language --only-macrolanguages --log=DEBUG The\ Red\ Turtle\ \(2016\).mkv
2023-01-25 20:40:34,188 | INFO | Processing 1 input media files
2023-01-25 20:40:34,308 | DEBUG | Detecting audio language in stream #1 of media: "The Red Turtle (2016).mkv"
2023-01-25 20:40:36,596 | DEBUG | Sample #00: {'cy': '0.4168', 'en': '0.2776', 'nn': '0.0851', 'zh': '0.0820', 'ja': '0.0171'}
2023-01-25 20:40:38,896 | DEBUG | Sample #01: {'en': '0.6997', 'nn': '0.0995', 'zh': '0.0549', 'ko': '0.0228', 'ru': '0.0158'}
2023-01-25 20:40:41,112 | DEBUG | Sample #02: {'en': '0.5794', 'nn': '0.1794', 'zh': '0.0433', 'ru': '0.0392', 'ko': '0.0220'}
2023-01-25 20:40:43,579 | DEBUG | Sample #03: {'en': '0.7473', 'zh': '0.0670', 'nn': '0.0469', 'ru': '0.0309', 'ko': '0.0192'}
2023-01-25 20:40:46,095 | DEBUG | Sample #04: {'nn': '0.4392', 'en': '0.2651', 'zh': '0.0941', 'ko': '0.0788', 'jw': '0.0201'}
2023-01-25 20:40:50,791 | DEBUG | Sample #05: {'en': '0.6740', 'nn': '0.1084', 'zh': '0.0544', 'ko': '0.0244', 'la': '0.0214'}
2023-01-25 20:40:56,974 | DEBUG | Sample #06: {'en': '0.5547', 'zh': '0.1205', 'la': '0.1107', 'nn': '0.0309', 'ru': '0.0227'}
2023-01-25 20:41:07,375 | DEBUG | Sample #07: {'en': '0.5111', 'ru': '0.1270', 'nn': '0.1121', 'zh': '0.0442', 'ja': '0.0314'}
2023-01-25 20:41:19,530 | DEBUG | Sample #08: {'nn': '0.5731', 'en': '0.2331', 'ko': '0.0357', 'ja': '0.0274', 'zh': '0.0259'}
2023-01-25 20:41:33,377 | DEBUG | Sample #09: {'en': '0.3591', 'zh': '0.2241', 'la': '0.0982', 'nn': '0.0679', 'jw': '0.0383'}
┌───┬───────────────────────────┬────────┬─────────┬───┬─────────┐
│ # │ Name │ Stream │ Old tag │ → │ New tag │
├───┼───────────────────────────┼────────┼─────────┼───┼─────────┤
│ 1 │ The Red Turtle (2016).mkv │ 1 │ fre │ → │ eng │
└───┴───────────────────────────┴────────┴─────────┴───┴─────────┘
Continue? [y/N]
And the second one, where the language is definitely German:
$ curator tag -s audio -t language --only-macrolanguages --log=DEBUG The\ Dark\ Valley\ \(2014\).mkv
2023-01-25 20:43:13,410 | INFO | Processing 1 input media files
2023-01-25 20:43:13,524 | DEBUG | Detecting audio language in stream #1 of media: "The Dark Valley (2014).mkv"
2023-01-25 20:43:15,869 | DEBUG | Sample #00: {'en': '0.5387', 'la': '0.1408', 'nn': '0.0857', 'zh': '0.0532', 'ja': '0.0281'}
2023-01-25 20:43:18,767 | DEBUG | Sample #01: {'zh': '0.4106', 'de': '0.3250', 'ko': '0.0568', 'ja': '0.0509', 'ru': '0.0332'}
2023-01-25 20:43:21,614 | DEBUG | Sample #02: {'de': '0.9772', 'nn': '0.0048', 'en': '0.0026', 'nl': '0.0025', 'fr': '0.0023'}
2023-01-25 20:43:25,029 | DEBUG | Sample #03: {'nn': '0.6693', 'en': '0.1933', 'ko': '0.0244', 'haw': '0.0233', 'ja': '0.0154'}
2023-01-25 20:43:33,731 | DEBUG | Sample #04: {'en': '0.3350', 'nn': '0.3216', 'haw': '0.1072', 'zh': '0.0508', 'ko': '0.0368'}
2023-01-25 20:43:45,177 | DEBUG | Sample #05: {'nn': '0.3802', 'en': '0.3499', 'zh': '0.0458', 'ko': '0.0448', 'ru': '0.0397'}
2023-01-25 20:43:59,697 | DEBUG | Sample #06: {'en': '0.4349', 'la': '0.4012', 'zh': '0.0590', 'nn': '0.0138', 'ru': '0.0134'}
2023-01-25 20:44:15,148 | DEBUG | Sample #07: {'en': '0.4768', 'nn': '0.1963', 'ko': '0.0678', 'zh': '0.0619', 'ru': '0.0513'}
2023-01-25 20:44:33,486 | DEBUG | Sample #08: {'en': '0.8013', 'nn': '0.0372', 'zh': '0.0243', 'ko': '0.0188', 'ru': '0.0170'}
2023-01-25 20:44:53,860 | DEBUG | Sample #09: {'en': '0.5733', 'nn': '0.1582', 'ko': '0.0505', 'ru': '0.0487', 'zh': '0.0473'}
┌───┬────────────────────────────┬────────┬─────────┬───┬─────────┐
│ # │ Name │ Stream │ Old tag │ → │ New tag │
├───┼────────────────────────────┼────────┼─────────┼───┼─────────┤
│ 1 │ The Dark Valley (2014).mkv │ 1 │ ger │ → │ eng │
└───┴────────────────────────────┴────────┴─────────┴───┴─────────┘
Continue? [y/N]
In both cases the probabilities are (mostly!) very low at around 0.3~0.7.
This is unsurprising for the silent movie (first one), but the second one is interesting...
Note how at some point it's very confident it's German ('de': '0.9772').
I think selecting the final language should consider the probability, instead of doing a naive majority vote across all samples.
I'll push some test code later to address this!
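To illustrate the problem, here is a rough sketch of a naive majority vote over the top-ranked language of each sample, using the values from the "The Dark Valley (2014).mkv" log above. It assumes each sample casts one vote for its best guess; `majority_vote` is a hypothetical helper and not curator's actual code.

```python
from collections import Counter

# Top-ranked language and probability for each sample, copied from the
# "The Dark Valley (2014).mkv" debug log.
samples = [
    ("en", 0.5387), ("zh", 0.4106), ("de", 0.9772), ("nn", 0.6693),
    ("en", 0.3350), ("nn", 0.3802), ("en", 0.4349), ("en", 0.4768),
    ("en", 0.8013), ("en", 0.5733),
]

def majority_vote(samples):
    """Pick the language that is ranked first in the most samples."""
    votes = Counter(lang for lang, _prob in samples)
    return votes.most_common(1)[0][0]

# The single most confident sample, regardless of how many votes its language gets.
most_confident = max(samples, key=lambda s: s[1])

print(majority_vote(samples))  # en (6 of 10 votes, mostly low-confidence)
print(most_confident)          # ('de', 0.9772)
```

The vote ignores how confident each sample is, so six weak 'en' guesses outweigh one very strong 'de' guess.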
@alanorth Try the latest version!
Now it discards low probabilities (while still being fairly tolerant, threshold is 0.8), and additionally, it computes the final score as an average + majority vote to deal with ties.
Algorithm is still quite simple (https://github.com/AlexAltea/curator/commit/5b8c150a4da012d4dcfb32a7f664aba425e00a74), the relevant part was barely 5 lines, but I believe it should fix both issues you encountered!
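As a rough illustration of the idea (not the code from the linked commit), the sketch below drops samples whose top probability is under the threshold, averages what is left per language, and uses the vote count only to break ties. `pick_language` is a hypothetical name, and treating the 0.8 threshold as inclusive is an assumption.

```python
from collections import defaultdict

def pick_language(samples, threshold=0.8):
    """Return the best language code, or None if no sample is confident enough.

    `samples` is a list of (language, probability) pairs, one per audio sample.
    """
    kept = defaultdict(list)
    for lang, prob in samples:
        if prob >= threshold:  # discard low-confidence samples (inclusive: assumption)
            kept[lang].append(prob)
    if not kept:
        return None  # nothing confident: do not suggest a tag
    # Rank by mean probability first, then by number of votes to break ties.
    return max(kept, key=lambda l: (sum(kept[l]) / len(kept[l]), len(kept[l])))

# Top-ranked language per sample, copied from the debug logs in this thread.
dark_valley = [
    ("en", 0.5387), ("zh", 0.4106), ("de", 0.9772), ("nn", 0.6693),
    ("en", 0.3350), ("nn", 0.3802), ("en", 0.4349), ("en", 0.4768),
    ("en", 0.8013), ("en", 0.5733),
]
red_turtle = [
    ("cy", 0.4168), ("en", 0.6997), ("en", 0.5794), ("en", 0.7473),
    ("nn", 0.4392), ("en", 0.6740), ("en", 0.5547), ("en", 0.5111),
    ("nn", 0.5731), ("en", 0.3591),
]

print(pick_language(dark_valley))  # de
print(pick_language(red_turtle))   # None
```

With these inputs only two of the Dark Valley samples pass the threshold ('de' at 0.9772 and 'en' at 0.8013), so German wins on the average, while no Red Turtle sample reaches 0.8 and no tag is suggested.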
Ah, that's clever! Now curator does the correct thing in both of these cases. First, the soundtrack with no dialog:
$ curator tag -s audio -t language --only-macrolanguages --log=DEBUG The\ Red\ Turtle\ \(2016\).mkv
2023-01-25 23:01:20,124 | INFO | Processing 1 input media files
2023-01-25 23:01:20,266 | DEBUG | Detecting audio language in stream #1 of media: "The Red Turtle (2016).mkv"
2023-01-25 23:01:22,230 | DEBUG | Sample #00: {'cy': '0.4168', 'en': '0.2776', 'nn': '0.0851', 'zh': '0.0820', 'ja': '0.0171'}
2023-01-25 23:01:24,467 | DEBUG | Sample #01: {'en': '0.6997', 'nn': '0.0995', 'zh': '0.0549', 'ko': '0.0228', 'ru': '0.0158'}
2023-01-25 23:01:26,636 | DEBUG | Sample #02: {'en': '0.5794', 'nn': '0.1794', 'zh': '0.0433', 'ru': '0.0392', 'ko': '0.0220'}
2023-01-25 23:01:29,006 | DEBUG | Sample #03: {'en': '0.7473', 'zh': '0.0670', 'nn': '0.0469', 'ru': '0.0309', 'ko': '0.0192'}
2023-01-25 23:01:31,789 | DEBUG | Sample #04: {'nn': '0.4392', 'en': '0.2651', 'zh': '0.0941', 'ko': '0.0788', 'jw': '0.0201'}
2023-01-25 23:01:34,209 | DEBUG | Sample #05: {'en': '0.6740', 'nn': '0.1084', 'zh': '0.0544', 'ko': '0.0244', 'la': '0.0214'}
2023-01-25 23:01:36,526 | DEBUG | Sample #06: {'en': '0.5547', 'zh': '0.1205', 'la': '0.1107', 'nn': '0.0309', 'ru': '0.0227'}
2023-01-25 23:01:39,387 | DEBUG | Sample #07: {'en': '0.5111', 'ru': '0.1270', 'nn': '0.1121', 'zh': '0.0442', 'ja': '0.0314'}
2023-01-25 23:01:42,051 | DEBUG | Sample #08: {'nn': '0.5731', 'en': '0.2331', 'ko': '0.0357', 'ja': '0.0274', 'zh': '0.0259'}
2023-01-25 23:01:44,769 | DEBUG | Sample #09: {'en': '0.3591', 'zh': '0.2241', 'la': '0.0982', 'nn': '0.0679', 'jw': '0.0383'}
Current plan requires no tasks. There is nothing to be done.
Second, the German one:
$ curator tag -s audio -t language --only-macrolanguages --log=DEBUG The\ Dark\ Valley\ \(2014\).mkv
2023-01-25 22:56:38,434 | INFO | Processing 1 input media files
2023-01-25 22:56:38,605 | DEBUG | Detecting audio language in stream #1 of media: "The Dark Valley (2014).mkv"
2023-01-25 22:56:40,683 | DEBUG | Sample #00: {'en': '0.5387', 'la': '0.1408', 'nn': '0.0857', 'zh': '0.0532', 'ja': '0.0281'}
2023-01-25 22:56:43,441 | DEBUG | Sample #01: {'zh': '0.4106', 'de': '0.3250', 'ko': '0.0568', 'ja': '0.0509', 'ru': '0.0332'}
2023-01-25 22:56:46,331 | DEBUG | Sample #02: {'de': '0.9772', 'nn': '0.0048', 'en': '0.0026', 'nl': '0.0025', 'fr': '0.0023'}
2023-01-25 22:56:49,568 | DEBUG | Sample #03: {'nn': '0.6693', 'en': '0.1933', 'ko': '0.0244', 'haw': '0.0233', 'ja': '0.0154'}
2023-01-25 22:56:53,169 | DEBUG | Sample #04: {'en': '0.3350', 'nn': '0.3216', 'haw': '0.1072', 'zh': '0.0508', 'ko': '0.0368'}
2023-01-25 22:56:56,594 | DEBUG | Sample #05: {'nn': '0.3802', 'en': '0.3499', 'zh': '0.0458', 'ko': '0.0448', 'ru': '0.0397'}
2023-01-25 22:57:00,895 | DEBUG | Sample #06: {'en': '0.4349', 'la': '0.4012', 'zh': '0.0590', 'nn': '0.0138', 'ru': '0.0134'}
2023-01-25 22:57:04,759 | DEBUG | Sample #07: {'en': '0.4768', 'nn': '0.1963', 'ko': '0.0678', 'zh': '0.0619', 'ru': '0.0513'}
2023-01-25 22:57:09,003 | DEBUG | Sample #08: {'en': '0.8013', 'nn': '0.0372', 'zh': '0.0243', 'ko': '0.0188', 'ru': '0.0170'}
2023-01-25 22:57:13,439 | DEBUG | Sample #09: {'en': '0.5733', 'nn': '0.1582', 'ko': '0.0505', 'ru': '0.0487', 'zh': '0.0473'}
┌───┬────────────────────────────┬────────┬─────────┬───┬─────────┐
│ # │ Name │ Stream │ Old tag │ → │ New tag │
├───┼────────────────────────────┼────────┼─────────┼───┼─────────┤
│ 1 │ The Dark Valley (2014).mkv │ 1 │ ger │ → │ deu │
└───┴────────────────────────────┴────────┴─────────┴───┴─────────┘
Continue? [y/N]
I'm not sure how much of this is in your control versus langid's, but I just tried curator tag -s audio on a media file that has no spoken dialog and it wanted to tag it as English. :)
I'm curious what score langid returned, and why curator decided it was a match for English. Perhaps you could add a debug flag that prints the score and the threshold. Or, how do you think we can make this more accurate?
P.S. It's actually very strange that the original tag is French since there is no dialog (there is a soundtrack, but no talking).