Question about data preparation and speech alignment in the SpeechMatrix dataset

During data preparation for SpeechMatrix, the aligned_speech TSV files look like this:

score      lt_audio                      sl_audio
1.121542   lt_aud.zip:5203779181:21527   sl_aud.zip:50446110544:22547
1.1027344  lt_aud.zip:132563238:14033    sl_aud.zip:3224296345:11940
1.1023445  lt_aud.zip:6292033729:49818   sl_aud.zip:17374011756:20890

These entries use a different format from the audio file names in the raw audio folders for each language. For example, the folder audios/lt/ contains:

ls | head -n 5
20090112-0900-PLENARY-10_lt_1079616_1086270.ogg
20090112-0900-PLENARY-10_lt_1133568_1136670.ogg
20090112-0900-PLENARY-10_lt_1238304_1242270.ogg
20090112-0900-PLENARY-10_lt_1288704_1292862.ogg
20090112-0900-PLENARY-10_lt_1288704_1296606.ogg

So how do these two formats map to each other? I expected the number pairs in the TSV entries to match the number pairs in the file names, but they do not.
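To make the mismatch concrete, here is a minimal sketch of how the two formats can be split into fields and compared. The helper names (parse_zip_ref, parse_segment_name) are hypothetical and not part of fairseq. Reading the two numbers in a TSV entry as a byte offset and byte length inside the zip archive is only an assumption, based on their resemblance to the zip_path:offset:length path convention in fairseq's audio utilities. Reading the file name as session, language, start and end is likewise a guess from its shape.

```python
import re
from typing import NamedTuple


class ZipRef(NamedTuple):
    archive: str
    first: int   # assumed to be a byte offset into the archive
    second: int  # assumed to be a byte length


class SegmentName(NamedTuple):
    session: str
    lang: str
    start: int  # assumed segment start (unit unknown)
    end: int    # assumed segment end (unit unknown)


def parse_zip_ref(entry: str) -> ZipRef:
    """Split an entry such as 'lt_aud.zip:5203779181:21527' into its fields."""
    archive, first, second = entry.rsplit(":", 2)
    return ZipRef(archive, int(first), int(second))


# Matches names such as '20090112-0900-PLENARY-10_lt_1079616_1086270.ogg'.
SEGMENT_RE = re.compile(
    r"^(?P<session>.+)_(?P<lang>[a-z]{2})_(?P<start>\d+)_(?P<end>\d+)\.ogg$"
)


def parse_segment_name(name: str) -> SegmentName:
    """Split a raw audio file name into session, language, and number pair."""
    match = SEGMENT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return SegmentName(
        match["session"], match["lang"], int(match["start"]), int(match["end"])
    )


ref = parse_zip_ref("lt_aud.zip:5203779181:21527")
seg = parse_segment_name("20090112-0900-PLENARY-10_lt_1079616_1086270.ogg")

# The two number pairs are not the same values.
print((ref.first, ref.second) == (seg.start, seg.end))
```

If the offset-and-length reading is correct, the TSV entries would point into the zip archives rather than at individual .ogg file names, which would explain why the numbers differ.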

Could anybody help? Thank you so much!

facebookresearch/fairseq, issue #5466