Closed srdecny closed 3 years ago
6300 and 6400 never come with 100 status, so they're skipped. What is the source of this broken.log, and which commands preceded it in the pipeline?
My initial guesses: 1) ASR segments broke the expected protocol, 2) something was lost over the network connection, 3) a wrong combination of brief and non-brief format.
If you have the data from before events.py, that would be helpful. I also suspect the problem is with the brief format; in the non-brief format, numbering is more monotonic and I have never seen gaps of this kind there. Note, though, that there is no such issue with the source EN stream, is there? Could the translation be breaking the sentence numbering and repetition, maybe due to caching?
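For anyone repeating this check on their own logs, here is a minimal sketch of how one might list segments that never reach the 100 status. It assumes brief-format lines shaped like the excerpts quoted in this thread (`[timestamp] start end text`) and that the status is the difference `end - start`, so `6300 6400` would count as status 100. That reading is an assumption on my side, not something confirmed here. The function name `unfinalized_segments` and the first sample line are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape: "[timestamp] start end text".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<start>\d+)\s+(?P<end>\d+)\b")


def unfinalized_segments(lines, final_status=100):
    """Return start ids that occur in the log but never with the final status.

    The status is ASSUMED to be end - start (1, 10 or 100).
    """
    statuses = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an event line, ignore it
        start, end = int(match["start"]), int(match["end"])
        statuses[start].add(end - start)
    return sorted(s for s, seen in statuses.items() if final_status not in seen)


sample = [
    # Hypothetical finalized line, NOT taken from the attached logs.
    "[2021-03-17 09:18:30] 6200 6300 (hypothetical finalized segment)",
    # Lines copied from l_04-45-bg2sentences.log as quoted in this thread.
    "[2021-03-17 09:18:40] 6400 6410 Бих искал също да се възползвам от тази възможност.",
    "[2021-03-17 09:18:45] 6500 6501 Бих...",
    "[2021-03-17 09:18:46] 6500 6510 Бих искал.",
]

if __name__ == "__main__":
    print(unfinalized_segments(sample))
```

Running it on a whole log should give a quick list of the start ids worth looking at by hand.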
The source of the broken.log is the pipeliner, and it should be the traffic between the component that splits the RainbowMT packets and online-text-flow. @pyRis -- is that correct?
It is possible the segments were never finalized: ASR had issues because the spoken language changed frequently. I believe Rishu knows more.
I'm attaching more logs, they are:
In case it turns out to be the ASR getting confused and not confirming segments, how should we proceed? The current behavior isn't ideal, because there are chunks of the transcript missing. Perhaps the "abandoned" segments should be displayed forever, but grayed out? That way, the user sees something and there won't be confusing holes in the transcript.
There's the INTERNAL Bug:
[2021-03-17 09:18:25] 6100 6101 INTERNAL BUG: Number of translated batches is lower than expected. index i=1, len(trans_batch)=1
And I see you're using the --unsafe mt-wrapper flag. I think that's the reason. The rainbow worker should be fixed at UEDIN.
Dominik, please take over, I cannot address this. Thanks!
Hi,
This bug is from when I tried testing the KIT standalone MT workers. It does not explain the Czech output behaviour that we observed.
My best guess, based on what I observed on the webpage while listening to the audio, is that because the language kept switching between English and Czech, the Segmenter/OTF never finalized the sentence, which affected the translation. As you can see, the translations did happen, but the RB worker received new lines before it got a complete line for the previous sentence, so the previously received incomplete sentences were discarded.
I'm sorry for responding so late. I was in a very bad traffic jam: it took me about 4 hours (7 PM to 11 PM in my local time zone) to travel a mere 5.2 km. The administration is imposing new micro-containment zones along the National Highway because a significant number of cases were reported today in my hometown, which resulted in this mess.
Best, Rishu
From: Dominik Macháček @.> Sent: Wednesday, March 17, 2021 11:10 PM To: ELITR/online-text-flow @.> Cc: Rishu Kumar @.>; Mention @.> Subject: Re: [ELITR/online-text-flow] Missing segments after out-of-order messages (#17)
@srdecny, can you please add logs of the traffic between the rainbow splitter and 1) the cs otf client, 2) the bg otf client? I tried to parse rainbow on my own and sent it by client to the server, and it was OK.
l_04-45-bg2sentences.log l_04-43-cs2sentences.log
Sure, here you go. If you need more logs for the diagnostics, let me know and I'll make you an account on the servers the logs are on.
So the bug must be in the rainbow splitter (or between the rainbow mt-wrapper and the otf client).
In 03-04-rainbow-targets2rainbow_packet.log (stripped for readability):
[2021-03-17 09:18:37] 6400 6401 bg Бих искал също да се възползвам от тази възможност и да спомена...
[2021-03-17 09:18:39] 6400 6401 bg Бих искал също да се възползвам от тази възможност и да спомена, ч
[2021-03-17 09:18:40] 6400 6410 bg Бих искал също да се възползвам от тази възможност. hr Želim isko
[2021-03-17 09:18:40] 6500 6501 bg И споменахме, че сменяме името си в домашни любимци... hr I spom
[2021-03-17 09:18:42] 6400 6401 bg Бих искал също да се възползвам от тази възможност и да спомена,
[2021-03-17 09:18:43] 6400 6401 bg Бих искал също да се възползвам от тази възможност и да спомена,
[2021-03-17 09:18:44] 6300 6400 bg И аз съм координатор на комуникацията в европейската пени интерна
[2021-03-17 09:18:45] 6400 6410 bg Бих искал също да се възползвам от тази възможност и да спомена,
[2021-03-17 09:18:45] 6500 6501 bg Бих... hr Ja bih... cs Já bych... da Jeg ville... nl I
[2021-03-17 09:18:46] 6500 6510 bg Бих искал. hr Volio bih. cs Chtěl bych. da Jeg vil gerne. n
[2021-03-17 09:18:46] 6600 6601 bg Благодаря ви за поканата да бъдете... hr Hvala vam na pozivu d
[2021-03-17 09:18:48] 6600 6601 bg Благодаря за поканата да бъдеш тук. hr Hvala vam na pozivu da bu
[2021-03-17 09:18:49] 6600 6610 bg Благодаря ви за поканата да бъдете тук с вас на този 26. hr H
l_04-45-bg2sentences.log :
[2021-03-17 09:18:39] 6400 6401 Бих искал също да се възползвам от тази възможност и да спомена, че сменяме...
[2021-03-17 09:18:40] 6400 6410 Бих искал също да се възползвам от тази възможност.
[2021-03-17 09:18:45] 6500 6501 Бих...
[2021-03-17 09:18:46] 6500 6510 Бих искал.
[2021-03-17 09:18:46] 6600 6601 Благодаря ви за поканата да бъдете...
[2021-03-17 09:18:48] 6600 6601 Благодаря за поканата да бъдеш тук.
[2021-03-17 09:18:49] 6600 6610 Благодаря ви за поканата да бъдете тук с вас на този 26.
There's nothing between :40 and :45. If the 6300 6400 update were there, it would appear in the paragraph view.
So it's not an otf problem. I'm passing it to cruise-control and @srdecny.
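The comparison above can also be done mechanically. Below is a rough sketch, assuming both logs use the `[timestamp] start end ...` line shape seen in the excerpts. It reports updates whose `(start, end)` pair appears in the upstream log but never in the downstream one. The function names are hypothetical, and the sample lines reuse only the timestamps and ids from the excerpts, with the text left out.

```python
import re

# Assumed line shape in both logs: "[timestamp] start end ...".
EVENT_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<start>\d+)\s+(?P<end>\d+)\b")


def parse_events(lines):
    """Extract (timestamp, start, end) from every line that looks like an event."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match["ts"], int(match["start"]), int(match["end"])))
    return events


def dropped_events(upstream_lines, downstream_lines):
    """Upstream events whose (start, end) pair never shows up downstream.

    Timestamps and repetitions are ignored on purpose, so a dropped repeat
    of an update that did get through once is NOT reported.
    """
    seen = {(start, end) for _, start, end in parse_events(downstream_lines)}
    return [ev for ev in parse_events(upstream_lines) if (ev[1], ev[2]) not in seen]


# Ids and timestamps as in the excerpts above; the text is omitted.
upstream = [
    "[2021-03-17 09:18:40] 6400 6410 bg ...",
    "[2021-03-17 09:18:40] 6500 6501 bg ...",
    "[2021-03-17 09:18:44] 6300 6400 bg ...",
    "[2021-03-17 09:18:45] 6500 6501 bg ...",
    "[2021-03-17 09:18:46] 6500 6510 bg ...",
]
downstream = [
    "[2021-03-17 09:18:40] 6400 6410 ...",
    "[2021-03-17 09:18:45] 6500 6501 ...",
    "[2021-03-17 09:18:46] 6500 6510 ...",
]

if __name__ == "__main__":
    for ts, start, end in dropped_events(upstream, downstream):
        print(ts, start, end)
```

On these sample lines it flags only the 6300 6400 update, which matches what is visible by eye in the two excerpts.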
Sure, thanks for your analysis!
During the Antre session, we noticed that online-text-flow is missing some segments (see the images). I'm also attaching the logs of the data sent to online-text-flow. Do you know what could cause this issue?
broken.log