ELITR / online-text-flow

Online event streaming to improve data and text flows
MIT License

Missing segments after out-of-order messages #17

Closed srdecny closed 3 years ago

srdecny commented 3 years ago

During the Antre session, we noticed that online-text-flow is missing some segments (see the images). I'm also attaching the logs of the data sent to online-text-flow. Do you know what could cause this issue? (Attachments: image_from_ios, image, broken.log)

Gldkslfmsd commented 3 years ago

6300 and 6400 never arrive with status 100, so they're skipped. What is the source of this broken.log, and which commands preceded it in the pipeline?

My initial guesses: 1) the ASR segments broke the expected protocol, 2) something was lost in the network connection, 3) a wrong combination of the brief and non-brief formats.
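The check described above can be sketched as a small script. This is only an illustration under stated assumptions: the line shape `[timestamp] start end text` is inferred from the log excerpts quoted in this thread, the "status" is assumed to be `end - start` (1, 10 or 100), and `unfinished_segments` is a hypothetical helper, not part of online-text-flow.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the log excerpts in this thread:
#   [YYYY-MM-DD HH:MM:SS] <start> <end> <text>
LINE = re.compile(r"^\[[^\]]*\]\s+(\d+)\s+(\d+)\s+(.*)$")


def unfinished_segments(lines, complete=100):
    """Return the sorted start ids that never arrive with end - start == complete.

    Assumption: the status of an update is the difference end - start.
    """
    statuses_by_start = defaultdict(set)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # ignore lines that do not look like updates
        start, end = int(match.group(1)), int(match.group(2))
        statuses_by_start[start].add(end - start)
    return sorted(
        start
        for start, statuses in statuses_by_start.items()
        if complete not in statuses
    )


# Sample lines taken from the l_04-45-bg2sentences.log excerpt in this thread.
sample = [
    "[2021-03-17 09:18:40] 6400 6410 Бих искал също да се възползвам от тази възможност.",
    "[2021-03-17 09:18:45] 6500 6501 Бих...",
    "[2021-03-17 09:18:46] 6500 6510 Бих искал.",
]
print(unfinished_segments(sample))
```

Running such a check on the full log would list every segment that is only ever seen with status 1 or 10, which are the candidates for the gaps in the displayed transcript.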

otakar-smrz commented 3 years ago

If you have the data from before events.py, that would be helpful. I also suspect the problem is with the brief format; in the non-brief format, the numbering is more monotonic, and I have never seen similar gaps there. Notice, though, that there is no such issue with the source EN stream, is there? Could the translation be breaking the sentence numbering and repetition, maybe due to caching?

srdecny commented 3 years ago

The source of broken.log is the pipeliner; it should be the traffic between the component that splits the RainbowMT packets and online-text-flow. @pyRis -- is that correct?

It is possible that the segments were never finalized. ASR had issues because the spoken language changed frequently; I believe Rishu knows more.

I'm attaching more logs, they are:

srdecny commented 3 years ago

In case it turns out to be the ASR getting confused and not confirming segments, how should we proceed? The current behavior isn't ideal, because there are chunks of the transcript missing. Perhaps the "abandoned" segments should be displayed forever, but grayed out? That way, the user sees something and there won't be confusing holes in the transcript.
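One way to picture the "grayed out" idea is the sketch below. It is not how online-text-flow works; it assumes updates arrive as `(start, end, text)` tuples, that `end - start == 100` marks a finalized segment, and that a segment counts as abandoned once an update with a higher start id has been seen. The function name `render_states` and the three state labels are hypothetical.

```python
def render_states(events, complete=100):
    """Classify each segment as 'complete', 'abandoned' or 'pending'.

    events: iterable of (start, end, text) tuples in arrival order.
    Assumption: end - start == complete marks a finalized update.
    """
    latest = {}   # start id -> text to display
    done = set()  # start ids that received a finalized update
    for start, end, text in events:
        is_complete = (end - start == complete)
        if start in done and not is_complete:
            continue  # keep the finalized text, ignore late partial updates
        latest[start] = text
        if is_complete:
            done.add(start)

    newest = max(latest, default=None)
    states = []
    for start in sorted(latest):
        if start in done:
            state = "complete"
        elif start < newest:
            state = "abandoned"  # never finalized, and a later segment exists
        else:
            state = "pending"    # most recent segment, may still be finalized
        states.append((start, state, latest[start]))
    return states


events = [
    (6400, 6410, "first partial sentence"),
    (6500, 6510, "second partial sentence"),
    (6600, 6601, "third, still incoming"),
]
for start, state, text in render_states(events):
    print(start, state, text)
```

A front end could then keep the "abandoned" text on screen in a muted style instead of dropping it, so the transcript has no holes even when finalization never arrives.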

Gldkslfmsd commented 3 years ago

There's the INTERNAL Bug:

[2021-03-17 09:18:25] 6100 6101 INTERNAL BUG: Number of translated batches is lower than expected. index i=1, len(trans_batch)=1

And I see you're using the --unsafe mt-wrapper flag. I think that's the reason. The rainbow worker should be fixed at UEDIN.

otakar-smrz commented 3 years ago

Dominik, please take over, I cannot address this. Thanks!

pyRis commented 3 years ago

Hi,

That bug appeared when I was testing the KIT standalone MT workers. It does not explain the Czech output behaviour that we observed.

My best guess, based on what I observed on the webpage while listening to the audio, is that the language was changing quite frequently between English and Czech, so the Segmenter/OTF never finalized the sentence, and that affected the translation. As you may see, the translations did happen, but the RB worker received new lines before it got a complete line for the previous sentence, so the previously received incomplete sentences were discarded.

I'm sorry for responding so late. I was stuck in a very bad traffic jam; it took me about 4 hours (7 PM to 11 PM in my local time zone) to travel a mere 5.2 km. The administration is imposing new micro-containment zones along the National Highway because a significant number of cases were reported in my hometown today, which caused this mess.

Best, Rishu



Gldkslfmsd commented 3 years ago

@srdecny, can you please add the logs of the traffic between the rainbow splitter and 1) the cs otf client, 2) the bg otf client? I tried parsing the rainbow output on my own and sending it through the client to the server, and it was OK.

srdecny commented 3 years ago

Sure, here you go: l_04-45-bg2sentences.log, l_04-43-cs2sentences.log. If you need more logs for the diagnostics, let me know and I'll create an account for you on the servers where the logs are stored.

Gldkslfmsd commented 3 years ago

So the bug must be in the rainbow splitter (or somewhere between the rainbow mt-wrapper and the otf client).

In 03-04-rainbow-targets2rainbow_packet.log (stripped for readability):

[2021-03-17 09:18:37] 6400 6401 bg  Бих искал също да се възползвам от тази възможност и да спомена...
[2021-03-17 09:18:39] 6400 6401 bg  Бих искал също да се възползвам от тази възможност и да спомена, ч
[2021-03-17 09:18:40] 6400 6410 bg  Бих искал също да се възползвам от тази възможност. hr  Želim isko
[2021-03-17 09:18:40] 6500 6501 bg  И споменахме, че сменяме името си в домашни любимци...  hr  I spom
[2021-03-17 09:18:42] 6400 6401 bg  Бих искал също да се възползвам от тази възможност и да спомена,
[2021-03-17 09:18:43] 6400 6401 bg  Бих искал също да се възползвам от тази възможност и да спомена,
[2021-03-17 09:18:44] 6300 6400 bg  И аз съм координатор на комуникацията в европейската пени интерна
[2021-03-17 09:18:45] 6400 6410 bg  Бих искал също да се възползвам от тази възможност и да спомена,
[2021-03-17 09:18:45] 6500 6501 bg  Бих...  hr  Ja bih...   cs  Já bych...  da  Jeg ville...    nl  I
[2021-03-17 09:18:46] 6500 6510 bg  Бих искал.  hr  Volio bih.  cs  Chtěl bych. da  Jeg vil gerne.  n
[2021-03-17 09:18:46] 6600 6601 bg  Благодаря ви за поканата да бъдете...   hr  Hvala vam na pozivu d
[2021-03-17 09:18:48] 6600 6601 bg  Благодаря за поканата да бъдеш тук. hr  Hvala vam na pozivu da bu
[2021-03-17 09:18:49] 6600 6610 bg  Благодаря ви за поканата да бъдете тук с вас на този 26.    hr  H

l_04-45-bg2sentences.log :

[2021-03-17 09:18:39] 6400 6401 Бих искал също да се възползвам от тази възможност и да спомена, че сменяме...
[2021-03-17 09:18:40] 6400 6410 Бих искал също да се възползвам от тази възможност.
[2021-03-17 09:18:45] 6500 6501 Бих...
[2021-03-17 09:18:46] 6500 6510 Бих искал.
[2021-03-17 09:18:46] 6600 6601 Благодаря ви за поканата да бъдете...
[2021-03-17 09:18:48] 6600 6601 Благодаря за поканата да бъдеш тук.
[2021-03-17 09:18:49] 6600 6610 Благодаря ви за поканата да бъдете тук с вас на този 26.

There's nothing between :40 and :45. If the 6300 6400 update were there, it would appear in the paragraph view.
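The comparison of the two logs can be reproduced roughly as follows. This is a sketch under assumptions: both logs are taken to start each line with `[timestamp] start end`, only the id pairs are compared (language tags, text and repeated updates are ignored), and the helper names `id_pairs` and `missing_downstream` are hypothetical. The sample lines keep only the timestamps and ids from the excerpts above.

```python
import re

# Assumed prefix of every update line: [timestamp] <start> <end>
ID_PAIR = re.compile(r"^\[[^\]]*\]\s+(\d+)\s+(\d+)\b")


def id_pairs(lines):
    """Return the unique (start, end) pairs in order of first appearance."""
    pairs = []
    for line in lines:
        match = ID_PAIR.match(line)
        if match:
            pair = (int(match.group(1)), int(match.group(2)))
            if pair not in pairs:
                pairs.append(pair)
    return pairs


def missing_downstream(upstream, downstream):
    """Return the id pairs seen upstream that never show up downstream."""
    seen = set(id_pairs(downstream))
    return [pair for pair in id_pairs(upstream) if pair not in seen]


# Ids from the 03-04-rainbow-targets2rainbow_packet.log excerpt (text omitted).
upstream = [
    "[2021-03-17 09:18:39] 6400 6401 bg",
    "[2021-03-17 09:18:40] 6400 6410 bg",
    "[2021-03-17 09:18:40] 6500 6501 bg",
    "[2021-03-17 09:18:44] 6300 6400 bg",
    "[2021-03-17 09:18:46] 6500 6510 bg",
    "[2021-03-17 09:18:46] 6600 6601 bg",
    "[2021-03-17 09:18:49] 6600 6610 bg",
]
# Ids from the l_04-45-bg2sentences.log excerpt (text omitted).
downstream = [
    "[2021-03-17 09:18:39] 6400 6401",
    "[2021-03-17 09:18:40] 6400 6410",
    "[2021-03-17 09:18:45] 6500 6501",
    "[2021-03-17 09:18:46] 6500 6510",
    "[2021-03-17 09:18:46] 6600 6601",
    "[2021-03-17 09:18:48] 6600 6601",
    "[2021-03-17 09:18:49] 6600 6610",
]
print(missing_downstream(upstream, downstream))
```

Because only id pairs are compared, an update that is repeated upstream but forwarded once downstream is not reported; a stricter check would also have to match timestamps or counts.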

Gldkslfmsd commented 3 years ago

So it's not an otf problem. I'm passing it to cruise-control and @srdecny.

srdecny commented 3 years ago

Sure, thanks for your analysis!