acdh-oeaw / shawi-data

Data of the project "The Shawi-type Arabic dialects (FWF P 33574)".

Split utterances with several speakers #23

Open dasch124 opened 5 months ago

dasch124 commented 5 months ago

In several texts (e.g. Urfa-107_Cotton_Business), one ELAN segment contains utterances of several speakers. It would be good to split these into separate segments, one per speaker.

We can then transform the speaker information into @who attributes.

If the original context should be restored, curators can afterwards add <annotationBlock> elements around the separated <u> elements after tokenization.
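As a rough illustration of the transformation described above, here is a minimal Python sketch. It assumes a hypothetical convention in which each speaker turn inside a segment starts with the speaker ID in square brackets (e.g. `[SPK1]`); the actual marking used in the ELAN files may differ. The function names `split_by_speaker` and `to_tei_utterances` are made up for this example, and the TEI namespace is omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker convention (an assumption, not the project's format):
# each turn starts with "[SpeakerID]" followed by the utterance text.
SPEAKER_MARK = re.compile(r"\[([^\]]+)\]\s*")


def split_by_speaker(segment_text):
    """Return a list of (speaker_id, utterance) pairs from one segment."""
    # re.split with one capture group yields:
    # [prefix, id1, text1, id2, text2, ...]
    parts = SPEAKER_MARK.split(segment_text)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        speaker, text = parts[i], parts[i + 1].strip()
        if text:
            turns.append((speaker, text))
    # Note: text before the first marker (parts[0]) is ignored here.
    return turns


def to_tei_utterances(turns):
    """Build one <u who="#id"> element per speaker turn."""
    elements = []
    for speaker, text in turns:
        u = ET.Element("u", {"who": "#" + speaker})
        u.text = text
        elements.append(u)
    return elements


segment = "[SPK1] first utterance [SPK2] second utterance"
for u in to_tei_utterances(split_by_speaker(segment)):
    print(ET.tostring(u, encoding="unicode"))
```

Each resulting `<u>` element carries exactly one speaker in `@who`, so curators could later wrap related elements in `<annotationBlock>` if the original grouping needs to be restored.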

miriamaltawil commented 5 months ago

Dear Daniel and Veronika, I tried to separate all the segments that contained different speakers, and I added the speaker IDs. I pushed the ELAN file to GitHub. I hope it is fine now. A problem occurs when two people speak at the same time and their voices overlap. In those cases (I think it happens twice), I could not separate the segments, so for the moment I left them together, even though this is not a good solution either. Before moving on to solve this issue, I would first like to know whether what I have done so far looks good to you.