christopheparisse / trjs

Transcription tools with most features of Transcriber and Clan but written using javascript and html5
GNU General Public License v3.0

inserts UNK tier #4

Open ebergelson opened 5 years ago

ebergelson commented 5 years ago

In at least one case we're finding that trjs has inserted a UNK tier after a @bg/@eg conversation. There is no UNK tier in the .CHA file (these files are from LENA daylong recordings, converted from their native .its format to CHAT format using the lena2chat function in CLAN).

(we can provide screenshots and further details if needed.)

christopheparisse commented 5 years ago

Hello. The easiest would be to send me an example CHAT file that reproduces the problem. Correcting it would then be simple. You can send it to me at this address: cparisse@parisnanterre.fr. Best

ebergelson commented 5 years ago

Done--let us know how it works for you!

christopheparisse commented 5 years ago

I’ve been looking at the CHAT files and using the automatic conversion (in fact, it’s the conversion done by this tool: http://ct3.ortolang.fr/tei-corpo/). The problem comes from the CHAT files, which contain a pattern that I didn’t even know could be used! So I need your input to create a solution that matches what you intended for the transcription. In the CHAT files, you have:

    *SIL: 0 .
    %xdb: average_dB="-70.21" peak_dB="-39.52"
    @Eg: Pause 1129
    %xcom: silence 2 of 2 ends at 57600140.0 -- previous timestamp adjusted: was 57599990

The problem is the following: which participant does the %xcom belong to? When I display the content of your bullets in CLAN, I see this:

    *SIL: 0 . •55605460_57599990•
    %xdb: average_dB="-70.21" peak_dB="-39.52"
    @Eg: Pause 1129
    %xcom: silence 2 of 2 ends at 57600140.0 -- previous timestamp adjusted: was 57599990

So for me the %xcom should be part of SIL. Is this the case? For me, the correct syntax would have been:

    *SIL: 0 . •55605460_57599990•
    %xdb: average_dB="-70.21" peak_dB="-39.52"
    %xcom: silence 2 of 2 ends at 57600140.0 -- previous timestamp adjusted: was 57599990
    @Eg: Pause 1129

Because the @Eg marks the end of a block, it should not split the *SIL information in two.
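
To make the reordering concrete, here is a minimal Python sketch of how CHAT lines could be rearranged so that a dependent tier (a line starting with `%`) that was separated from its main tier by an `@` header is moved back in front of that header. The function name `reattach_dependent_tiers` is hypothetical, the sketch is not part of trjs, CLAN, or tei-corpo, and it assumes one tier per line (continuation lines are not handled).

```python
def reattach_dependent_tiers(lines):
    """Move %-tiers that follow an @-header back up to the preceding *-tier block.

    Hypothetical helper: assumes each tier sits on a single line.
    """
    out = []
    for line in lines:
        if line.startswith("%"):
            # Walk back over the run of @-headers at the end of the output.
            i = len(out)
            while i > 0 and out[i - 1].startswith("@"):
                i -= 1
            # Move the tier only if headers were skipped and a main-tier
            # block (a * or % line) sits right before them.
            if 0 < i < len(out) and out[i - 1].startswith(("*", "%")):
                out.insert(i, line)
                continue
        out.append(line)
    return out


example = [
    "*SIL:\t0 . \u202255605460_57599990\u2022",
    '%xdb:\taverage_dB="-70.21" peak_dB="-39.52"',
    "@Eg:\tPause 1129",
    "%xcom:\tsilence 2 of 2 ends at 57600140.0",
]
fixed = reattach_dependent_tiers(example)
```

With the example above, `%xcom` ends up directly after `%xdb`, and `@Eg` becomes the last line, so the *SIL block is no longer split.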

So I can change the code of the conversion from CHAT to TEI so that UNK is replaced by the last speaker, but I need to know what I should do, as the CHAT files are ambiguous. Or you could edit the CHAT files so that %xcom is clearly linked to *SIL.
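
The first option (replacing UNK with the last speaker) could look roughly like the Python sketch below. This is not the actual CHAT-to-TEI conversion code; `assign_dependent_tiers` is a hypothetical name, and the sketch assumes that a dependent tier always belongs to the most recent main tier, even when an `@` header such as `@Eg` sits between them. UNK is kept only when no speaker has been seen yet.

```python
def assign_dependent_tiers(lines):
    """Pair each %-tier with the speaker of the most recent *-tier.

    Hypothetical helper: @-headers between the main tier and the
    dependent tier are ignored rather than resetting the speaker.
    """
    last_speaker = None
    pairs = []
    for line in lines:
        if line.startswith("*"):
            # "*SIL:\t0 ." -> "SIL"
            last_speaker = line[1:].split(":", 1)[0]
        elif line.startswith("%"):
            # "%xcom:\t..." -> "xcom"
            tier_name = line[1:].split(":", 1)[0]
            pairs.append((last_speaker or "UNK", tier_name))
    return pairs


example = [
    "*SIL:\t0 .",
    '%xdb:\taverage_dB="-70.21" peak_dB="-39.52"',
    "@Eg:\tPause 1129",
    "%xcom:\tsilence 2 of 2 ends at 57600140.0",
]
pairs = assign_dependent_tiers(example)
```

Here both `%xdb` and `%xcom` are attributed to SIL. Whether that matches the intended transcription is exactly the open question above.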