dopefishh / pympi

A python module for processing ELAN and Praat annotation files
MIT License

eaf_from_chat hang #45

Open selmling opened 2 years ago

selmling commented 2 years ago

I'd like to be able to batch convert .cha files to .eaf format using your wonderful library, pympi. I've used pympi for other purposes with great success, but I'm having trouble getting it to interact with .cha files.

When I call the pympi.Elan.eaf_from_chat function, it hangs at the line that checks for the utf8 codec and then continues.

In your documentation, you mention using older codecs for older files -- any help on how to track down the codec if that information isn't readily available? This may help me debug eaf_from_chat. The chat files I'm working with do have @UTF8 on line 1.

Also, have you considered opening up a gitter forum for your library? That would be a helpful place for folks to share code, generally easing the learning curve of using pympi, which is a really great tool!

Thank you for your work on this library! -Steven

dopefishh commented 2 years ago

Hi Steven, I think I meant the Windows/ISO charset at the time (https://nl.wikipedia.org/wiki/ISO_8859-1). Alternatively, you can check whether the file command is able to determine the codec. If so, you can also use iconv to convert the encoding before processing the file. However, if CHAT added the UTF8 header, it is strange that the file is not UTF8. To be honest, I wrote this out of need, and after I was done importing my CHAT files I never looked at it again.
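For anyone without file and iconv at hand, the same check and conversion can be approximated in Python. This is only a rough sketch using the standard library, not part of pympi: the helper names detect_codec and convert_to_utf8 are made up for illustration, and the candidate list is an assumption. Note that latin-1 accepts any byte sequence, so it only works as a last-resort fallback and proves nothing about the real encoding.

```python
from pathlib import Path


def detect_codec(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate codec that decodes the whole file.

    Hypothetical helper, not a pympi function. Strict codecs such as
    utf-8 must come first, because latin-1 never fails to decode.
    """
    data = Path(path).read_bytes()
    for codec in candidates:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


def convert_to_utf8(src, dst, codec):
    """Re-encode src (read as `codec`) into dst as UTF-8, like iconv would."""
    text = Path(src).read_bytes().decode(codec)
    Path(dst).write_bytes(text.encode("utf-8"))
```

If detect_codec reports something other than utf-8 for a file that carries the @UTF8 header, the header is probably wrong for that file, and converting a copy with convert_to_utf8 before calling eaf_from_chat may be worth a try.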

I haven't heard of gitter. I'm fine with opening it if there is enthusiasm.

jackft commented 1 year ago

@selmling this might have to do with the Python version you are using. I find this function useful as well, but I have needed to port it from Python 2 to Python 3, since it appears to have been written for Python 2, where plain strings were bytestrings. I would be happy to make a pull request with my solution @dopefishh.
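To illustrate the kind of change such a port involves, here is a small sketch. It is not the actual pympi code or the proposed pull request, and read_chat_lines is a hypothetical helper. It assumes the file is opened in text mode with an explicit codec, so every line is a str and comparisons against markers such as @End behave as expected. In Python 3 a bytes value never compares equal to a str (b"@End" == "@End" is False), which is one way a loop waiting for a marker could fail to stop; whether that is what happens inside eaf_from_chat is an assumption.

```python
def read_chat_lines(path, codec="utf-8"):
    """Yield non-empty, stripped lines of a CHAT file up to the @End marker.

    Hypothetical helper for illustration only. Opening in text mode with
    an explicit encoding makes each line a str under Python 3.
    """
    with open(path, "r", encoding=codec) as chat:
        # Iterating over the file object stops at end of file, so the loop
        # ends even when the @End marker is missing.
        for line in chat:
            line = line.strip()
            if line == "@End":
                break
            if line:
                yield line
```

Iterating over the file object instead of calling readline in a while loop is a deliberate choice here: a truncated file without @End still terminates.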

dopefishh commented 1 year ago

Thanks, pull requests are always very much welcome.