elpaco-escience / scikit-talk

Scikit-talk is an open-source toolkit for processing collections of real-world conversational speech in Python. The toolkit aims to facilitate the exploration of large collections of transcriptions and annotations of conversational interaction.
Apache License 2.0

Streamline sample Colab notebook & avoid (suppress?) errors #66

Closed: mdingemanse closed this issue 1 week ago

mdingemanse commented 1 week ago

Working through this Colab notebook, I noticed that its output is not yet entirely self-explanatory, and that it generates some errors that may throw off beginners:

  1. the final CSV also includes gaze tiers but should contain only speech (line 5 of this cell should be uncommented)
  2. parsing the EAF file generates a warning (parsing works, but the message looks alarming; can it be suppressed or avoided?):

        /usr/local/lib/python3.10/dist-packages/pympi/Elan.py:1471: UserWarning: Parsing unknown version of ELAN spec... This could result in errors...
          warnings.warn('Parsing unknown version of ELAN spec... '

  3. saving the corpus locally (this cell) throws an error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    in <cell line: 2>()
          1 # Save the corpus as a .csv file locally
    ----> 2 Dutch_corpus.write_csv(path = "Dutch_corpus.csv")

    8 frames

    /usr/local/lib/python3.10/dist-packages/sktalk/corpus/write/writer.py in <lambda>(x)
         52 norm = pd.json_normalize(data=metadata, sep="_")
         53 df = pd.DataFrame(norm)
    ---> 54 df[:] = np.vectorize(lambda x: ', '.join(
         55     x) if isinstance(x, list) else x)(df)
         56 return df

    TypeError: sequence item 0: expected str instance, dict found
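For item 2, one way to keep the notebook output clean without hiding other problems is to ignore only that specific warning around the parsing call. This is a minimal sketch using Python's standard `warnings` module; the helper names `quiet_elan_version_warning` and `emit_like_pympi` are hypothetical, and the stand-in function merely imitates the message quoted above rather than calling pympi.

```python
import warnings
from contextlib import contextmanager

# Start of the message pympi emits (copied from the warning quoted in item 2).
# filterwarnings() treats this as a regex matched against the start of the message.
ELAN_VERSION_MESSAGE = r"Parsing unknown version of ELAN spec"


@contextmanager
def quiet_elan_version_warning():
    """Ignore only the 'unknown ELAN version' UserWarning inside the block.

    Other warnings, and this warning outside the block, are still reported.
    """
    with warnings.catch_warnings():
        # catch_warnings() restores the previous filters when the block exits
        warnings.filterwarnings(
            "ignore", message=ELAN_VERSION_MESSAGE, category=UserWarning
        )
        yield


def emit_like_pympi():
    """Hypothetical stand-in that raises the same kind of warning pympi does."""
    warnings.warn(
        "Parsing unknown version of ELAN spec... This could result in errors...",
        UserWarning,
    )
```

In the notebook, the EAF-parsing cell would wrap its call in `with quiet_elan_version_warning():`. Matching on the message text, rather than silencing every `UserWarning`, means that unrelated warnings from pympi or scikit-talk still reach the user.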

liesenf commented 1 week ago
  1. final CSV also includes gaze, should have only speech

    • changed the default so that only speech tiers are now selected
  2. parsing EAF generates an error (it works, but looks alarming; can this be suppressed or avoided?)

    • the warning message originates from the dependency pympi. I will check whether it can be suppressed there. Since it is only a warning, I am addressing 3 first.
  3. saving corpus locally throws an error

    • function write_csv raises a TypeError when a metadata field contains nested metadata (here, a list of dicts), because ', '.join() only accepts strings. A proposed fix is ready for review in the linked pull request.
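The traceback in item 3 shows `', '.join(x)` receiving a list whose items are dicts. One possible fix, sketched here under the assumption that such nested items should be kept as JSON text in the CSV, is to serialise non-string list items before joining. This is not necessarily what the linked pull request does; `_flatten_cell` and `metadata_to_frame` are hypothetical names, and the code only mirrors the `json_normalize` step visible in the traceback.

```python
import json

import pandas as pd


def _flatten_cell(value):
    """Join a list-valued metadata cell into one string; leave other values as-is.

    Non-string list items (e.g. nested participant dicts) are serialised with
    json.dumps instead of being passed to str.join, which only accepts strings.
    """
    if isinstance(value, list):
        return ", ".join(
            item if isinstance(item, str) else json.dumps(item, sort_keys=True)
            for item in value
        )
    return value


def metadata_to_frame(metadata):
    """Normalise a (possibly nested) metadata dict into a one-row DataFrame."""
    # Nested dicts become columns such as "source_year"; lists stay as lists.
    df = pd.json_normalize(metadata, sep="_")
    # Apply the cell formatter element-wise, column by column.
    return df.apply(lambda col: col.map(_flatten_cell))
```

The resulting frame can then be written with `to_csv(path, index=False)`. Mapping each column with `Series.map` instead of `np.vectorize` also sidesteps `np.vectorize` inferring the output dtype from the first element when `otypes` is not given, which matters for metadata columns of mixed types.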