Scikit-talk is an open-source toolkit for processing collections of real-world conversational speech in Python. The toolkit aims to facilitate the exploration of large collections of transcriptions and annotations of conversational interaction.
corpus_obj = sktalk.Corpus(name = "Example Corpus from Griffith Corpus of Spoken Australian English",
url = "https://ca.talkbank.org/data-orig/GCSAusE/")
corpus_obj.append(conversation_cha)
Goals:
Conversation
Corpus
In order to make the review easier we recommend the following workflow.
conversation_cha = sktalk.ChaFile('path/to/conversation_file.cha').parse() conversation_cha.write_json(path = "path/to/conversation_file.json")
corpus_obj = sktalk.Corpus(name = "Example Corpus from Griffith Corpus of Spoken Australian English", url = "https://ca.talkbank.org/data-orig/GCSAusE/") corpus_obj.append(conversation_cha)
corpus_obj.write_json(path = "path/to/corpus_file.json")
corpus_obj_from_json = Corpus.from_json("path/to/corpus_file.json") conversation_obj_from_json = Conversation.from_json("path/to/conversation_file.json")
corpus_obj_from_json.metadata conversation_obj_from_json.metadata