elpaco-escience / scikit-talk

Scikit-talk is an open-source toolkit for processing collections of real-world conversational speech in Python. The toolkit aims to facilitate the exploration of large collections of transcriptions and annotations of conversational interaction.
Apache License 2.0

Add a json parser/reader #39

Closed bvreede closed 9 months ago

bvreede commented 10 months ago

Goals:

To make the review easier, we recommend the following workflow.

  1. Import a .cha file and write both a conversation and a corpus to JSON. (Download a .cha file, e.g. here.)
    
    import sktalk

    conversation_cha = sktalk.ChaFile('path/to/conversation_file.cha').parse()
    conversation_cha.write_json(path="path/to/conversation_file.json")

    corpus_obj = sktalk.Corpus(
        name="Example Corpus from Griffith Corpus of Spoken Australian English",
        url="https://ca.talkbank.org/data-orig/GCSAusE/",
    )
    corpus_obj.append(conversation_cha)

    corpus_obj.write_json(path="path/to/corpus_file.json")


  2. Import the JSON files as Corpus and Conversation objects. This is the functionality added in this pull request.

    corpus_obj_from_json = sktalk.Corpus.from_json("path/to/corpus_file.json")
    conversation_obj_from_json = sktalk.Conversation.from_json("path/to/conversation_file.json")


  3. Verify the imported objects, for instance by checking their metadata.

    corpus_obj_from_json.metadata
    conversation_obj_from_json.metadata
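
For reviewers who want to see what a lossless round trip looks like without installing scikit-talk, the workflow above can be sketched with a stand-in class. `MiniConversation` and `roundtrip` are hypothetical names invented for this illustration, and the JSON layout is an assumption; the sketch only mirrors the `write_json`, `from_json`, and `metadata` names used in this PR and is not the library's implementation.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class MiniConversation:
    """Hypothetical stand-in for a conversation object (not sktalk's class)."""
    utterances: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def write_json(self, path):
        # Serialize utterances and metadata into a single JSON document.
        # The key names here are assumptions made for this sketch.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"utterances": self.utterances, "metadata": self.metadata}, f)

    @classmethod
    def from_json(cls, path):
        # Rebuild the object from a file previously written by write_json.
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return cls(utterances=data["utterances"], metadata=data["metadata"])


def roundtrip(conversation):
    """Write the object to a temporary JSON file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "conversation.json")
        conversation.write_json(path)
        return type(conversation).from_json(path)


original = MiniConversation(
    utterances=[{"participant": "A", "utterance": "hello"}],
    metadata={"source": "path/to/conversation_file.cha"},
)
restored = roundtrip(original)
```

Comparing `restored` with `original` (dataclasses compare field by field) is a stricter check than inspecting the metadata alone, and a similar comparison could be applied to the real objects if they define equality.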



Closes #34
Should also address #41

sonarcloud[bot] commented 9 months ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

100.0% Coverage
0.0% Duplication