athena-team / athena

an open-source implementation of sequence-to-sequence based speech processing engine
https://athena-team.readthedocs.io
Apache License 2.0
952 stars 196 forks

zip(*entries) #130

Closed taichuai closed 4 years ago

taichuai commented 4 years ago

Line 104, "_, _, all_transcripts, _ = zip(*entries)", in athena/data/datasets/speech_recognition.py may cause an error in some cases, so 'entries = self.entries' needs to be added to avoid it.
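For readers unfamiliar with the idiom, here is a minimal sketch of what `zip(*entries)` does on that line. The rows are made-up sample data, not taken from any Athena dataset; the only assumption carried over from the code is that each entry is a tuple of four fields.

```python
# Hypothetical rows in the (wav_filename, wav_length_ms, transcript, speaker) layout.
entries = [
    ("a.wav", "1200", "hello", "spk1"),
    ("b.wav", "900", "world", "spk2"),
]

# zip(*entries) transposes rows into columns, so each name receives one column.
_, _, all_transcripts, _ = zip(*entries)
print(all_transcripts)

# Unpacking three-field rows into four names fails, so the number of targets
# has to match the number of fields per row.
three_field_entries = [row[:3] for row in entries]
try:
    _, _, all_transcripts_3, _ = zip(*three_field_entries)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```

The idiom therefore depends on `entries` already being bound to the list of rows at the point where it is unpacked.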

tjadamlee commented 4 years ago

Thanks for pointing this out, we will check. @Some-random, please ask shuaijiang to solve this.

Some-random commented 4 years ago

@taichuai Can you tell us more about the error you found? For example, in which cases does the code fail?

taichuai commented 4 years ago

def preprocess_data(self, file_path):
    """ Generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker)."""
    logging.info("Loading data from {}".format(file_path))
    with open(file_path, "r", encoding="utf-8") as file:
        lines = file.read().splitlines()
    # Read the CSV header: wav_filename    wav_length_ms   transcript  speaker
    headers = lines[0]
    lines = lines[1:]
    lines = [line.split("\t") for line in lines]
    self.entries = [tuple(line) for line in lines]
    self.speakers = []
    if "speaker" not in headers.split("\t"):
        entries = self.entries
        self.entries = []
        if self.text_featurizer.model_type == "text":
            _, _, all_transcripts = zip(*entries)
            self.text_featurizer.load_model(all_transcripts)
        for wav_filename, wav_len, transcripts in entries:
            self.entries.append(
                tuple([wav_filename, wav_len, transcripts, "global"])
            )
        self.speakers.append("global")
    else:
        if self.text_featurizer.model_type == "text":
            # Bug: "entries" is only assigned in the if-branch above, so it is unbound here
            _, _, all_transcripts, _ = zip(*entries)
            self.text_featurizer.load_model(all_transcripts)
        for _, _, _, speaker in self.entries:
            if speaker not in self.speakers:
                self.speakers.append(speaker)

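'_, _, all_transcripts, _ = zip(*entries)' is incorrect, because the variable "entries" is never defined before it is used in the else branch. The failure occurs when the header contains a "speaker" column and self.text_featurizer.model_type == "text".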
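The failure can be reproduced outside Athena with a stripped-down sketch of the same control flow. The function names and sample rows below are hypothetical stand-ins, not Athena's API, and the "fixed" variant only illustrates the `entries = self.entries` suggestion from this thread, not necessarily what was eventually merged.

```python
def collect_transcripts_buggy(rows, has_speaker):
    """Mirror the control flow above: `entries` is bound only in the if-branch."""
    if not has_speaker:
        entries = rows
        _, _, all_transcripts = zip(*entries)
    else:
        # `entries` was never assigned on this path, so the lookup fails.
        _, _, all_transcripts, _ = zip(*entries)
    return list(all_transcripts)


def collect_transcripts_fixed(rows, has_speaker):
    """Bind `entries` once, before branching, as suggested in this issue."""
    entries = rows
    if not has_speaker:
        _, _, all_transcripts = zip(*entries)
    else:
        _, _, all_transcripts, _ = zip(*entries)
    return list(all_transcripts)


# Hypothetical rows that include a speaker column.
rows = [("a.wav", "1200", "hello", "spk1"), ("b.wav", "900", "world", "spk2")]

try:
    collect_transcripts_buggy(rows, has_speaker=True)
    error = None
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    error = exc

print(type(error).__name__)
fixed_result = collect_transcripts_fixed(rows, has_speaker=True)
print(fixed_result)
```

Because `entries` is assigned somewhere inside the function, Python treats it as a local name, so the else path raises `UnboundLocalError` rather than falling back to a global.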

tjadamlee commented 4 years ago

We have fixed this in PR #133.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue is closed. You can also re-open it if needed.