Closed taichuai closed 4 years ago
Thanks for pointing this out, we will check. @Some-random please ask shuaijiang to solve this.
@taichuai can you tell us more about the error you found? For example, in which cases does the code fail?
def preprocess_data(self, file_path):
    """ Generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker)."""
    logging.info("Loading data from {}".format(file_path))
    with open(file_path, "r", encoding="utf-8") as file:
        lines = file.read().splitlines()
    # Get the CSV header: wav_filename, wav_length_ms, transcript, speaker
    headers = lines[0]
    lines = lines[1:]
    lines = [line.split("\t") for line in lines]
    self.entries = [tuple(line) for line in lines]
    self.speakers = []
    if "speaker" not in headers.split("\t"):
        entries = self.entries
        self.entries = []
        if self.text_featurizer.model_type == "text":
            _, _, all_transcripts = zip(*entries)
            self.text_featurizer.load_model(all_transcripts)
        for wav_filename, wav_len, transcripts in entries:
            self.entries.append(
                tuple([wav_filename, wav_len, transcripts, "global"])
            )
        self.speakers.append("global")
    else:
        if self.text_featurizer.model_type == "text":
            _, _, all_transcripts, _ = zip(*entries)
            self.text_featurizer.load_model(all_transcripts)
        for _, _, _, speaker in self.entries:
            if speaker not in self.speakers:
                self.speakers.append(speaker)
'_, _, all_transcripts, _ = zip(*entries)' is incorrect, because the variable "entries" is never defined before it is used in the else branch.
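To show why this only fails in some cases, here is a minimal sketch of the same branch structure outside of Athena. The function name `collect_transcripts` and the `has_speaker` flag are hypothetical stand-ins, not part of the library. Judging from the code quoted above, the failing path is only reached when the CSV has a "speaker" column and `model_type` is "text".

```python
def collect_transcripts(rows, has_speaker):
    """Hypothetical stand-in for the branch structure in preprocess_data."""
    if not has_speaker:
        entries = rows  # `entries` is only bound on this branch
        _, _, transcripts = zip(*entries)
    else:
        # `entries` is a local name that was never assigned on this path,
        # so Python raises UnboundLocalError here.
        _, _, transcripts, _ = zip(*entries)
    return list(transcripts)


# Rows without a speaker column take the first branch and work.
ok = collect_transcripts([("a.wav", "1000", "hello")], has_speaker=False)

# Rows with a speaker column take the else branch and fail.
try:
    collect_transcripts([("a.wav", "1000", "hello", "spk1")], has_speaker=True)
    failed = False
except UnboundLocalError:
    failed = True
```

Because `entries` is assigned somewhere in the function, Python treats it as a local variable everywhere in that function, which is why the else branch fails instead of falling back to some outer name.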
We have fixed this in PR #133.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is closed. You can also re-open it if needed.
Line 104, "_, _, all_transcripts, _ = zip(*entries)", in athena/data/datasets/speech_recognition.py may cause an error in some cases, so 'entries = self.entries' needs to be added to avoid it.
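The suggested fix amounts to binding `entries` before the branch rather than inside it. A minimal sketch of that idea as a standalone function follows. `parse_entries` is a hypothetical helper written for illustration; it leaves out the `text_featurizer` call and is not necessarily what the maintainers merged.

```python
def parse_entries(lines):
    """Hypothetical standalone version of the parsing logic with the fix applied.

    `lines` is the tab-separated file content, header line first.
    Returns (entries, speakers, transcripts).
    """
    headers = lines[0].split("\t")
    # Bind `entries` before branching so both branches can use it.
    entries = [tuple(line.split("\t")) for line in lines[1:]]
    if "speaker" not in headers:
        transcripts = [t for _, _, t in entries]
        # No speaker column: attach a single placeholder speaker.
        entries = [(w, n, t, "global") for w, n, t in entries]
        speakers = ["global"]
    else:
        transcripts = [t for _, _, t, _ in entries]
        speakers = []
        for _, _, _, speaker in entries:
            if speaker not in speakers:
                speakers.append(speaker)
    return entries, speakers, transcripts


with_speaker = [
    "wav_filename\twav_length_ms\ttranscript\tspeaker",
    "a.wav\t1000\thello\tspk1",
    "b.wav\t900\tworld\tspk2",
    "c.wav\t800\tagain\tspk1",
]
entries, speakers, transcripts = parse_entries(with_speaker)

without_speaker = [
    "wav_filename\twav_length_ms\ttranscript",
    "a.wav\t1000\thello",
]
entries2, speakers2, transcripts2 = parse_entries(without_speaker)
```

With `entries` defined up front, the branch that handles files with a "speaker" column no longer depends on an assignment made in the other branch.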