fd873630 / RNN-Transducer

RNN-Transducer for Korean

Question about vocab_size #1

Open hccho2 opened 2 years ago

hccho2 commented 2 years ago
id  char
0   _
1   (space)
2   ㄱ
...
52  ㅄ
53  <s>
54  </s>

That is 55 entries in total, from 0 to 54. Why is vocab_size set to 54 in the yaml file?
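A quick way to see the off-by-one: with zero-based ids, the vocabulary size is the highest id plus one. The sketch below assumes a hypothetical `vocab` list laid out like the table above; only ids 0, 1, 2, 52, 53 and 54 are actually shown there, so the elided entries are filled with placeholders rather than guessed.

```python
# Hypothetical reconstruction of the table above: only ids 0, 1, 2, 52, 53, 54
# are known, so the elided entries (ids 3..51) are placeholders.
vocab = ["_", " ", "ㄱ"] + [f"<elided_{i}>" for i in range(3, 52)] + ["ㅄ", "<s>", "</s>"]

# Zero-based ids run from 0 to len(vocab) - 1, so the size is the max id plus one.
max_id = len(vocab) - 1
vocab_size = max_id + 1

print(max_id, vocab_size)
```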

fd873630 commented 2 years ago

Thank you for your interest in my code.

Setting it to 54 appears to have been a mistake.

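Some time ago I ran an experiment that removed the space character and measured CER. I should have corrected the value at that point, but I didn't check carefully enough.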

That said, if you train only the RNN-T, I expect that training with 54 will still run without errors, because the sos and eos tokens never appear.
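The reasoning can be checked with a small sketch: an output layer or embedding of size `vocab_size` can only represent ids `0 .. vocab_size - 1`. With 54, id 54 (`</s>`) falls out of range, but if the RNN-T targets never contain `<s>` or `</s>`, every id stays below 54. The helper `out_of_range_ids` and the sample target sequences are hypothetical, not taken from the repository.

```python
SOS_ID, EOS_ID = 53, 54  # ids of <s> and </s> from the table in the question

def out_of_range_ids(target_ids, vocab_size):
    """Return the ids that an output layer of size vocab_size could not represent."""
    return sorted({i for i in target_ids if not 0 <= i < vocab_size})

# Hypothetical RNN-T target: character ids only, no <s> or </s>.
rnnt_target = [2, 1, 52]
# Hypothetical target wrapped with sos/eos, as a separate decoder might use.
wrapped_target = [SOS_ID] + rnnt_target + [EOS_ID]

print(out_of_range_ids(rnnt_target, 54))     # []
print(out_of_range_ids(wrapped_target, 54))  # [54]
print(out_of_range_ids(wrapped_target, 55))  # []
```

Note that `<s>` (id 53) still fits with a size of 54; only `</s>` (id 54) would overflow.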

Thank you.