CoraJung opened this issue 3 years ago
great!
Hi Cora, I'm on a team from CMU and we're also trying to reproduce the results from the Tie Your Embeddings Down paper. Were you able to get the FSC model trained, matching the results reported in the README?
My teammate @ostapen tried this but found the model isn't learning, even after hours and hours of training. To avoid hijacking your thread, you can alternatively email me at the email address listed on my website, and then I can start a thread with my other teammates.
Hi, did you get the complete.csv file? I'm running into the same problem as you. Could you please share your Snips dataset?
Hello, does the snips dataset you downloaded contain the .csv file?
We are a group of NYU MS in Data Science students who are working on developing an end-to-end speech-to-intent model. We have read your paper and replicated your code and would love to ask you some questions.
**Paper vs. GitHub Results Discrepancy**
We notice that the final test accuracies for both FSC and SNIPS differ between your paper (i.e. 97.65% for FSC, 73.49% for SNIPS) and the GitHub repo (i.e. 95.65% for FSC, 69.88% for SNIPS). Could you share some thoughts on the difference between the numbers in the paper and the repo?
**SNIPS Data Partition Ambiguity**
In prepare_snips.py, we notice that you split complete.csv into train-val-test. However, since we don't have the complete.csv that you used, we can't replicate the exact same partitions. Running your code on our own SNIPS splits gives significantly higher results on average: over 4 runs (each with its own split of a shuffled complete.csv), the average accuracy is 81.17%, even though we use the same environment described in your repo. We'd love to double-check these points with you.
Which subsets of the SNIPS dataset did you use to create complete.csv? Our guess is that you used the smartLight close-field and far-field subsets (3,320 observations) for your experiments (i.e. is this the data listed in your complete.csv?). Please let us know if that's incorrect.
Would you mind sharing your complete.csv and intents.json for SNIPS with us? We believe having the input data in the same format and with the same splits is important for drawing a fair comparison between your work and our future work.
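For concreteness, here is a minimal sketch of how we produce our own splits. The 70/15/15 fractions and the fixed seed are our illustrative assumptions, not the partition used in the paper, which is exactly why the original complete.csv matters:

```python
import csv
import random

def split_complete_csv(path, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle the rows of complete.csv with a fixed seed, then split
    them into train/val/test lists.

    The fractions and seed here are illustrative guesses, not the
    paper's actual partition.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)   # keep the header row aside
        rows = list(reader)

    # Deterministic shuffle so the split is reproducible across runs.
    random.Random(seed).shuffle(rows)

    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return header, train, val, test
```

Because the shuffle is seeded, re-running the script reproduces our splits, but they still cannot match yours without the original file.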
**BERT Embeddings Fine-tuned or Not**
Section 2.1 of your paper says "we back-propagate the embedding and SLU task losses only to the acoustic branch" because fine-tuning BERT could lead to overfitting. From this, our understanding was that the BERT embeddings would be frozen. However, we noticed that BERT's parameters are passed into the Adam optimizer with a learning rate of 2e-5 (line 63 in experiment_triplet.py), implying that BERT is fine-tuned:
```python
self.optimizer = torch.optim.Adam([
    {'params': self.model.bert.parameters(), 'lr': args.learning_rate_bert},
    {'params': self.model.speech_encoder.parameters()},
    {'params': self.model.classifier.parameters()}
], lr=args.learning_rate)
```
We would appreciate clarification on whether BERT is fine-tuned and, if so, why you chose to fine-tune it. Furthermore, if BERT's parameters are not frozen, could you share some thoughts on fine-tuning BERT for 20 epochs (the default in the code), which may lead to overfitting and hurt the text embeddings? As noted in other papers on BERT, the typical number of fine-tuning epochs is at most 5.
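For reference, this is roughly what we expected from Section 2.1: a minimal sketch (using toy stand-in modules, not your actual model) of freezing the BERT branch so that only the acoustic branch and classifier receive gradient updates:

```python
import torch

# Toy stand-ins for self.model.bert / speech_encoder / classifier;
# this is our assumption of how freezing would look, not the authors' code.
bert = torch.nn.Linear(4, 4)
speech_encoder = torch.nn.Linear(4, 4)
classifier = torch.nn.Linear(4, 2)

# Freeze the text branch: no gradients flow into BERT's weights.
for p in bert.parameters():
    p.requires_grad_(False)

# Pass only the trainable (acoustic + classifier) parameters to Adam,
# so BERT embeddings stay fixed even though its outputs are still used.
optimizer = torch.optim.Adam(
    [p for m in (speech_encoder, classifier) for p in m.parameters()
     if p.requires_grad],
    lr=1e-3,
)
```

Under this setup, the embedding loss still back-propagates through the acoustic branch only, matching the paper's description.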