Closed · gitgaviny closed this issue 2 years ago
Also, could you please tell me how to save the features of the hidden states (like torchvision.models._utils.IntermediateLayerGetter does) so that I can use them as input to another custom layer? Using torch.save directly only lets me process batch by batch; is there an API that can help with that? I tried loading the pre-trained model (pytorch_model.bin) into my own model and training without transformers.Trainer so that I could use custom inputs, but the model tends to classify all inputs (IEMOCAP, raw waveform) into a single class. So I'm interested in how to combine this model with my own dataset class. A sketch of what I mean is below.
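For reference, this is roughly what I'm after: a minimal sketch that pulls per-layer hidden states out of a Hugging Face wav2vec2 model via output_hidden_states, then saves them for a downstream custom layer. The checkpoint name and the dummy waveform are my own assumptions, not something from this repo:

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumed checkpoint; substitute whichever checkpoint this repo actually uses.
model_name = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)
model.eval()

waveform = torch.randn(16000)  # dummy 1-second clip at 16 kHz

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(inputs.input_values, output_hidden_states=True)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors,
# each of shape (batch, frames, hidden_size): the CNN feature projection
# plus one entry per transformer layer.
layer_features = outputs.hidden_states[-1]
torch.save(layer_features.cpu(), "w2v2_features.pt")
```

This still saves one batch at a time, but the same loop over a DataLoader can accumulate features into a list and save them in one file at the end.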
Hello @gavinyuan1, thanks for your interest.
Thank you for your reply! I will close this issue.
Dear @TideDancer,
Thank you very much for providing the code for the Interspeech paper!
I'm a beginner with the Hugging Face training API, and I find it difficult to control the input of the model. For example, if I want an additional feature-extractor branch with MFCC as input rather than the raw waveform (or manual transcriptions or video input, as in a multimodal system), is it possible to train it together with this wav2vec2 model? Since the label lengths are consistent, we can simply use [:-1] or [:-2] to build a multi-task learning model, but this is not applicable to variable-length input features. A rough sketch of what I have in mind is below. Hope to get your reply.
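A hypothetical sketch of such a two-branch model, written as a plain nn.Module so it can be trained with a custom loop instead of transformers.Trainer. The fusion strategy (mean-pooling the wav2vec2 frames, taking the GRU's final state for the MFCC branch) and all sizes and names are my assumptions:

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class TwoBranchEmotionModel(nn.Module):
    """Hypothetical fusion model: a wav2vec2 branch plus an MFCC branch.
    Each branch is pooled over time, so variable-length inputs become
    fixed-size vectors before fusion."""
    def __init__(self, num_classes=4, n_mfcc=40, mfcc_hidden=128):
        super().__init__()
        # Assumed checkpoint; substitute the one used in this repo.
        self.wav2vec2 = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.mfcc_rnn = nn.GRU(n_mfcc, mfcc_hidden, batch_first=True)
        fused_dim = self.wav2vec2.config.hidden_size + mfcc_hidden
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, input_values, mfcc):
        # Mean-pool wav2vec2 frames over time: (batch, hidden_size).
        w2v = self.wav2vec2(input_values).last_hidden_state.mean(dim=1)
        # Final GRU state for the MFCC branch: h_n is (1, batch, mfcc_hidden).
        _, h_n = self.mfcc_rnn(mfcc)
        fused = torch.cat([w2v, h_n[-1]], dim=-1)
        return self.classifier(fused)
```

With a wrapper like this, the dataset class can return both the raw waveform and the MFCC matrix per utterance, and a standard PyTorch DataLoader plus training loop handles the rest, sidestepping the Trainer's single-input assumption.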