Question about spectra's preprocessed data

jfightyr commented 9 months ago

Dear Author: Thank you for your excellent work! I am very interested in spectra. Could you please provide the preprocessed fine-tuning data and the processed pretraining dataset? Thank you very much, and I wish you all the best!

publicstaticvo commented 8 months ago

https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/iemocap.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mintrec.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mosei.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mosi.tgz

Here are the processed pretraining datasets. The SpokenWoz dataset will be released later.

tnlin commented 8 months ago

Thanks @publicstaticvo for sharing. Here is some additional information. To access the training, validation, and test files in the datasets, you can extract an archive such as mosi.tgz with the following command:

tar -xzvf mosi.tgz
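If you would rather do this from Python (for example, to check what an archive contains before unpacking it), the standard-library tarfile module can do the same job. This is only a sketch: the helper names list_archive and extract_archive are made up for illustration and are not part of the repository.

```python
import tarfile


def list_archive(tgz_path):
    """Return the member names of a gzip-compressed tar archive without extracting it."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        return archive.getnames()


def extract_archive(tgz_path, dest="."):
    """Extract every member into dest, roughly what `tar -xzf` does."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        archive.extractall(dest)
```

For example, list_archive("mosi.tgz") returns the paths stored in the archive, and extract_archive("mosi.tgz") unpacks it into the current directory.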

Once extracted, you'll find .pkl files for training, validation, and testing. Each pickle file contains a list of samples, and each sample includes the following components:

Audio Features: This field contains the audio feature data.
Text Token IDs: Here, you'll find the IDs corresponding to text tokens.
Label: This is the label assigned to the sample.
History Audio Features (if applicable): If present, this field contains historical audio feature data.
History Text Token IDs (if applicable): Similar to the above, this includes historical text token IDs, if available.
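
To see how these components are actually stored, you can load one split and print a summary of its first sample. The sketch below is an assumption-heavy starting point rather than the repository's own loader: it does not assume any particular field names or container type (dict, tuple, or list), and the function names and the example path are hypothetical. If the samples hold NumPy arrays or PyTorch tensors, the corresponding library must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def load_samples(pkl_path):
    """Load one split and return the object stored in the pickle (a list of samples)."""
    # Only unpickle files you trust: pickle can run arbitrary code on load.
    with Path(pkl_path).open("rb") as f:
        return pickle.load(f)


def describe(value):
    """Describe a field by its type plus its shape, length, or value."""
    shape = getattr(value, "shape", None)
    if shape is not None:
        # Covers array-like objects such as NumPy arrays or tensors.
        return f"{type(value).__name__} shape={tuple(shape)}"
    if isinstance(value, (list, tuple, str, bytes, dict)):
        return f"{type(value).__name__} len={len(value)}"
    return f"{type(value).__name__} value={value!r}"


def summarize_sample(sample):
    """Map each field of a sample to a description, for dict or sequence samples."""
    if isinstance(sample, dict):
        items = sample.items()
    elif isinstance(sample, (list, tuple)):
        items = enumerate(sample)
    else:
        items = [("value", sample)]
    return {key: describe(value) for key, value in items}


def print_summary(pkl_path):
    """Print the number of samples and the layout of the first one."""
    samples = load_samples(pkl_path)
    print(len(samples), "samples")
    for key, description in summarize_sample(samples[0]).items():
        print(key, "->", description)
```

Calling print_summary on one of the extracted .pkl files (the exact file names inside each archive may differ, so check the extracted folder first) shows which fields are present, including whether the history fields exist for that dataset.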

We hope this helps you use the datasets effectively. If you have any questions or need further assistance, please feel free to reach out.

jfightyr commented 5 months ago

Thank you very much, I understand. Thank you again for your excellent work!

AlibabaResearch / DAMO-ConvAI

Question about spectra's preprocessed data #81