AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: the official repository containing the codebase for Alibaba DAMO Conversational AI.

spectra #92

Open huigeStudent opened 7 months ago

huigeStudent commented 7 months ago

Hi, where can I find the SPECTRA code for the paper "Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment"?

huybery commented 7 months ago

https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/spectra

huigeStudent commented 7 months ago

Hi, could you provide the pre-processed fine-tuning data and the processed pretraining dataset mentioned there?

publicstaticvo commented 7 months ago

https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/iemocap.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mintrec.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mosei.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mosi.tgz
These are the pre-processed fine-tuning data; each archive consists of pickle files that can be used directly. The SpokenWOZ and pretraining data will be provided later.

tnlin commented 7 months ago

Thanks @publicstaticvo for sharing. I will provide some additional information. To access the training, validation, and test files in the datasets, you can use the following command to extract the mosi.tgz file:

    tar -xzvf mosi.tgz

Once extracted, you'll find .pkl files for training, validation, and testing. Each pickle file contains a list of samples, and each sample includes the following components (see the loading sketch after this list):

  1. Audio Features: This field contains the audio feature data.
  2. Text Token IDs: Here, you'll find the IDs corresponding to text tokens.
  3. Label: This is the label assigned to the sample.
  4. History Audio Features (if applicable): If present, this field contains historical audio feature data.
  5. History Text Token IDs (if applicable): Similar to the above, this includes historical text token IDs, if available.
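
For reference, here is a minimal loading sketch in Python. It assumes only the layout described above (a plain pickled list of samples); the filename and the per-sample field order are assumptions based on this description, not taken from the repo's actual data loaders.

    import pickle

    # Minimal sketch: load one of the extracted .pkl files and inspect a sample.
    # The path "mosi/train.pkl" is hypothetical; adjust to your extracted layout.
    with open("mosi/train.pkl", "rb") as f:
        samples = pickle.load(f)

    print(f"loaded {len(samples)} samples")
    first = samples[0]
    # Each sample is expected to contain: audio features, text token IDs, a
    # label, and (if applicable) history audio features and history text token IDs.
    for i, field in enumerate(first):
        print(i, type(field))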

We hope this information helps you in utilizing the dataset effectively. Should you have any questions or need further assistance, please feel free to reach out.

huigeStudent commented 6 months ago

https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/iemocap.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mintrec.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mosei.tgz
https://space-mm-data.oss-cn-wulanchabu.aliyuncs.com/downstreamv2/mosi.tgz
These are the pre-processed fine-tuning data; each archive consists of pickle files that can be used directly. The SpokenWOZ and pretraining data will be provided later.

Are the SpokenWOZ and pretraining data available yet? Also, how long does this task take to run?

tnlin commented 4 months ago

Hi, due to the large data size of SpokenWOZ and Spotify-100k (tens of GB), please obtain them from their original sources.

ArmeriaWang commented 3 months ago

Hi, due to the large data size of SpokenWOZ and Spotify-100k (tens of GB), please obtain them from their original sources.

Hello. Unfortunately, the original source of Spotify-100k is no longer available. Could you kindly provide an alternative download link or suggest another way to obtain the dataset? The pretraining data is essential for us to reproduce your work and build on it.

tnlin commented 3 months ago

Hi, we just released our processed pretraining dataset of Spotify (960 hours, 96 GB) for reproducibility.

Please visit our repo for detailed information.

Cheers


shenyujie1125 commented 3 months ago

I fine-tuned with the following argument settings:

    --apex_level=1 --batch_size=24 --epochs=5 --grad_acc=1 --show_inner_progress --lr=2e-5 --model_path=/mnt/data/syj/DAMO-ConvAI/spectra/SPECTRA-base --task=mosi --transcripts=/mnt/data/syj/DAMO-ConvAI/spectra/downstreamv2 --text_path=roberta-base --mode=finetune

and the following error occurs:

    Traceback (most recent call last):
      File "/home/a/anaconda3/envs/py3.7/lib/python3.7/contextlib.py", line 130, in __exit__
        self.gen.throw(type, value, traceback)
      File "/home/a/anaconda3/envs/py3.7/lib/python3.7/site-packages/transformers/modeling_utils.py", line 81, in no_init_weights
        yield
      File "/home/a/anaconda3/envs/py3.7/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1843, in from_pretrained
        model = cls(config, *model_args, **model_kwargs)
    TypeError: __init__() got an unexpected keyword argument 'cache_dir'

shenyujie1125 commented 3 months ago

Could someone tell me what's going on here? Is my transformers version wrong? My environment satisfies everything in the requirements file.

l1-l commented 2 months ago

Could someone tell me what's going on here? Is my transformers version wrong? My environment satisfies everything in the requirements file.

Hi, have you solved this problem? I ran into the same issue.

shenyujie1125 commented 2 months ago

Not solved yet. After it failed last time, I didn't continue. Do you know what caused the error? If you'd like, we could connect on QQ to discuss: 1807878943.

publicstaticvo commented 2 months ago

Is your transformers version 4.18? In principle that should be fine, since it also runs on my side with 4.28. I searched the whole codebase and couldn't find this "cache_dir". Is your traceback truncated? I can't tell where in the code the problem occurs.

l1-l commented 2 months ago

We tried several transformers versions, and none of them solved the problem. The error occurs at the line model = ATForSequenceClassification.from_pretrained(args.model_path). We suspect the cause is that "_name_or_path" in the pretrained checkpoint's audio_config.json and text_config.json is not configured correctly, because we did not pretrain from scratch on the pretraining dataset and ran the fine-tuning code directly. Is that workable? How should "_name_or_path" be set? Is it enough to download any WavLM and any RoBERTa from Hugging Face?
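
One way to sanity-check this suspicion is to inspect the nested configs in the checkpoint directory before calling from_pretrained. This is a hypothetical snippet, not code from the repo; the checkpoint path and the expected base-model names (microsoft/wavlm-base, roberta-base) are assumptions, not values confirmed by the authors.

    import json
    import pathlib

    # Hypothetical check: verify that the nested configs of a local SPECTRA
    # checkpoint point at resolvable base models. Path and expected names are
    # assumptions based on the discussion above.
    ckpt = pathlib.Path("/path/to/SPECTRA-base")
    for name, expected in [("audio_config.json", "microsoft/wavlm-base"),
                           ("text_config.json", "roberta-base")]:
        cfg_file = ckpt / name
        cfg = json.loads(cfg_file.read_text())
        if cfg.get("_name_or_path") != expected:
            print(f"{name}: _name_or_path is {cfg.get('_name_or_path')!r}")
            cfg["_name_or_path"] = expected
            cfg_file.write_text(json.dumps(cfg, indent=2))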

publicstaticvo commented 2 months ago

Oh, the source code doesn't seem to include a setting for downstream fine-tuning directly from a plain WavLM plus a plain RoBERTa; you have to start from a pretrained SPECTRA model. You can try modifying it yourself (see the sketch below); it should be fairly simple, just follow how ATForPretraining is written.
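
For anyone who wants to try that route, here is a minimal sketch using only standard Hugging Face APIs. It skips SPECTRA's cross-modal pretraining entirely, so it is a baseline rather than the paper's method; the class name, pooling choices, and checkpoint names are hypothetical, and the repo's actual ATForPretraining composition should be followed where possible.

    import torch
    import torch.nn as nn
    from transformers import WavLMModel, RobertaModel

    # Hypothetical sketch (not the repo's code): compose the two unimodal
    # encoders for downstream classification without SPECTRA pretraining.
    class ATFromScratchForSequenceClassification(nn.Module):
        def __init__(self, audio_path="microsoft/wavlm-base",
                     text_path="roberta-base", num_labels=2):
            super().__init__()
            # Load the two encoders directly from Hugging Face checkpoints.
            self.audio_encoder = WavLMModel.from_pretrained(audio_path)
            self.text_encoder = RobertaModel.from_pretrained(text_path)
            hidden = (self.audio_encoder.config.hidden_size
                      + self.text_encoder.config.hidden_size)
            self.classifier = nn.Linear(hidden, num_labels)

        def forward(self, input_values, input_ids, attention_mask=None):
            # Mean-pool the audio states, take [CLS] for text, then classify.
            a = self.audio_encoder(input_values).last_hidden_state.mean(dim=1)
            t = self.text_encoder(
                input_ids, attention_mask=attention_mask
            ).last_hidden_state[:, 0]
            return self.classifier(torch.cat([a, t], dim=-1))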