Closed by jingshuangliu22 1 year ago
We used a multimodal dataset (mostly publicly available) very similar to the one in our recent work, Chinese CLIP (https://arxiv.org/abs/2211.01335), so you can use that paper for reference. For the plain text, we used the internal dataset from M6 (https://arxiv.org/abs/2103.00823), which is not public, so I advise you to use other plain-text datasets of similar size as alternatives.
Dear Author, I am trying to pre-train the Chinese OFA model from scratch. Could you please tell me which dataset you used to pre-train the OFA-CN model? Is the Chinese dataset publicly available? Looking forward to your answer. Best wishes.