Data sample size

As we mentioned in the paper, the dataset was augmented using GPT. The current release is the pre-augmentation version, and we recommend that you use this original version. The complete data still needs to be organised.

Also, if you need more data, you can augment the text and images yourself using an LLM or similar tools. The data in this paper was augmented two years ago with an earlier version of GPT, and current technology allows for significantly higher-quality data augmentation.

Thank you.

fesvhtr / DocMSU

Data sample size #2