Closed: John-Ge closed this issue 1 year ago
Hi @John-Ge, thanks for your interest in this repo. For the detailed descriptions, the dataset contains around 106K instances; around 2K instances were filtered out to improve quality. For example, some answers generated by GPT-4 state that the information is based on the given "captions" and "descriptions". We exclude those instances from the result.
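A minimal sketch of the kind of filter described above, assuming each instance is a dict with an `answer` field and that a plain keyword match on "caption(s)" and "description(s)" is enough to flag an answer. The field names, the pattern, and the helper functions are hypothetical illustrations, not the actual script used to build the dataset.

```python
import re

# Hypothetical pattern: flags answers that mention the source annotations
# GPT-4 was prompted with. The real filtering criteria may differ.
LEAK_PATTERN = re.compile(r"\b(captions?|descriptions?)\b", re.IGNORECASE)


def leaks_source_text(answer: str) -> bool:
    """Return True if the answer refers to captions or descriptions."""
    return LEAK_PATTERN.search(answer) is not None


def filter_instances(instances):
    """Keep only instances whose answer does not mention the source annotations."""
    return [inst for inst in instances if not leaks_source_text(inst["answer"])]


# Toy data with assumed field names ("image_id", "answer").
sample = [
    {"image_id": 1, "answer": "A man is riding a horse on the beach."},
    {"image_id": 2, "answer": "Based on the given captions, the image shows a dog."},
]
kept = filter_instances(sample)
```

A keyword match like this is deliberately blunt: it would also drop a legitimate answer that happens to use the word "description", which may be acceptable when only a small share of instances is affected.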
Thank you for your awesome work. I notice that your SVIT descriptions do not cover all images in VG. Did you filter out these images for some reason (e.g., low quality), or did you just not include them? Thank you!