Open YiguoHe opened 3 months ago
You can calculate the distance between two images by hash values if there are duplicates in two datasets. If the distance is less than a certain threshold, it is defined as a duplicate image. It is recommended to manually check the deduplicated images in the code to avoid filtering out some images that are not actually duplicates.
You can calculate the distance between two images by hash values if there are duplicates in two datasets. If the distance is less than a certain threshold, it is defined as a duplicate image. It is recommended to manually check the deduplicated images in the code to avoid filtering out some images that are not actually duplicates.
Thank you for your response. Your work is excellent. Best wishes!
The RSITMD and RSICD datasets have a data leakage issue where they might share some common images and descriptions. how to deal with it properly?