Tianwei3989 / apolo

MIT License

Question regarding mismatch in artemis_id between apolo.json and artemis_dataset_release_v0.csv #3

Open BetterZH opened 1 week ago

BetterZH commented 1 week ago

I encountered an issue with the 'artemis_id' values in the 'apolo.json' file and the 'artemis_dataset_release_v0.csv' file.

According to the Dataset Preparation instructions, I downloaded the artemis_dataset_release_v0.csv file from https://www.artemisdataset.org/#dataset. However, during steps 8 and 9, I found the following issues:

  1. The total number of data entries in the artemis_dataset_release_v0.csv file is 454,677, but the artemis_id in the apolo/artemis_index/train_index.json file contains IDs that exceed this range, such as 454,682 and 454,681. Could this be a mistake, or is there an explanation for why these IDs are out of range?

  2. I have added code (highlighted in the red box in the image below) to perform the matching check. I found that nearly 30% of the entries (1,728) in the apolo.json file have artemis_id values that do not match their corresponding positions in the artemis_dataset_release_v0.csv file. This discrepancy has me quite confused. Could you kindly help confirm whether this is expected, or whether there may be an issue with the ID assignment?
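For readers who cannot see the screenshots, here is a minimal sketch of the kind of matching check described in the two points above. It is not the code from the images. It assumes that artemis_id is a zero-based row position in the CSV (header excluded) and that each apolo.json entry carries a painting name to compare against; the key names, the check_ids helper, and the sample data are all hypothetical.

```python
import csv
import io


def check_ids(rows, entries, id_key="artemis_id", name_key="painting"):
    """Return (out_of_range, mismatched) lists of artemis_id values.

    Assumption: each ID is a zero-based index into `rows`.
    """
    out_of_range, mismatched = [], []
    for entry in entries:
        idx = entry[id_key]
        if not 0 <= idx < len(rows):
            # ID points past the end of the CSV (point 1 above)
            out_of_range.append(idx)
        elif rows[idx][name_key] != entry[name_key]:
            # Row exists but holds a different painting (point 2 above)
            mismatched.append(idx)
    return out_of_range, mismatched


# Tiny in-memory stand-in for artemis_dataset_release_v0.csv (made-up rows)
sample_csv = io.StringIO(
    "art_style,painting,emotion,utterance\n"
    'Impressionism,painting_a,awe,"calm, quiet water"\n'
    "Cubism,painting_b,fear,sharp edges\n"
)
rows = list(csv.DictReader(sample_csv))

# Made-up stand-in for entries loaded from apolo.json
entries = [
    {"artemis_id": 0, "painting": "painting_a"},  # matches row 0
    {"artemis_id": 1, "painting": "painting_x"},  # name differs from row 1
    {"artemis_id": 5, "painting": "painting_b"},  # no such row
]

out_of_range, mismatched = check_ids(rows, entries)
print("out of range:", out_of_range)
print("mismatched:", mismatched)
```

Reading the CSV with Python's csv module rather than a spreadsheet export keeps the row positions tied to the original file contents.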


I would greatly appreciate any guidance on how to resolve these discrepancies.

Thank you very much for your help!

Tianwei3989 commented 1 week ago

Hi, I checked apolo.json on my end and did not find the error. My copy of artemis_dataset_release_v0.csv contains 454,684 samples, so the out-of-range IDs did not occur.

I also checked the painting names, and they all matched.

Hmm... could some samples be missing from the newer version of artemis_dataset_release_v0.csv?

Tianwei3989 commented 1 week ago

Hi, I requested a new copy of artemis_dataset_release_v0.csv from https://www.artemisdataset.org/#dataset and found that it has exactly the same number of samples as my copy (454,684 samples).

The code returns the same result as in my last comment.

Can you check the file on your end, or request a new copy of artemis_dataset_release_v0.csv?

BetterZH commented 1 week ago

Thank you for your reply! I would like to confirm: did you request the dataset zip by filling out this form?

BetterZH commented 1 week ago

The issue has been resolved, and the data is indeed correct. I had previously opened the file in Excel, which can automatically adjust the encoding or formatting of a CSV file and thereby alter the data. Thank you very much for your help!
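For anyone hitting the same symptom, one way to sanity-check a freshly downloaded copy without opening it in a spreadsheet is to count its records with Python's csv module. This is only a sketch: the count_records helper and the sample data are hypothetical, and the 454,684 figure to compare against is the sample count reported earlier in this thread.

```python
import csv
import io


def count_records(handle):
    """Count CSV data records (header excluded) from an open text handle."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, if any
    return sum(1 for _ in reader)


# Made-up sample: the quoted field spans two physical lines but is still
# one record, so counting text lines would give a different number.
sample = io.StringIO(
    "painting,utterance\n"
    'painting_a,"first line\nsecond line"\n'
    "painting_b,plain text\n"
)
n = count_records(sample)
print(n)
```

On the real file, this could be called as `count_records(open("artemis_dataset_release_v0.csv", newline="", encoding="utf-8"))`, where the UTF-8 encoding is an assumption. A count other than 454,684 would suggest the file was modified after download, for example by being re-saved from Excel.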