Question about the closed-set splits

zhoumu53 commented 7 months ago

Hi, thank you for your great work on collecting the datasets and providing the toolkit. I have some questions about the results in the paper. 1) About the results in Figure 5: did you use the default closed-set split for all the datasets?

splitter = splits.ClosedSetSplit(0.8)
for idx_train, idx_test in splitter.split(df):
    splits.analyze_split(df, idx_train, idx_test)

2) About the evaluation metrics: as mentioned in the paper, if I want to reproduce the accuracy results, can I evaluate the model performance with the code below?

y_true = df_test['identity']
metrics.accuracy(y_true, y_pred)

Thank you in advance!
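
For readers unfamiliar with the term in question 1, here is a minimal sketch of what a closed-set split means: every identity that appears in the test set also appears in the training set. This is a plain-Python illustration only, not the toolkit's `ClosedSetSplit` implementation. The function name `closed_set_split`, its parameters, and the per-identity rounding rule are assumptions made for the example.

```python
import random
from collections import defaultdict

def closed_set_split(identities, ratio_train=0.8, seed=0):
    """Illustrative closed-set split (hypothetical helper, not the library code).

    Splits the sample indices of each identity separately, so that every
    identity with test samples also has at least one training sample.
    """
    rng = random.Random(seed)

    # Group sample indices by identity.
    by_identity = defaultdict(list)
    for idx, identity in enumerate(identities):
        by_identity[identity].append(idx)

    idx_train, idx_test = [], []
    for indices in by_identity.values():
        rng.shuffle(indices)
        # Assumption: keep at least one sample per identity in training.
        n_train = max(1, round(ratio_train * len(indices)))
        idx_train.extend(indices[:n_train])
        idx_test.extend(indices[n_train:])
    return sorted(idx_train), sorted(idx_test)

# Toy data: identity "c" has a single sample, so it ends up in training only.
identities = ["a"] * 10 + ["b"] * 5 + ["c"]
idx_train, idx_test = closed_set_split(identities)
```

Identities with a single sample land only in the training set under this rule, which is one reason the realized train ratio can differ slightly from the requested one.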

sadda commented 7 months ago

Hi, more or less, yes. Probably the simplest thing would be to directly load the splits that were used. In the folder datasets, there is a split for each dataset (note that some individuals are unknown and are in neither the training nor the testing set); in the folder combined, there is a combination of all splits.
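
As a rough sketch of that workflow, the snippet below reads a saved split, skips the rows that belong to neither set, and computes accuracy on the test rows. The file layout is an assumption: the column names `image_id`, `identity`, and `split`, the empty split label for unknown individuals, and the inline toy data are all hypothetical, so check the actual files in the repository before relying on it. The `accuracy` helper is the plain fraction of correct predictions, written out here for illustration rather than taken from the toolkit.

```python
import csv
import io

# Hypothetical split file: the column names and the empty split label for
# unknown individuals are assumptions, not the repository's actual format.
SPLIT_CSV = """image_id,identity,split
0,turtle_a,train
1,turtle_a,test
2,turtle_b,train
3,turtle_b,test
4,unknown,
5,turtle_a,train
"""

def load_split(text):
    """Return (train_rows, test_rows); rows without a split label are skipped."""
    train, test = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row["split"] == "train":
            train.append(row)
        elif row["split"] == "test":
            test.append(row)
    return train, test

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true identity."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_rows, test_rows = load_split(SPLIT_CSV)
y_true = [row["identity"] for row in test_rows]
y_pred = ["turtle_a", "turtle_a"]  # placeholder predictions from some model
split_accuracy = accuracy(y_true, y_pred)
```

Reusing a stored split instead of regenerating one avoids differences caused by random seeds or library versions when comparing against published numbers.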

zhoumu53 commented 7 months ago

Thank you so much!

WildlifeDatasets / wildlife-datasets

Question about the closed-set splits #4