I don't understand why you are enumerating over correctness_.
I might misunderstand something, but I think you should iterate over all the guids instead.
Otherwise, you cannot dump the statistics of the entire training set as guid in this loop only has 1 + Epoch possible values.
df = pd.DataFrame([[guid,
                    i,
                    threshold_closeness_[guid],
                    confidence_[guid],
                    variability_[guid],
                    correctness_[guid],
                    forgetfulness_[guid],
                    ] for i, guid in enumerate(correctness_)], columns=column_names)
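correctness_ is a dictionary mapping each guid to its correctness metric values (a list of size 1 + Epoch). Iterating over a dictionary iterates over its keys, so enumerate(correctness_) yields one (index, guid) pair per training example, and this loop does cover all the guids. Hope this clears up the confusion!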
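As a minimal sketch of that behavior, here is a toy example. The guids and values below are made up for illustration and are not taken from the repository; only the iteration pattern matches the loop above.

```python
# Hypothetical toy data: each guid maps to a list of per-epoch values.
correctness_ = {
    "guid-a": [0, 1, 1],
    "guid-b": [0, 0, 1],
    "guid-c": [1, 1, 1],
}

# Iterating over a dict iterates over its keys, so enumerate()
# yields (index, guid) pairs, one per entry in the dictionary.
rows = [(guid, i, correctness_[guid]) for i, guid in enumerate(correctness_)]

for row in rows:
    print(row)
```

The number of rows equals the number of guids in the dictionary, regardless of how long each list of values is.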
Hi,
Thanks for the nice work!
I have a question about L149: https://github.com/allenai/cartography/blob/c7865383e421a91611c2f4e79d1ffbfb7850f4f4/cartography/selection/train_dy_filtering.py#L149
I don't understand why you are enumerating over correctness_. I might misunderstand something, but I think you should iterate over all the guids instead. Otherwise, you cannot dump the statistics of the entire training set, as guid in this loop only has 1 + Epoch possible values.
Thank you!