Optimization-AI / LibAUC

LibAUC: A Deep Learning Library for X-Risk Optimization
https://libauc.org/
MIT License
285 stars 38 forks

Questions on AUPRC example #14

Closed: berkuva closed this issue 10 months ago

berkuva commented 2 years ago

What does the parameter index_s for APLoss_SH indicate?

Does posNum indicate the number of positive samples in each batch, training data, or all data?

yzhuoning commented 2 years ago

In version 1.1.8, index_s in APLoss refers to the indices of the positive samples. We don't select the positives inside the training loop, though; that step is integrated into the loss function. You can find this operation in the code of the APLoss_SH class.
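For readers who want a concrete picture, here is a minimal pure-Python sketch of that pattern. It is not LibAUC code: `PerSampleState`, `gamma`, `data_len`, and the moving-average update are hypothetical and only illustrate how an object can receive the batch's dataset indices and pick out the positives itself, so the training loop does no positive indexing.

```python
# Hypothetical sketch, not LibAUC code: a stateful object that receives the
# dataset indices of a mini-batch and selects the positives internally.
class PerSampleState:
    def __init__(self, data_len, gamma=0.9):
        self.gamma = gamma           # moving-average weight (assumed name)
        self.u = [0.0] * data_len    # one running value per training sample

    def update(self, scores, labels, index_s):
        """Update the running value of each positive sample in the batch.

        scores, labels and index_s are parallel lists for one mini-batch;
        index_s holds each sample's position in the full training set.
        """
        updated = []
        for score, label, idx in zip(scores, labels, index_s):
            if label != 1:           # positive filtering happens here,
                continue             # not in the training loop
            self.u[idx] = (1 - self.gamma) * self.u[idx] + self.gamma * score
            updated.append(idx)
        return updated


state = PerSampleState(data_len=6)
touched = state.update(scores=[0.8, 0.1, 0.6], labels=[1, 0, 1], index_s=[4, 2, 0])
print(touched)
```

The training loop only forwards the batch indices; which entries of the per-sample state get updated is decided inside `update`.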

posNum refers to the number of positive samples in each mini-batch (for training data); it is a parameter that controls the sampler function. For the test set, we don't resample the data.
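As a rough illustration of what a fixed per-batch positive count means, the sketch below draws one mini-batch with exactly `pos_num` positives. This is an assumption-laden toy, not the library's sampler: `sample_batch`, `pos_num`, and the toy labels are hypothetical names and data.

```python
import random


def sample_batch(labels, batch_size, pos_num, rng):
    """Return dataset indices for one mini-batch with exactly pos_num positives.

    Hypothetical helper, not part of LibAUC. Assumes binary labels (1 = positive)
    and that the dataset has at least pos_num positives and
    batch_size - pos_num negatives.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    batch = rng.sample(pos, pos_num) + rng.sample(neg, batch_size - pos_num)
    rng.shuffle(batch)  # avoid a fixed positives-first ordering
    return batch


# Toy, highly skewed training labels: 5 positives out of 100 samples.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
batch = sample_batch(labels, batch_size=16, pos_num=2, rng=rng)
n_pos = sum(labels[i] for i in batch)
print(len(batch), n_pos)
```

Resampling like this applies only to training batches; evaluation data would be iterated as-is.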

GloryyrolG commented 11 months ago

Hi @yzhuoning et al.,

Thanks for your great work. I noticed that in the tutorial notebook you provided for SOAP and APLoss, the method seems to suffer from severe overfitting: test AP changes little from the beginning and even decreases. May I know if this is a bug? Or am I missing something?

Looking forward to your response.

Thanks & regards,


optmai commented 10 months ago

This can happen, especially with a small but highly skewed dataset. To alleviate the issue, you could use model selection (e.g., early stopping). Thanks!
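A minimal sketch of patience-based early stopping, assuming you record a validation AP after every epoch. The function name `select_best_epoch`, the `patience` value, and the AP numbers are made up for illustration and do not come from the tutorial notebook.

```python
def select_best_epoch(val_scores, patience=3):
    """Return (best_epoch, best_score), stopping once the score has not
    improved for `patience` consecutive epochs.

    Hypothetical helper; val_scores is a list of per-epoch validation AP.
    """
    best_epoch, best_score = -1, float("-inf")
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_score


# Made-up validation AP values, one per epoch.
val_ap = [0.61, 0.64, 0.63, 0.62, 0.60, 0.66]
best_epoch, best_score = select_best_epoch(val_ap, patience=3)
print(best_epoch, best_score)
```

In practice you would save a checkpoint whenever the validation score improves and restore the one from `best_epoch`. The score should come from a held-out validation split rather than the test set, so the reported test AP stays unbiased.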

optmai commented 10 months ago

If you have any other questions, please let us know.