The original model used a class-balanced cross-entropy loss. Your model doesn't use it, yet it still performs well. My guess is that your code does some kind of hard negative mining in the preprocessing stage, so the class imbalance problem is already handled before the network stage. Is that guess correct?
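To be clear about what I mean by balanced cross-entropy, here is a minimal sketch. It assumes binary labels and weights each class by the frequency of the other one; the name `balanced_bce` and the exact weighting are my own illustration, not code taken from the original model or from this repository.

```python
import math


def balanced_bce(labels, probs, eps=1e-7):
    """Class-balanced binary cross-entropy (illustrative sketch).

    labels: list of 0/1 ground-truth values.
    probs:  list of predicted probabilities for the positive class.
    """
    n = len(labels)
    # beta is the fraction of negative samples; the positive term is scaled
    # by beta and the negative term by (1 - beta), so the rarer class gets
    # the larger weight.
    beta = 1.0 - sum(labels) / n
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -beta * y * math.log(p)
        total += -(1.0 - beta) * (1 - y) * math.log(1.0 - p)
    return total / n


# One positive among three negatives: the single positive carries the same
# total weight as the three negatives combined.
loss = balanced_bce([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

If the preprocessing already balances positives and negatives, beta would be close to 0.5 and this would reduce to plain cross-entropy scaled by a constant, which is why I suspect the balanced loss is not needed in your version.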