deep-spin / entmax

The entmax mapping and its loss, a family of sparse softmax alternatives.
MIT License

A bug when alpha = 1 for entmax_bisect? #23

Closed mysteriouslfz closed 3 years ago

mysteriouslfz commented 3 years ago

For the function "entmax_bisect", when given the parameter alpha, it should produce results matching softmax (alpha = 1), entmax15 (alpha = 1.5), and sparsemax (alpha = 2). But when I try alpha = 1, it gives wrong results where all the numbers are the same. When I set alpha = 0.99999 or 1.00001 instead, it works well, and other values of alpha, like 2 and 1.5, also work well. So is this a bug, or am I just using it wrongly? Thank you a lot!
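
For reference, here is a minimal sketch of how I call it (the input tensor is just made up for illustration):

```python
import torch
from entmax import entmax_bisect

x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])

print(entmax_bisect(x, alpha=2.0))      # sparsemax case: works, output is sparse
print(entmax_bisect(x, alpha=1.5))      # entmax15 case: works as expected
print(entmax_bisect(x, alpha=1.00001))  # very close to softmax: works
print(entmax_bisect(x, alpha=1.0))      # all entries come out the same
```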

vene commented 3 years ago

Hm, good point -- this is expected behaviour, but not clearly documented right now.

Numerically, the alpha=1 case cannot be solved by this algorithm. For alpha=1 you should simply use torch.nn.functional.softmax which is faster (linear-time).

We will update the documentation to clarify this.
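
If you want a single entry point over alpha, something along these lines should work (just a sketch, not part of the library; the helper name is made up):

```python
import torch.nn.functional as F
from entmax import entmax_bisect

def alpha_entmax(x, alpha, dim=-1):
    # The bisection algorithm is undefined at alpha = 1, so fall back to
    # softmax there; softmax is the exact alpha = 1 member of the family
    # and runs in linear time.
    if alpha == 1:
        return F.softmax(x, dim=dim)
    return entmax_bisect(x, alpha=alpha, dim=dim)
```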

mysteriouslfz commented 3 years ago

Thank you a lot for your reply! This clears up my question.

vene commented 3 years ago

docstring updated, thanks for reporting