alam-shahul / popari

https://popari.readthedocs.io/en/latest/
MIT License

Fix issue with `project2simplex` in cases when embeddings lie far from simplex #7

Open alam-shahul opened 2 years ago

alam-shahul commented 2 years ago

A recurring hindrance during SpiceMixPlus optimization is that the `project2simplex` function produces NaN values when its inputs lie far from the simplex. The problem has various causes, and it shows up on many real datasets during the projection step for embeddings.

One solution is to prevent the embeddings from reaching such extreme values in the first place, which can probably be accomplished by some form of gradient clipping. In particular, we can clip the computed updates to the embeddings (gradient × step_size) to a magnitude of at most 1, which should solve the problem. However, this may also slow convergence.
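
A minimal sketch of the clipping idea, written in NumPy so it stands alone. It assumes the clipping is applied elementwise to the update; clipping the norm of each row would be another reasonable reading. The names `clip_update`, `project_to_simplex`, and `max_update` are hypothetical and chosen for illustration. In particular, `project_to_simplex` is a generic sort-based Euclidean projection used as a stand-in here, not popari's `project2simplex`.

```python
import numpy as np


def clip_update(gradient, step_size, max_update=1.0):
    """Scale the gradient by step_size, then clip each entry to [-max_update, max_update].

    Hypothetical helper: elementwise clipping is an assumption.
    """
    update = step_size * gradient
    return np.clip(update, -max_update, max_update)


def project_to_simplex(points):
    """Project each row of `points` onto the probability simplex.

    Generic sort-based Euclidean projection, used only as a stand-in
    for the projection step discussed in this issue.
    """
    n, k = points.shape
    sorted_desc = -np.sort(-points, axis=1)           # each row in descending order
    cumsum = np.cumsum(sorted_desc, axis=1) - 1.0     # running sums minus the target total
    idx = np.arange(1, k + 1)
    support = sorted_desc - cumsum / idx > 0          # True for a prefix of each row
    rho = support.sum(axis=1)                         # size of that prefix (always >= 1)
    theta = cumsum[np.arange(n), rho - 1] / rho       # per-row shift
    return np.maximum(points - theta[:, None], 0.0)


# Toy example: a huge gradient that would otherwise push the embedding far from the simplex.
embeddings = np.array([[0.2, 0.3, 0.5]])
gradient = np.array([[1e6, -1e6, 0.0]])
step_size = 0.1

update = clip_update(gradient, step_size)             # entries limited to [-1, 1]
projected = project_to_simplex(embeddings - update)   # rows are non-negative and sum to 1
```

With the clip in place, each embedding entry moves by at most `max_update` per step, so the input to the projection stays within a bounded distance of the previous (already projected) iterate. The cost, as noted above, is that a large legitimate step is truncated, which may take more iterations to converge.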