Open PaulScotti opened 1 year ago
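What is chunked sigmoid loss? See this Twitter thread, which even includes implementation code (albeit in JAX): https://twitter.com/borisdayma/status/1663269506289135616
Switching from cross-entropy loss to sigmoid loss would let us compute the contrastive loss independently on each device (assuming multi-GPU training) without all-gather operations, which cost memory and compute. Softmax cross-entropy normalizes over every pair in the global batch, whereas sigmoid loss scores each image-text pair as its own binary classification, so the total is just a sum of per-pair terms.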
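To make the "independent per pair" point concrete, here is a minimal pure-Python sketch (not the JAX code from the thread). It assumes the pairwise sigmoid formulation with a temperature and a bias, where matching pairs get label +1 and all others -1. The function names, the `temperature=10.0` and `bias=-10.0` defaults, and the toy embeddings are all illustrative assumptions. It only shows that summing the loss over blocks of the similarity matrix matches computing it over the full matrix, which is what would let each device handle its own blocks.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid_loss_block(img, txt, row_offset, col_offset, temperature, bias):
    """Sum of per-pair sigmoid loss terms for one block of the similarity matrix.

    row_offset / col_offset give the block's position in the global batch,
    so we can tell which pairs are the matching (positive) ones.
    """
    total = 0.0
    for i, u in enumerate(img):
        for j, v in enumerate(txt):
            label = 1.0 if (row_offset + i) == (col_offset + j) else -1.0
            logit = temperature * dot(u, v) + bias
            total -= log_sigmoid(label * logit)
    return total


def sigmoid_loss_full(img, txt, temperature=10.0, bias=-10.0):
    # Loss over the whole batch at once, averaged over the batch size.
    return sigmoid_loss_block(img, txt, 0, 0, temperature, bias) / len(img)


def sigmoid_loss_chunked(img, txt, chunk_size, temperature=10.0, bias=-10.0):
    # Same loss, accumulated one block at a time. No block needs to see
    # any other block's logits, unlike a softmax over the full row.
    n = len(img)
    total = 0.0
    for r in range(0, n, chunk_size):
        for c in range(0, n, chunk_size):
            total += sigmoid_loss_block(
                img[r:r + chunk_size], txt[c:c + chunk_size],
                r, c, temperature, bias,
            )
    return total / n


# Toy unit-norm embeddings (illustrative only).
images = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, -0.6]]
texts = [[0.8, 0.6], [0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

full = sigmoid_loss_full(images, texts)
chunked = sigmoid_loss_chunked(images, texts, chunk_size=2)
print(abs(full - chunked) < 1e-9)
```

In a real multi-GPU setup each device would still need access to the other devices' text embeddings, one chunk at a time, to cover the off-diagonal blocks; this sketch does not model that communication.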