Open · glenn-jocher opened this issue 4 years ago
I am afraid not. I haven't had time to test it and it was just theoretical 😁. I had thought you could try this:
```python
import numpy as np

def hard_mish(x):
    # hard branch: x * clip(x + 3, 0, 7) / 7 for x < 0, identity otherwise
    hard7 = x * np.clip(x + 3, a_min=0, a_max=7) / 7
    return np.where(x < 0, hard7, x)
```
This gets pretty close, but I don't have the time! It's also not smooth around 0, so I don't know how effective it would be.
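A quick runnable sketch (my own, not tested training code) that makes the kink at 0 concrete: the left-hand slope of the hard branch works out to 3/7, while the right-hand slope is 1, so the two pieces don't meet smoothly.

```python
import numpy as np

def hard_mish(x):
    # piecewise approximation from the comment above
    hard7 = x * np.clip(x + 3, a_min=0, a_max=7) / 7
    return np.where(x < 0, hard7, x)

# one-sided finite-difference slopes at 0
h = 1e-4
left = (hard_mish(0.0) - hard_mish(-h)) / h    # slope of hard branch, ~3/7
right = (hard_mish(h) - hard_mish(0.0)) / h    # slope of identity branch, ~1
```

So the derivative jumps from about 0.43 to 1 at the origin, unlike true mish, which is smooth everywhere.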
@andrewRowlinson got it, thanks. I ran a study that included hard swish and found little to no benefit from hard swish over regular swish in either training time or GPU memory consumption in PyTorch, so I've shelved my interest in hard mish. Mish in general seems to work well mathematically, but it appears to be a nightmare in practice.
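For readers comparing the two activations mentioned here, a small sketch of the standard definitions (my own illustration, not the code from the study): swish is x·sigmoid(x), and hard swish replaces the sigmoid gate with the piecewise-linear relu6(x + 3)/6.

```python
import numpy as np

def swish(x):
    # swish / SiLU: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def hard_swish(x):
    # hard swish: x * relu6(x + 3) / 6, a piecewise-linear gate
    return x * np.clip(x + 3, 0, 6) / 6

# the two stay close; the largest gap sits near x = +/-3
xs = np.linspace(-6, 6, 121)
max_gap = np.max(np.abs(swish(xs) - hard_swish(xs)))
```

The approximation error peaks around |x| = 3 (roughly 0.14), which is why the two tend to behave similarly in training.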
@andrewRowlinson hey there! I saw your work on hard mish: https://github.com/andrewRowlinson/data-science/blob/master/neural_networks/hard-mish.ipynb
Now that YOLOv4 with mish is out, there's been renewed attention to the memory consumption issues that come with replacing relu with mish. I was wondering if you came to any conclusions in your earlier work on the subject?
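For context on the memory point, a minimal numpy sketch of exact mish with an explicit forward/backward pair (my own illustration, not YOLOv4's implementation): the backward pass needs cached nonlinear intermediates, whereas relu's backward only needs a sign mask, which is one source of the extra activation memory.

```python
import numpy as np

def mish_forward(x):
    # mish(x) = x * tanh(softplus(x)); softplus via log1p for stability
    sp = np.log1p(np.exp(x))
    t = np.tanh(sp)
    return x * t, (t, sp)  # cache intermediates needed by the backward pass

def mish_backward(x, cache, grad_out):
    # d/dx [x * tanh(sp)] = tanh(sp) + x * (1 - tanh(sp)^2) * sigmoid(x)
    t, _ = cache
    sig = 1.0 / (1.0 + np.exp(-x))
    return grad_out * (t + x * (1.0 - t * t) * sig)
```

A relu layer can reconstruct its gradient from a 1-bit mask per element, while this backward consumes the cached `tanh(softplus(x))`, so naive mish implementations hold noticeably more per-activation state.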