andrewRowlinson / data-science

Notes from my data science escapades in Python.

Hard Mish? #1

glenn-jocher opened this issue 4 years ago

glenn-jocher commented 4 years ago

@andrewRowlinson hey there! I saw your work on hard mish: https://github.com/andrewRowlinson/data-science/blob/master/neural_networks/hard-mish.ipynb

Now that YOLOv4 with Mish is out, there's been renewed attention to the memory consumption issues that come with replacing ReLU with Mish. I was wondering if you came to any conclusions in your earlier work on the subject?
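(For background: the extra memory comes from autograd caching Mish's intermediate tensors. One workaround people have tried is a custom autograd function that saves only the input and recomputes everything else in backward, trading compute for memory. A rough, untested PyTorch sketch of that idea, not anything from this repo:)

```python
import torch
import torch.nn.functional as F

class MemoryEfficientMish(torch.autograd.Function):
    """Mish that saves only the input for backward, recomputing
    softplus/tanh there instead of caching them in forward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.tanh(F.softplus(x))  # mish(x) = x * tanh(softplus(x))

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        sp = F.softplus(x)
        tsp = torch.tanh(sp)
        # d/dx [x * tanh(softplus(x))] = tanh(sp) + x * (1 - tanh(sp)^2) * sigmoid(x)
        return grad_output * (tsp + x * (1 - tsp * tsp) * torch.sigmoid(x))

# usage: y = MemoryEfficientMish.apply(x)
```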

andrewRowlinson commented 4 years ago

I am afraid not. I haven't had time to test it and it was just theoretical 😁. I had thought you could try this:

```python
import numpy as np

def hard_mish(x):
    # quadratic piece x * clip(x + 3, 0, 7) / 7 for x < 0, identity for x >= 0
    hard7 = x * np.clip(x + 3, a_min=0, a_max=7) / 7
    return np.where(x < 0, hard7, x)
```

That gets pretty close, but I don't have the time! It's also not smooth around 0, so I don't know how effective it would be.
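(A quick way to see both how close it is and the kink at 0 would be something like the check below; `mish` here is the exact definition x * tanh(softplus(x)), with softplus computed via np.logaddexp for numerical stability:)

```python
import numpy as np

def mish(x):
    # exact Mish: x * tanh(softplus(x)); logaddexp(0, x) = log(1 + exp(x))
    return x * np.tanh(np.logaddexp(0, x))

def hard_mish(x):
    # the approximation from above
    hard7 = x * np.clip(x + 3, a_min=0, a_max=7) / 7
    return np.where(x < 0, hard7, x)

x = np.linspace(-6, 6, 241)
print(np.max(np.abs(mish(x) - hard_mish(x))))  # largest pointwise gap on [-6, 6]

eps = 1e-6
# one-sided slopes at 0: ~3/7 from the left vs 1 from the right, hence the kink
print((hard_mish(0.0) - hard_mish(-eps)) / eps,
      (hard_mish(eps) - hard_mish(0.0)) / eps)
```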

glenn-jocher commented 4 years ago

@andrewRowlinson got it, thanks. I did a study that included hard swish, and found little to no benefit from using hard swish vs regular swish in terms of training time and GPU memory consumption in PyTorch, so I've shelved interest in hard mish. Mish in general seems to work well mathematically but appears to be a nightmare in practice.
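(For anyone comparing at home, the standard definitions are swish(x) = x * sigmoid(x) and the MobileNetV3-style hard swish x * relu6(x + 3) / 6; a minimal PyTorch sketch, not necessarily the exact code used in the study:)

```python
import torch
import torch.nn.functional as F

def swish(x):
    # Swish / SiLU: x * sigmoid(x)
    return x * torch.sigmoid(x)

def hard_swish(x):
    # MobileNetV3 hard swish: relu6 as a piecewise-linear stand-in for sigmoid
    return x * F.relu6(x + 3) / 6
```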

glenn-jocher commented 4 years ago

Ah, study link: https://github.com/ultralytics/yolov3/issues/1098#issuecomment-620194657