Open InstantWindy opened 5 years ago
Speaking as a third-party researcher: Hi, InstantWindy. To actually get fewer model parameters, lower memory usage, and a faster network, you would probably need to rewrite parts of the DL framework, or maybe even the CUDA libraries, and that is a lot of work. So in practice, researchers implement the idea in fp32/fp16 and use that implementation only to verify that the idea works. Hope this helps, thanks!
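To illustrate why an fp32 implementation leaves memory usage unchanged, here is a minimal Python sketch. It assumes the idea being verified is some form of low-bit weight quantization, which this thread does not actually state. The helper `fake_quantize` and the weight values are hypothetical and not taken from any framework.

```python
from array import array


def fake_quantize(weights, num_bits=8):
    """Round weights to a signed num_bits grid, but keep storing them as 32-bit floats.

    Hypothetical helper: it mimics how an idea can be simulated in fp32.
    """
    qmax = 2 ** (num_bits - 1) - 1
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Integer codes that a real low-bit kernel would store and compute on.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Simulated weights: low-bit values, but still held in 32-bit floats.
    simulated = array("f", (c * scale for c in codes))
    return simulated, codes, scale


weights = [0.5, -1.0, 0.25, 0.75] * 256  # 1024 made-up weights
simulated, codes, scale = fake_quantize(weights)

# Storing the integer codes directly needs one signed byte per weight.
packed = array("b", codes)

fp32_bytes = len(simulated) * simulated.itemsize
int8_bytes = len(packed) * packed.itemsize
print(fp32_bytes, int8_bytes)
```

The simulated weights take as much space as the original fp32 weights. Only the packed integer codes are smaller, and running a network on those directly is the part that would need framework or kernel support.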
I found that the number of model parameters and the memory usage did not decrease, and the network did not run faster. I'm confused!