Hello Apollo team,

Thanks for your great work.

As far as I know, Deep Learning algorithms are used in 7 places in Apollo: LiDAR-based Point Cloud Segmentation, Traffic Light Detection, Traffic Light Recognition, Camera-based Obstacle Detection, Camera-based Lane Line Detection, Camera-based KCL (some of the features are computed by Deep Learning), and Obstacle Trajectory Estimation. Please let us know if we have missed anything else in Apollo that uses Deep Learning.
| Place | Net type | Input | Freq |
| --- | --- | --- | --- |
| Traffic Light Detection | U-Net | ? | 10 Hz |
| LiDAR-based Point Cloud Segmentation | ResNet50 | ? | ? |
| Traffic Light Recognition | LeNet | ? | ? |
| Camera-based Obstacle Detection + Camera-based Lane Line Detection | YOLO | ? | ? |
| Obstacle Trajectory Estimation | MLP | very small | N/A |
| Camera-based KCL (some features computed by Deep Learning) | ? | ? | 30 Hz |
We have a task to estimate the total computation of all of Apollo's Deep Learning algorithms, because we need to decide whether an SoC chip (with a Deep Learning acceleration core, about 24 TOPS INT8 and maybe 12 TOPS FP16) is able to run Apollo.
Another problem is that all the Deep Learning models in Apollo's Caffe code are FP32, and we must use FP16 or INT8 to run them on the SoC chip's Deep Learning acceleration core.
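To make concrete what we think the conversion would involve, here is a minimal numpy sketch of per-tensor symmetric post-training quantization. This is our assumption of a typical INT8 scheme, not Apollo's or any specific accelerator's actual method:

```python
import numpy as np

def quantize_int8(w):
    """Per-tensor symmetric quantization: map FP32 weights to INT8
    plus a single FP32 scale, so that w ~= scale * q.
    (Assumed typical scheme, not Apollo's actual method.)"""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Example: quantize a fake conv weight tensor and check the error.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"max abs quantization error: {err:.6f}")
```

As we understand it, a real deployment would also need an activation calibration pass over representative sensor data; the sketch above covers weights only.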
We have read the Apollo code. Based on the input sensor data sizes and frequencies, we estimated all 7 places and summed them up to a surprisingly low TFLOPS (FP32) figure, so we suspect our estimate is wrong. Our method is sketched below.
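Here is roughly how we computed it. All layer shapes, frequencies, and the utilization factor below are placeholders we made up for illustration, not Apollo's actual network configs:

```python
# Sketch of our per-model throughput estimate.
# Only one conv layer per model is shown to keep the example short;
# a real estimate would sum over every layer in the network.

def conv_flops(out_h, out_w, out_c, k, in_c):
    """FLOPs for one conv layer: 2 * MACs (one multiply + one add)."""
    return 2 * out_h * out_w * out_c * k * k * in_c

# (model name, per-frame FLOPs, inference frequency in Hz) -- placeholders
models = [
    ("traffic_light_detection", conv_flops(270, 480, 32, 3, 3), 10.0),
    ("camera_obstacle_detection", conv_flops(216, 384, 64, 3, 32), 30.0),
]

total_flops_per_sec = sum(flops * hz for _, flops, hz in models)
print(f"estimated sustained load: {total_flops_per_sec / 1e12:.4f} TFLOPS (FP32)")

# Compare against the accelerator budget, derated by an assumed
# utilization factor (sustained throughput is well below the peak).
PEAK_INT8_TOPS = 24.0
UTILIZATION = 0.3  # our assumption
print(f"usable budget: {PEAK_INT8_TOPS * UTILIZATION:.1f} TOPS (INT8, derated)")
```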
1. What is the computation amount of each of the Deep Learning algorithms? And if we convert all of them from FP32 to FP16/INT8, what is the computation amount then?
2. Is the SoC chip (about 8 TOPS FP16 and 24 TOPS INT8) able to run all of the Deep Learning algorithms in Apollo, or only some of them?
3. Can all the Deep Learning algorithms in Apollo use FP16 or INT8 data types? (So far we have only seen NVIDIA Volta GPUs with Tensor Cores handle FP16/INT8 natively, and we are worried about the difficulty of converting the DL algorithms from FP32 to INT8/FP16.)
We have been stuck on this problem for weeks; any suggestions would be appreciated.
Thanks!