Support for NVIDIA Tensor Cores

This is more of a feature request than a problem report, and forgive my ignorance if this is irrelevant. The NVIDIA RTX 20-series cards have Tensor Cores, and the GTX 1660 Ti, which lacks Tensor Cores, still has fast FP16 hardware; both could be used through the NVIDIA driver by running in FP16 (half precision). TensorFlow does it that way. Is there a way to implement this in dlib?
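To make the request a bit more concrete, here is a minimal sketch of the arithmetic that "FP16 with Tensor Cores" usually refers to: inputs stored in half precision, with the running sum kept in a wider type. It is plain Python with no GPU involved, and it is not dlib or TensorFlow code. The helper names `to_fp16`, `dot_mixed`, and `dot_fp16` are made up for illustration, and Python's 64-bit float stands in for the FP32 accumulator as an assumption.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_mixed(a, b):
    """Dot product with FP16 inputs and a wide accumulator.

    Assumption: a Python float (64-bit) stands in for the FP32 accumulator.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc


def dot_fp16(a, b):
    """Dot product where inputs, products, and the running sum are all rounded to FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)
    return acc


if __name__ == "__main__":
    a = [0.1] * 4096
    b = [1.0] * 4096
    print("FP16 inputs, wide accumulator:", dot_mixed(a, b))
    print("FP16 everywhere:              ", dot_fp16(a, b))
```

With 4096 terms the all-FP16 sum stops growing once the running total is large compared with each term, while the version with a wide accumulator does not. That is why mixed precision, rather than pure FP16, is the mode I am asking about.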

See references:

https://www.pugetsystems.com/labs/hpc/TensorFlow-Performance-with-1-4-GPUs----RTX-Titan-2080Ti-2080-2070-GTX-1660Ti-1070-1080Ti-and-Titan-V-1386/
https://medium.com/@noel_kennedy/how-to-use-half-precision-float16-when-training-on-rtx-cards-with-tensorflow-keras-d4033d59f9e4
https://www.pugetsystems.com/labs/hpc/NVIDIA-Titan-V-plus-Tensor-cores-Considerations-and-Testing-of-FP16-for-Deep-Learning-1141/
https://www.servethehome.com/nvidia-geforce-rtx-2060-super-review/5/

davisking / dlib

Support for nVidia tensorcore #2104