Open LifeIsStrange opened 1 year ago
They are much slower than 5 FPS on a Tesla V100 GPU, and they are not real-time.
1.5 FPS on V100: not real-time, 2000% slower in FPS than YOLOv7-e6e
0.3 FPS on V100: not real-time, 12000% slower in FPS than YOLOv7-e6e
1500% slower in FPS than YOLOv7 (161 FPS, 51.2% AP)
~10000% slower than YOLOv7-e6e
Dual-Swin-L (HTC) and DINO-5scale (R50) are in Table 9: https://arxiv.org/abs/2207.02696
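One way to read the percentages above is as plain FPS ratios. The following is a minimal sketch of that arithmetic, not the author's own calculation. It assumes YOLOv7-e6e runs at 36 FPS on a V100 (the figure quoted later in this thread, not stated in this comment), and the function name `fps_ratio_percent` is hypothetical.

```python
def fps_ratio_percent(reference_fps: float, other_fps: float) -> float:
    """Return the reference speed as a percentage of the other speed.

    A result of 12000 means the reference model processes 120 times as
    many frames per second as the other model.
    """
    if reference_fps <= 0 or other_fps <= 0:
        raise ValueError("FPS values must be positive")
    return reference_fps / other_fps * 100.0


# Assumption: YOLOv7-e6e at 36 FPS on a V100 (taken from later in the thread).
YOLOV7_E6E_FPS = 36.0

# The two V100 speeds listed in the comment above.
for slow_fps in (1.5, 0.3):
    ratio = fps_ratio_percent(YOLOV7_E6E_FPS, slow_fps)
    print(f"{slow_fps} FPS: YOLOv7-e6e runs at about {ratio:.0f}% of that speed")
```

Under that assumption, the 0.3 FPS entry works out to 12000%, which matches the comment, and the 1.5 FPS entry works out to 2400%, the same order of magnitude as the rounded 2000% figure.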
@AlexeyAB Great answer! I can see the significant value proposition of this implementation now :) So how about you update the abstract from
YOLOv7 surpasses all known object detectors
to
YOLOv7 surpasses all known real-time object detectors
Bonus question: how does it compare to the recently announced YOLOv6? https://github.com/meituan/YOLOv6
YOLOv7 surpasses all known object detectors
to
YOLOv7 surpasses all known real-time object detectors
Real-time is 30 FPS or higher.
YOLOv7 surpasses not only real-time detectors from 30 to 160 FPS, but also non-real-time detectors in the range from 4 to 30 FPS.
how does it compare to the recently announced YOLOv6? https://github.com/meituan/YOLOv6
Page 11: https://arxiv.org/pdf/2207.02696.pdf
@AlexeyAB Fair enough, I wish every paper defended its value as well as you did, in an evidence-based way :). However, it seems to me that YOLOR-D6 beats YOLOv7, in some FPS range at least. YOLOR-D6 is not YOLOv6; it achieves 57.3% AP, which is 0.5% more than YOLOv7, and runs at 34 FPS while YOLOv7 runs at 36 FPS, if I understand correctly. Still YOLOR-D6 is using extra training data indeed. But at the end of the day, end users want a fast model with the best accuracy and will generally accept extra training data for pragmatism's sake. Hence the following questions:
Do you plan on making a YOLOv7 version with improved accuracy by leveraging extra training data?
Secondly, I believe you can improve the state of the art without significantly altering performance by being the first to use the following very simple innovations for object detection: https://github.com/lessw2020/Ranger21 or https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer https://arxiv.org/abs/2106.13731
It includes generally applicable innovations that improve accuracy, such as Mish (https://github.com/digantamisra98/Mish). The Mish activation function is in most cases the best activation function, often yielding a 0.5-1% accuracy increase for free. Ranger can in addition use gradient centralization (https://github.com/Yonghongwei/Gradient-Centralization), which generally also gives free gains. It can then use a synergistic combination of optimizers, such as RAdam in place of Adam (https://github.com/LiyuanLucasLiu/RAdam) plus the complementary LookAhead (https://github.com/michaelrzhang/lookahead), and others.
This library makes the integration and selection of optimization passes easy. It is a tragedy that those innovations are generally ignored by all despite their huge potential for increasing SOTA for free, in key tasks.
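For readers unfamiliar with two of the techniques named above, here is a minimal pure-Python sketch of the underlying math. It is not the Ranger21 code or any library's API: the function names are hypothetical, and the gradient centralization part assumes a 2-D weight gradient laid out as (out_features, in_features).

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable softplus, ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def centralize_gradient(grad: List[List[float]]) -> List[List[float]]:
    """Gradient centralization for a 2-D weight gradient.

    Assumes an (out_features, in_features) layout: each row, i.e. the
    gradient for one output unit, has its own mean subtracted, so every
    row of the result sums to zero.
    """
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


# Mish is close to the identity for large positive inputs and close to
# zero for large negative inputs.
print([round(mish(v), 4) for v in (-5.0, 0.0, 5.0)])

# Each row of the centralized gradient sums to zero.
print(centralize_gradient([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]))
```

In a real training loop the centralization step would be applied to each weight gradient just before the optimizer update; the sketch only shows the per-tensor operation.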
Still YOLOR-D6 is using extra training data indeed. But at the end of the day, end users want a fast model with the best accuracy and will generally accept extra training data for pragmatism's sake.
If you train your own model on your custom dataset, you will get higher accuracy with YOLOv7 than with YOLOR. And YOLOv7 is faster.
What is the definition that you use to define a detector as real-time or not? I have seen a lot of authors mention it in their work, but with no definition at all...
What is the definition that you use to define a detector as real-time or not?
AlexeyAB commented on Jul 10, 2022:
Real-time is 30 FPS or higher.
So, real-time is 30 FPS or higher. This commonly refers to the fact that if your input comes from a 30 FPS camera, or you are processing a video captured by a 30 FPS camera (the most common video frame rate), processing keeps up with the input and no delay builds up between one frame and the next. Of course, this also means that if the input rate of your system is, e.g., 10 FPS, a model that runs at 10 FPS can be considered "real-time" for your application.
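The definition above can be sketched as a simple check: a model is real-time for a given source if it processes frames at least as fast as they arrive. The function names are hypothetical, and the 30 FPS default is the threshold stated in this thread.

```python
def frame_budget_ms(source_fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return 1000.0 / source_fps


def is_real_time(model_fps: float, source_fps: float = 30.0) -> bool:
    """True if the model processes frames at least as fast as they arrive."""
    return model_fps >= source_fps


# A 36 FPS model keeps up with a 30 FPS camera; a 10 FPS model does not,
# unless the source itself only delivers 10 FPS.
print(is_real_time(36.0))
print(is_real_time(10.0))
print(is_real_time(10.0, source_fps=10.0))
print(f"{frame_budget_ms(30.0):.1f} ms per frame at 30 FPS")
```

This ignores capture, pre-processing, and post-processing time, which also count against the per-frame budget in a deployed pipeline.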
@WongKinYiu @AlexeyAB Hi friendly pings
Weird claim when you actually rank #20 on COCO. If we exclude all models with extra training data, you still rank #11. The #1 without extra data is Dual-Swin-L (HTC, multi-scale) with 60.1 box AP; with extra data it is DINO (Swin-L, multi-scale) with 63.3 box AP.