Closed · sivaji123256 closed this issue 1 year ago
Could you please elaborate on the use case? Do you want to detect only certain classes using a pretrained model, or something different?
Hi @BloodAxe, yes, I would like to filter a specific class from the pretrained model. I also have the following issue: I tested on a video using Ultralytics YOLOv8 and YOLO-NAS on a Tesla T4. I am not sure why, but YOLO-NAS large is running at about 20 ms per frame, calculated as 17000 ms / 876 frames (17 s was printed during inference, as shown in the attached figure), whereas the YOLOv8 large model runs at 11.5 ms per frame. I am not sure why YOLO-NAS is taking more time. Any idea on this?
Regarding filtering of classes - no, currently this is not supported.
Regarding inference time - the numbers we shared on the frontier plot were obtained using the TensorRT inference engine (batch size 1). This is consistent with the other models on that plot (YOLO-V8/7/6/5, PPYOLO), so we compare apples to apples.
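If you want to reproduce a TensorRT-style measurement yourself, here is a rough sketch; the file name, opset version, and trtexec flags are illustrative, not the official benchmark setup:

```python
# Sketch: export YOLO-NAS to ONNX, then benchmark with TensorRT's trtexec.
import torch
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").eval()
model.prep_model_for_conversion((640, 640))  # fuse blocks before export

dummy = torch.randn(1, 3, 640, 640)  # batch size 1, as in the frontier plot
torch.onnx.export(model, dummy, "yolo_nas_l.onnx", opset_version=14)

# Then, on a machine with TensorRT installed, something like:
#   trtexec --onnx=yolo_nas_l.onnx --fp16
# reports the pure-engine latency, which is comparable to the published numbers.
```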
Native PyTorch inference is expected to be much slower. There are a number of reasons:
1) Eager execution has its price.
2) The PyTorch model by default uses non-fused RepVGG blocks, which is fine for training, but for inference you may want to fuse them. You can achieve this by calling `model.prep_model_for_conversion((640, 640))`, but bear in mind that you cannot train the model after you've called `prep_model_for_conversion` (see the sketch after this list).
3) Video decoding, moving images to the GPU, and visualizing results take additional time.
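A minimal sketch for point 2, assuming a CUDA device is available (iteration counts are arbitrary):

```python
# Sketch: fuse the RepVGG blocks, then time a bare forward pass in PyTorch.
import time
import torch
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda().eval()
model.prep_model_for_conversion((640, 640))  # fuse; training is no longer possible

dummy = torch.randn(1, 3, 640, 640, device="cuda")
with torch.no_grad():
    for _ in range(10):      # warm-up iterations
        model(dummy)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(dummy)
    torch.cuda.synchronize()
    print(f"{(time.perf_counter() - start) / 100 * 1000:.1f} ms per forward pass")
```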
I'm not ready to comment on YOLOv8, as I'm not familiar with their inference implementation.
@BloodAxe, thanks. But first of all, can you confirm whether I was calculating the FPS value correctly, as per the image attached above?
The processing FPS can actually be seen directly next to the loading bar. In the following example, the FPS is 39.49 it/s (i.e., fps):
Predicting Video: 100%|███████████████████████| 306/306 [00:07<00:00, 39.49it/s]
If you calculate it manually, you might get a different (wrong) value, because the time in seconds (7 here) is rounded. In my example, this would give 306/7 = 43.71 fps.
Note that there are two FPS values: the FPS of the original video and the FPS at which we process the video. To avoid confusion, we chose not to write the processing FPS on the video, because it could be mistaken for the video FPS. The two are different because the video is not streamed (we first process it, and then visualize/save it at the original FPS).
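(For reference, the progress-bar rate also converts directly to per-frame latency: 39.49 it/s corresponds to roughly 1000 / 39.49 ≈ 25.3 ms per frame, which is the same kind of number you computed earlier as 17000 ms / 876 frames ≈ 19.4 ms.)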
Hi @sivaji123256, regarding your original question of filtering classes.
This is not available out-of-the-box, but you can achieve it by modifying the model a bit. As you can see in the implementation of the head, it has `out_channels=num_classes`, which basically means you have `num_classes` filters, each corresponding to a class. If you then take, in the `forward` method, only the relevant filters (e.g., classes (= indices) [1], [15], [78]), you are effectively doing the filtering you want. Let me know if that helps.
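To make this a bit more concrete, here is a rough sketch of the idea rather than an exact patch to the head. It assumes the decoded output contains a per-anchor class-score tensor with one channel per class; check the actual head code for the real tensor layout:

```python
import torch

# Hypothetical helper: keep only the score channels for the classes you want.
KEEP_CLASSES = [1, 15, 78]  # the example indices mentioned above

def keep_only_classes(pred_scores: torch.Tensor) -> torch.Tensor:
    """pred_scores: [batch, anchors, num_classes] decoded class scores."""
    mask = torch.zeros(pred_scores.shape[-1], dtype=pred_scores.dtype, device=pred_scores.device)
    mask[KEEP_CLASSES] = 1.0
    # Scores of all other classes become 0 and can no longer pass the confidence threshold.
    return pred_scores * mask
```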
Closing as answered
@NatanBagrov, thank you for providing the necessary references. The method you mentioned for filtering classes seems too complex for me, as a beginner, to understand and implement. I'd be grateful if you could elaborate more on this topic. Thanks again.
@NatanBagrov @BloodAxe, I modified a reference script provided by @Louis-Dupont. It is only missing the feature to look for humans/persons in the frame.
```python
import cv2
import time
import asyncio
import logging
from super_gradients.training import models
from super_gradients.common.object_names import Models
from telegram import Bot
from telegram.error import TelegramError

# Set up logging
logging.basicConfig(filename='app.log',
                    filemode='w',
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.INFO)
logger = logging.getLogger(__name__)

TOKEN = 'YOUR_BOT_TOKEN'
CHAT_ID = 'YOUR_CHAT_ID'
bot = Bot(token=TOKEN)


async def send_message(text):
    try:
        await bot.send_message(chat_id=CHAT_ID, text=text)
    except TelegramError as e:
        logger.error("Failed to send message through Telegram with error: %s", e)


# Create a loop to run the async function in
loop = asyncio.get_event_loop()
loop.run_until_complete(send_message("Detection Program Started"))


def release_resources():
    logger.info("Releasing video capture objects and closing windows")
    cap1.release()
    cap2.release()
    cap3.release()
    cap4.release()
    cap5.release()
    cv2.destroyAllWindows()


start_time = time.time()
logger.info("Script start time: %s", start_time)

try:
    logger.info("Starting to load model")
    model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()
    logger.info("Detection Model loaded successfully")
    loop.run_until_complete(send_message("Detection Model loaded successfully"))
except Exception as e:
    logger.error("Detection Model loading failed with error: %s", e)
    loop.run_until_complete(send_message("Model loading failed with error: " + str(e)))

try:
    cap1 = cv2.VideoCapture('camstream1')
    cap2 = cv2.VideoCapture('camstream2')
    cap3 = cv2.VideoCapture('camstream3')
    cap4 = cv2.VideoCapture('camstream4')
    cap5 = cv2.VideoCapture('camstream5')
except Exception as e:
    logger.error("VideoCapture initialization failed with error: %s", e)
    loop.run_until_complete(send_message("VideoCapture initialization failed with error: " + str(e)))

while True:
    try:
        captures = [(cap1, 'cam_name1'), (cap2, 'cam_name2'), (cap3, 'cam_name3'),
                    (cap4, 'cam_name4'), (cap5, 'cam_name5')]
        frames = []
        frame_names = []  # camera names kept aligned with the frames that were actually read
        logger.info("Reading frames from cameras")
        for cap, camera_name in captures:
            ret, frame = cap.read()
            if not ret:
                loop.run_until_complete(send_message(f"Failed to read frames from camera {camera_name}"))
            else:
                frames.append(frame)
                frame_names.append(camera_name)

        logger.info("Predicting frames")
        predictions = model.predict(frames)
        for predicted_frame, camera_name in zip(predictions, frame_names):
            cv2.imshow(camera_name, predicted_frame.draw())

        # Check the quit key once per iteration so the outer loop can actually exit
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    except Exception as e:
        logger.error("An error occurred: %s", e)
        loop.run_until_complete(send_message("An error occurred: " + str(e)))
        time.sleep(5)  # optional: wait before trying again
        continue

release_resources()
end_time = time.time()
logger.info("Script end time: %s", end_time)
execution_time = end_time - start_time
logger.info("Script executed in: %s seconds", execution_time)
print(f"Script executed in: {execution_time} seconds")
loop.run_until_complete(send_message(f"Script executed in: {execution_time} seconds"))
```
Do you intend to do the filtering as a post-process (after you get the predictions from the network), or do you want the subset of classes output directly from the model itself?
With the first option, you don't need to modify any YOLO class, but rather just filter out the irrelevant classes.
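For example, a minimal post-processing sketch; the attribute names (`class_names`, `prediction.labels`, `prediction.bboxes_xyxy`, `prediction.confidence`) reflect my understanding of the current `ImageDetectionPrediction` structure, so verify them against your installed version:

```python
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()

results = model.predict("frame.jpg")  # any image path / frame works here
for result in results:
    person_id = result.class_names.index("person")  # COCO class "person"
    for box, score, label in zip(result.prediction.bboxes_xyxy,
                                 result.prediction.confidence,
                                 result.prediction.labels):
        if int(label) == person_id:
            # A person was detected: save the frame, crop the box, send an alert, etc.
            print(f"person {score:.2f} at {box}")
```

In the multi-camera script above, the same check could be applied to each `predicted_frame` before deciding whether to save or alert on that frame.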
Thank you @NatanBagrov for the quick response. Can you please explain what it means in both scenarios? In my project I want to save the frame when a human/person is detected, with the person inside the bounding box and nothing other than the person in the bounding box. For this scenario, which would be the best option?
@BloodAxe, what about doing inference with another image size, as in the other YOLOs?
I think this has already been asked a couple of times here and there: https://github.com/Deci-AI/super-gradients/issues/1078#issuecomment-1563346416
Can we filter the classes? Is there any argument that we can use?