Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Can we have an independent PyTorch-based script for evaluating NAS detectors (COCO dataset)? #1017

Closed JagdishKolhe closed 1 year ago

JagdishKolhe commented 1 year ago

It looks like the YOLO-NAS architectures have a different prediction format (different from standard COCO) for model.predict(...).

Can we have a distinct standalone PyTorch function that does not use the super-gradients code base, but rather uses only torch and the coco-eval APIs, and produces the standard annotation format defined by COCO?

The standard COCO dataset has a fixed format for predictions, as follows:

    [
        {"image_id": ---, "category_id": ---, "bbox": [x, y, w, h], "score": ---, ...},
        {"image_id": ---, "category_id": ---, "bbox": [x, y, w, h], "score": ---, ...}
    ]

The indicative script could be something like the following. It might need a few more changes, but I hope you get the idea:

    net = models.get("yolo_nas_m", pretrained_weights="coco")
    net.eval()
    net.cuda()

    outputs = []
    for each in dataloader:
        pred = net(each)
        # convert the model prediction into the standard COCO format
        outputs.append(convert_to_coco_format(pred))

    # feed outputs to the standard coco eval APIs

Also, this type of function might help to compare the performance of the predict function from super-gradients against the standard COCO APIs. I see that many people have raised concerns where such an independent script could help them:

https://github.com/Deci-AI/super-gradients/issues/958
https://github.com/Deci-AI/super-gradients/issues/1016
https://github.com/Deci-AI/super-gradients/issues/977

BloodAxe commented 1 year ago

Hi. I don't see how the referenced issues are related to this matter. We intentionally re-implemented the mAP metric, moving from the inconvenient pycocotools implementation to a native PyTorch metric implementation that is faster and DDP-friendly. The only place where a pycocotools-based metric is a viable option is academic research, where you want to compare many models coming from different sources and really want to compare them using the same evaluation methodology. We are open to external contributions here.
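For reference, a minimal sketch of how the native metric can be plugged into validation, following the pattern from the SuperGradients fine-tuning examples; the threshold values are illustrative, and the exact constructor arguments may differ between library versions:

    from super_gradients.training.metrics import DetectionMetrics_050
    from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

    valid_metrics_list = [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=80,  # COCO has 80 classes
            normalize_targets=True,
            # The callback decodes raw model outputs and applies NMS:
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7,
            ),
        )
    ]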

JagdishKolhe commented 1 year ago

I appreciate your team's efforts in reimplementing mAP in a more efficient manner, but I have a requirement for exactly the research purpose you mentioned above.

So, can you please give me a documentation link or some guidance on how to convert the output of the model to the standard COCO format? Or at least tell me what format YOLO-NAS uses as output, so that I can convert it to the standard COCO one?

BloodAxe commented 1 year ago

Let's assume you have a trained model. The easiest way to get predictions from the model is to use the model.predict API. We designed it to be the most convenient option for users who just want to get predictions. Image normalization, size preprocessing and NMS will be done for you automatically.

    import super_gradients
    from PIL import Image

    yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()
    result: ImagesDetectionPrediction = yolo_nas.predict(Image.open("lena.jpg"))

Note that the returned result is an ImagesDetectionPrediction container that may contain detection results for multiple frames. That is, you can use the predict method on all the images in a directory, or on a video.

Anyhow, if you're sending a single image, then you can use result[0] to get the bounding boxes. Or you can iterate: for image_level_predictions in result.

    yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()
    result: ImagesDetectionPrediction = yolo_nas.predict(Image.open("lena.jpg"))

    predictions: ImagePrediction = result[0]

    # And now you have it:
    # Decoded bounding boxes (I think the bounding box format is pretty clear here, right?)
    predictions.prediction.bboxes_xyxy  # [N, 4], in pixels of the original image
    predictions.prediction.confidence   # [N]
    predictions.prediction.labels       # [N]
    predictions.class_names             # class names to map labels to
Bounding boxes are returned in the coordinate system of the original image, so you can put them directly into the COCO JSON file according to its format.
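To make that concrete, here is a minimal standalone sketch (not a SuperGradients API) that converts the decoded predictions above into COCO result dicts and feeds them to pycocotools; the image_id bookkeeping, the annotation file path, and the category-id mapping are assumptions:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    def to_coco_results(image_prediction, image_id):
        """Convert one ImagePrediction into COCO result dicts (COCO bbox is [x, y, w, h])."""
        p = image_prediction.prediction
        results = []
        for (x1, y1, x2, y2), score, label in zip(p.bboxes_xyxy, p.confidence, p.labels):
            results.append({
                "image_id": int(image_id),
                "category_id": int(label),  # may need remapping to COCO's 91-id category space
                "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
                "score": float(score),
            })
        return results

    # After accumulating results for all images into all_results, run the standard COCO evaluation:
    coco_gt = COCO("instances_val2017.json")  # ground-truth annotations
    coco_dt = coco_gt.loadRes(all_results)    # list of result dicts from to_coco_results
    coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()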

JagdishKolhe commented 1 year ago

The predict function is going to be very slow, since it predicts images one by one. Can we have a function that takes the output of the last layer and converts it to the standard COCO format? I need this because our underlying framework is based on the standard format; it just takes the last-layer output.

    outputs = model(inputs)  # inputs is a tensor of images in NCHW format, so I can pass images in a batch
    results_dict = convert_to_coco_format(outputs)  # results_dict will be in the COCO standard format
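A rough sketch of what such a helper could look like, reusing the decoded-prediction path from the earlier answer rather than the raw last-layer output (decoding the raw head outputs by hand is version-specific); convert_to_coco_format here is a hypothetical wrapper around the to_coco_results sketch above, not a SuperGradients API:

    def convert_to_coco_format(model, images, image_ids):
        """Run predict() on a list of images and collect COCO result dicts."""
        results = []
        # Iterating over the returned container yields one ImagePrediction per image.
        for image_prediction, image_id in zip(model.predict(images), image_ids):
            results.extend(to_coco_results(image_prediction, image_id))
        return results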

PrajwalCogniac commented 1 year ago

Thanks a lot for the detailed answer @BloodAxe. I definitely see the benefits of the model.predict API. But if we want to use the fp16 version of the PyTorch model, how does model.predict work in this case? Converting the PyTorch YOLO-NAS model is easy and can be done like this: model = yolo_nas_l.half(). But now, how do we change the input to support fp16? Also, does model.predict accept torch tensors instead of numpy arrays or images?

BloodAxe commented 1 year ago

> But if we want to use the fp16 version of the PyTorch model, how does model.predict work in this case? Converting the PyTorch YOLO-NAS model is easy and can be done like this: model = yolo_nas_l.half()

Converting to half seems to be working fine:

    from super_gradients.common.object_names import Models
    from super_gradients.training import models

    model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco").cuda().half()
    model.predict("https://deci-datasets-research.s3.amazonaws.com/image_samples/beatles-abbeyroad.jpg").show()

The predict method takes a numpy image, an images folder, or URLs as input. Hope that answers your question.
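On the torch-tensor question: a plain forward pass does take tensors, and for fp16 the inputs just need to match the model's dtype. A minimal sketch, assuming an already-normalized NCHW batch (the preprocessing and NMS that predict normally handles are omitted here):

    import torch

    model = model.cuda().half().eval()

    # `inputs` is assumed to be an already-normalized NCHW float tensor.
    inputs = torch.rand(4, 3, 640, 640)

    with torch.no_grad():
        raw_outputs = model(inputs.cuda().half())  # cast inputs to fp16 to match the model weights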