Open saurabheights opened 2 years ago
@saurabheights , thanks for submitting the issue.
CVAT doesn't support batch processing at the moment. I remember that users have complained about that.
Also, we have the OpenCV tracker and https://github.com/opencv/cvat/pull/4886 from @dschoerk. We will merge it soon.
The number of steps is large, but it will give you the best quality.
A related article: https://arxiv.org/pdf/2003.07618.pdf
@saurabheights I noticed you are using v2.1.0. When using any single-object tracker, you should experiment with the develop branch; until recently there was an off-by-one error in the frame number of the prediction.
https://github.com/opencv/cvat/issues/4870 (the relevant commit is linked in this issue, use any newer version than this)
It doesn't generate any results: when the object moves, the tracked bounding box stays in the same place. This might be due to pressing the "Go next with a step [V]" button, which is necessary because stepping frame by frame takes about 5 minutes per 10 frames, and objects can remain stationary for hundreds of frames ... What I would prefer is for CVAT to process the next N frames with the SIAM tracker. If a new object enters the view in those N frames, I would update the trackers and submit a reprocessing request.
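The batch workflow proposed here could be sketched roughly as follows. This is a hypothetical illustration only, not an existing CVAT API: `StubTracker` is a made-up stand-in for SiamMask, and `frames` is a placeholder for decoded video frames.

```python
# Hypothetical sketch of the proposed feature: run a single-object
# tracker over the next N frames in one request, instead of one
# round-trip per frame. StubTracker stands in for SiamMask and simply
# shifts the box by a fixed per-frame motion so the loop can run.

class StubTracker:
    def __init__(self, init_box, motion=(2, 0)):
        self.box = init_box    # (x, y, w, h)
        self.motion = motion   # fake constant motion per frame

    def update(self, frame):
        x, y, w, h = self.box
        dx, dy = self.motion
        self.box = (x + dx, y + dy, w, h)
        return self.box

def track_batch(tracker, frames):
    """Process N frames in one call and return one box per frame."""
    return [tracker.update(f) for f in frames]

tracker = StubTracker(init_box=(10, 20, 30, 40))
boxes = track_batch(tracker, frames=range(5))  # 5 placeholder frames
# boxes[-1] == (20, 20, 30, 40): the box moved 2 px/frame over 5 frames
```

If a new object appears within those N frames, the annotator would add a seed for it and re-submit the affected frame range.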
I would like to see this as a new feature! Currently, single-object tracking is stateless on the nuclio side, which means the tracker state is sent to nuclio for each frame. Without having tested it, I think this is a significant computational overhead. At some point I had a tracker state of ~3 MB for the TransT implementation, but I haven't investigated it further. For Siamese trackers like SiamMask and also TransT, this state at least includes the cropped search region and the template image in some shape or form. Just an FYI: TransT is slightly slower for me than SiamMask (using an RTX 3090), but it is far more accurate in my use case of pedestrian tracking.
A neat benefit of Siamese trackers over object detection is that they are typically class-agnostic.
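To give a rough sense of the per-frame overhead described above: in the stateless design, the serialized tracker state travels to nuclio and back on every frame. The dict below is a made-up stand-in for a SiamMask/TransT-style state (template crop plus search region as raw pixel buffers), not CVAT's actual payload format.

```python
# Rough illustration of the stateless-tracking overhead: the serialized
# tracker state is re-sent on every frame. The state dict is a made-up
# stand-in; real SiamMask/TransT states differ in layout and size.

import base64
import pickle

state = {
    "template": bytes(127 * 127 * 3),       # e.g. 127x127 RGB template crop
    "search_region": bytes(255 * 255 * 3),  # e.g. 255x255 RGB search region
}

payload = base64.b64encode(pickle.dumps(state))  # what a JSON request might carry
per_frame_kib = len(payload) / 1024

# For a 30 fps video, each second of tracking re-sends this state 30 times:
per_second_kib = per_frame_kib * 30
```

Even this small synthetic state is a few hundred KiB per frame once base64-encoded; a stateful function that keeps the tracker in memory between calls would avoid that traffic entirely.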
My actions before raising this issue
Expected Behaviour
Current Behaviour
I need to track each object visible on 4 cameras. AFAIK, CVAT doesn't support multiple cameras. To work around this problem, I have created a single video by tiling the video from all 4 cameras.
However, to annotate faster, I would prefer some form of automatic annotation, or at least semi-automatic annotation with minimal supervision. I have tested an object detection model as well as SIAMMASK, but both come with their own problems.
FasterRCNN doesn't generate tracks. Also, generating automatic annotations for a 30-minute video took a few hours [I still need to profile it, but it is quite slow]. Q. Is there a way to speed this up, e.g., by increasing the inference batch size?
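The batching idea could look roughly like this. This is a hypothetical sketch, not CVAT's actual inference code: `detect` is a stub standing in for a Faster R-CNN forward pass, and in a real PyTorch setup each batch would be a stacked tensor on the GPU rather than a Python list.

```python
# Hypothetical sketch of batched detection: group frames into batches
# before calling the model, so the GPU runs fewer, larger forward
# passes. `detect` is a stub for a Faster R-CNN forward pass.

def batched(items, batch_size):
    """Yield successive fixed-size chunks from a list of frames."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def detect(batch):
    """Stub detector: one (empty) detection list per frame in the batch."""
    return [[] for _ in batch]

frames = list(range(10))           # placeholder for decoded frames
results = []
for batch in batched(frames, batch_size=4):
    results.extend(detect(batch))  # 3 forward passes instead of 10
```

Whether CVAT's serverless detector functions expose a batch-size knob is a separate question; the sketch only shows the kind of change being asked for.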
SIAMMASK:
Q. Can you provide any ideas to improve this process in general, or correct me where you think I might be going wrong?
Another idea I have is to add an object detection and tracking model that doesn't require seeds, and use it instead of SIAMMASK to generate automatic annotations before the manual annotation process. However, I am not sure whether tracking annotations generated by a model can be ingested into CVAT directly.
Your Environment
git log -1: commit 3bd7c7e422d57986bd629da07214a3a3e666c68c (HEAD -> master, tag: v2.1.0, origin/master)
docker version: 20.10.9