LapisRaider / Object-Human-Interaction

From a monocular video, generate 3D animation and models of humans and objects interactions

Data research #12

Open LapisRaider opened 8 months ago

LapisRaider commented 8 months ago

Weak perspective camera inputs

A weak-perspective camera approximates a normal perspective camera: instead of relying on the projection matrix to divide each point by its own depth, it scales the whole object linearly by a single factor that stands in for its distance from the camera.

It is a custom class created by VIBE that inherits from pyrender.Camera. It takes two main parameters: a scale and a translation.

New projection matrix created: [image]

It only affects the x and y axes.

The x coordinates seem to be inverted if you move the camera's coordinates. [image]
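
A minimal sketch of what such a projection matrix can look like, using NumPy only. The function names `weak_perspective_matrix` and `project`, and the exact layout of the matrix, are assumptions for illustration and are not copied from VIBE's class, whose sign conventions may differ.

```python
import numpy as np


def weak_perspective_matrix(scale, translation):
    """Build a 4x4 matrix that scales and shifts x and y only (hypothetical helper)."""
    sx, sy = scale
    tx, ty = translation
    P = np.eye(4)
    P[0, 0] = sx          # uniform scale on x replaces the per-point depth divide
    P[1, 1] = sy          # same for y
    P[0, 3] = tx * sx     # translation is applied before scaling
    P[1, 3] = ty * sy
    return P


def project(P, points):
    """Project an (N, 3) array of points and return their (N, 2) image coordinates."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ P.T)[:, :2]


P = weak_perspective_matrix(scale=(2.0, 2.0), translation=(0.5, 0.0))
near = np.array([[1.0, 1.0, -1.0]])
far = np.array([[1.0, 1.0, -10.0]])
print(project(P, near), project(P, far))
```

Both points land on the same 2D position even though their depths differ, which is the sense in which only the x and y axes are affected.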

LapisRaider commented 7 months ago

Deepsort:

Deepsort uses a Kalman filter, which gives the best compromise between a prediction and a measurement. The prediction is made from the past state to the present state.

The filter estimates the state of the system using past observations and the current measurement. The mean and variance of our tracking estimate (built from previous data) are pushed through the equations of motion to give a prediction, which is then combined with the measurement, each weighted by its variance, to give the optimal estimate. The variance on the motion is needed because speed may change; it is not always constant.

image
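
A minimal one-dimensional sketch of the predict and update steps described above. The function names and the constant-velocity motion model are assumptions for illustration; Deepsort's own filter tracks a multi-dimensional bounding-box state.

```python
def kalman_predict(mean, var, velocity, process_var):
    """Move the estimate forward one step; uncertainty grows because speed may change."""
    return mean + velocity, var + process_var


def kalman_update(pred_mean, pred_var, measurement, measurement_var):
    """Blend prediction and measurement, weighting each by how certain it is."""
    gain = pred_var / (pred_var + measurement_var)
    mean = pred_mean + gain * (measurement - pred_mean)
    var = (1.0 - gain) * pred_var
    return mean, var


# Start at position 0 with variance 1, expect to move 1 unit per frame.
pred_mean, pred_var = kalman_predict(0.0, 1.0, velocity=1.0, process_var=1.0)
# The detector reports position 2 with variance 2.
mean, var = kalman_update(pred_mean, pred_var, measurement=2.0, measurement_var=2.0)
print(mean, var)
```

When the prediction and the measurement are equally uncertain, as in this example, the result sits halfway between them and has a smaller variance than either one.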

How Deepsort works:

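        # Get appearance features for each detected object in this frame
        features = self.encoder(_vidFrameData, bboxes_xywh)
        # Wrap each detection (box, confidence, appearance feature, class label)
        detections = [Detection(bbox, score, feature, label) for bbox, score, feature, label in zip(bboxes_xywh, scores, features, labels)]
        # Advance every track with the Kalman filter, then match the new detections
        self.deepsortTracker.predict()
        self.deepsortTracker.update(detections)
        self.objsInFrames.append([])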
  1. First, we create a Detection object for each object detected in the current frame.
  2. Deepsort stores a list of tracks and tries to match the current frame's detections against its existing confirmed tracks using the appearance features.
  3. For tracks that have yet to be confirmed, it tries to match the remaining detections to them based on IOU.
  4. A detection that cannot be matched to any track is most likely a new object, so we create a new track for it.
  5. A track with no detection this frame is marked as missed, which determines whether it should be deleted and no longer tracked (based on when it was last updated, or whether it was never confirmed for tracking).
  6. Tracks that had a match in the current frame update themselves.
  7. Before all of this, each track runs a prediction through the Kalman filter, which is also where it updates its age and the time since its last update.
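
The track lifecycle in the steps above can be sketched in plain Python. This is a heavily simplified illustration and not Deepsort's implementation: it matches greedily on IOU only, skips the appearance features, the Kalman filter and the confirmed/unconfirmed distinction, and assumes boxes are (x, y, w, h) with (x, y) the top-left corner. The names `Track`, `step`, `min_iou` and `max_age` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


class Track:
    def __init__(self, track_id, bbox):
        self.track_id = track_id
        self.bbox = bbox
        self.hits = 1
        self.time_since_update = 0


def step(tracks, detections, next_id, min_iou=0.3, max_age=3):
    # Step 7: every track ages before matching.
    for track in tracks:
        track.time_since_update += 1

    # Steps 3 and 6: match each track to its best remaining detection.
    unmatched = list(detections)
    for track in tracks:
        if not unmatched:
            break
        best = max(unmatched, key=lambda det: iou(track.bbox, det))
        if iou(track.bbox, best) >= min_iou:
            track.bbox = best
            track.hits += 1
            track.time_since_update = 0
            unmatched.remove(best)

    # Step 4: leftover detections start new tracks.
    for det in unmatched:
        tracks.append(Track(next_id, det))
        next_id += 1

    # Step 5: drop tracks that have been missed for too long.
    tracks = [t for t in tracks if t.time_since_update <= max_age]
    return tracks, next_id


tracks, next_id = step([], [(0, 0, 10, 10)], next_id=1)
tracks, next_id = step(tracks, [(1, 0, 10, 10)], next_id)
print([(t.track_id, t.bbox, t.hits) for t in tracks])
```

Greedy matching is used here only to keep the sketch short; it can pick worse pairings than a global assignment when several boxes overlap.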