exitudio / GaitMixer

Official repository for "GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer"

About Mparams and GFLOPs of GaitMixer #2

Open puyiwen opened 1 year ago

puyiwen commented 1 year ago

Hi, this is great work! I would like to know the total Mparams and GFLOPs of GaitMixer, including the pose estimator. Can you tell me? Thank you very much!

exitudio commented 1 year ago

I assume that you are trying to compare with the appearance-based methods, right? The model-based (skeleton-based) methods are much smaller because they use skeleton data instead of raw images. However, the heavy part is the pose estimation.

  • HRNet w32 is 41.230M parameters (from here)
  • GaitMixer has 166,304 parameters. We have already printed the parameter count here. For FLOPs, we did not measure them.
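For reference, a minimal sketch of reproducing such a parameter count in PyTorch; the helper function and the stand-in module below are illustrative and are not code from this repo:

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    # Sum the element counts of all trainable parameter tensors.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in module for illustration only; passing the actual GaitMixer network
# here should report roughly the 166,304 parameters quoted above.
demo = nn.Linear(17 * 2, 128)
print(f"Trainable parameters: {count_trainable_params(demo):,}")
```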

puyiwen commented 1 year ago

I assume that you are trying to compare with the appearance-based methods, right? The model-based (skeleton-based) methods are much smaller because they use skeleton data instead of raw images. However, the heavy part is the pose estimation.

  • HRNet w32 is 41.230M parameters (from here)
  • GaitMixer has 166,304 parameters. We have already printed the parameter count here. For FLOPs, we did not measure them.

Thank you for your answer. I'm comparing which is smaller, an appearance-based model or a skeleton-based model. I found a smaller HRNet for pose estimation, called Lite-HRNet. Why do you use HRNet w32 instead of Lite-HRNet? Is it because Lite-HRNet's pose estimation is not as good as HRNet w32's?

exitudio commented 1 year ago

We didn't explore the pose estimator much. We experimented with SimCC, but it wasn't a clear improvement, so we just followed GaitGraph, which uses HRNet.

puyiwen commented 1 year ago

We didn't explore the pose estimator much. We experimented with SimCC, but it wasn't a clear improvement, so we just followed GaitGraph, which uses HRNet.

Thank you for your quick reply! I'm sorry to bother you again. I want to calculate the GFLOPs of the model first, so I would like to know: what is the input dimension of the model?

exitudio commented 1 year ago

The input has 4 dimensions: [batch size, # of frames, # of joints, dim]

  • batch size = 64
  • # of frames = 60
  • # of joints = 17
  • dim = 2 (x, y)

In GaitGraph, dim = 3; they add the confidence from the pose estimator. We experimented with that, but there was no improvement, so we ignored it.
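A minimal sketch of how GFLOPs could be measured with this input shape, using the third-party `thop` package (not part of this repo); the stand-in network below is an assumption and should be replaced with the actual GaitMixer model:

```python
import torch
import torch.nn as nn
from thop import profile  # third-party FLOP counter: pip install thop

# Stand-in network for illustration only; replace with the GaitMixer model
# loaded from this repo.
model = nn.Sequential(nn.Flatten(), nn.Linear(60 * 17 * 2, 128))

# Dummy input with the shape described above:
# [batch size, # of frames, # of joints, dim] = [64, 60, 17, 2]
x = torch.randn(64, 60, 17, 2)

macs, params = profile(model, inputs=(x,))
print(f"MACs: {macs / 1e9:.3f} G, params: {params / 1e6:.3f} M")
```

Note that `thop` reports multiply-accumulate operations (MACs); FLOPs are commonly taken as roughly 2x the MAC count.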

puyiwen commented 1 year ago

The input has 4 dimensions: [batch size, # of frames, # of joints, dim]

  • batch size = 64
  • # of frames = 60
  • # of joints = 17
  • dim = 2 (x, y)

In GaitGraph, dim = 3; they add the confidence from the pose estimator. We experimented with that, but there was no improvement, so we ignored it.

Thank you very much! I calculated the GFLOPs and params of the model and found that GaitMixer is very small. Are the other skeleton-based methods as small as GaitMixer? And I've always been confused as to why skeleton-based methods perform worse than appearance-based methods. It stands to reason that skeleton-based methods are less noisy than appearance-based methods. Can you tell me why? Thank you very much again.

exitudio commented 1 year ago

Yeah, the other skeleton-based methods should be similar. GaitGraph has even less computation because it uses graph convolution in the spatial dimension, while GaitMixer uses self-attention. But to be fair, we should include the pose estimation to calculate the cost of end-to-end detection.

Appearance-based vs. skeleton-based: theoretically, an appearance-based method is not pure gait recognition because it sees other visual cues such as hair, face, and clothes. As you can see, the accuracies of appearance-based methods drop a lot in the different-clothes condition (CL). I think skeleton-based methods are more practical and robust.