exitudio / GaitMixer

Official repository for "GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer"

About Mparams and GFLOPs of GaitMixer #2

Open puyiwen opened 1 year ago

puyiwen commented 1 year ago

Hi, this is great work! I would like to know the total Mparams and GFLOPs of GaitMixer, including the pose estimator. Can you tell me? Thank you very much!

exitudio commented 1 year ago

I assume that you are trying to compare with the appearance-based methods, right? The model-based (skeleton-based) methods are much smaller because they use skeleton data instead of raw images. However, the heavy part is the pose estimation.

  • HRNet w32 is 41.230M parameters (from here)
  • GaitMixer has 166,304 parameters. We have already printed the parameter count here. For FLOPs, we did not measure them.
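For reference, a minimal sketch of reproducing such a parameter count in PyTorch; the helper function and the stand-in module below are illustrative and are not code from this repo:

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    # Sum the element counts of all trainable parameter tensors.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in module for illustration only; passing the actual GaitMixer network
# here should report roughly the 166,304 parameters quoted above.
demo = nn.Linear(17 * 2, 128)
print(f"Trainable parameters: {count_trainable_params(demo):,}")
```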

puyiwen commented 1 year ago

I assume that you are trying to compare with the appearance-based methods, right? The model-based (skeleton-based) methods are much smaller because they use skeleton data instead of raw images. However, the heavy part is the pose estimation.

  • HRNet w32 is 41.230M parameters (from here)
  • GaitMixer has 166,304 parameters. We have already printed the parameter count here. For FLOPs, we did not measure them.

Thank you for your answer. I'm comparing which is smaller, an appearance-based model or a skeleton-based model. I found a smaller HRNet for pose estimation, called Lite-HRNet. Why do you use HRNet w32 instead of Lite-HRNet? Is it because Lite-HRNet's pose estimation is not as good as HRNet w32's?

exitudio commented 1 year ago

We didn't explore the pose estimator much. We experimented with SimCC, but it wasn't a clear improvement, so we just followed GaitGraph, which uses HRNet.

puyiwen commented 1 year ago

We didn't explore the pose estimator much. We experimented with SimCC, but it wasn't a clear improvement, so we just followed GaitGraph, which uses HRNet.

Thank you for your quick reply! I'm sorry to bother you again. I want to calculate the GFLOPs of the model first, so I would like to know: what is the input dimension of the model?

exitudio commented 1 year ago

The input has 4 dimensions: [batch size, # of frames, # of joints, dim]

  • batch size = 64
  • # of frames = 60
  • # of joints = 17
  • dim = 2 (x, y)

In GaitGraph, dim = 3; they add the confidence from the pose estimator. We experimented with that, but there was no improvement, so we ignored it.
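A minimal sketch of how GFLOPs could be measured with this input shape, using the third-party `thop` package (not part of this repo); the stand-in network below is an assumption and should be replaced with the actual GaitMixer model:

```python
import torch
import torch.nn as nn
from thop import profile  # third-party FLOP counter: pip install thop

# Stand-in network for illustration only; replace with the GaitMixer model
# loaded from this repo.
model = nn.Sequential(nn.Flatten(), nn.Linear(60 * 17 * 2, 128))

# Dummy input with the shape described above:
# [batch size, # of frames, # of joints, dim] = [64, 60, 17, 2]
x = torch.randn(64, 60, 17, 2)

macs, params = profile(model, inputs=(x,))
print(f"MACs: {macs / 1e9:.3f} G, params: {params / 1e6:.3f} M")
```

Note that `thop` reports multiply-accumulate operations (MACs); FLOPs are commonly taken as roughly 2x the MAC count.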

puyiwen commented 1 year ago

The input has 4 dimensions: [batch size, # of frames, # of joints, dim]

  • batch size = 64
  • # of frames = 60
  • # of joints = 17
  • dim = 2 (x, y)

In GaitGraph, dim = 3; they add the confidence from the pose estimator. We experimented with that, but there was no improvement, so we ignored it.

Thank you very much! I calculated the GFLOPs and params of the model and found that GaitMixer is very small. Are the other skeleton-based methods as small as GaitMixer? And I've always been confused as to why skeleton-based methods perform worse than appearance-based methods. It stands to reason that skeleton-based methods are less noisy than appearance-based methods. Can you tell me why? Thank you very much again.

exitudio commented 1 year ago

Yeah, the other skeleton-based methods should be similar. GaitGraph has even less computation because it uses graph convolution in the spatial dimension, while GaitMixer uses self-attention. But to be fair, we should include the pose estimation to calculate the cost of end-to-end detection.

Appearance-based vs. skeleton-based: theoretically, an appearance-based method is not pure gait recognition because it sees other visual cues such as hair, face, and clothes. As you can see, the accuracies of appearance-based methods drop a lot in the different-clothes condition (CL). I think skeleton-based methods are more practical and robust.