shuozhou opened this issue 2 years ago.
Hi @shuozhou, I'm just a user of this repo, but I think my changes might help you.
I have extended this repo so that it accepts arbitrary width, height, and number of channels, for example: https://github.com/nttcslab/msm-mae/blob/main/msm_mae/patch_msm_mae.diff#L196
You can find all the changes here: https://github.com/nttcslab/msm-mae/blob/main/msm_mae/patch_msm_mae.diff
Here is my repository, which uses files from this MAE repo for our Masked Spectrogram Modeling: https://github.com/nttcslab/msm-mae
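One piece that usually has to change for non-square inputs is the fixed 2D sin-cos positional embedding, which must cover a grid of `grid_h x grid_w` patches instead of `n x n`. The sketch below is a hypothetical rectangular variant written in NumPy, in the spirit of MAE's sin-cos embedding; it is not taken from either repo, and the function names `get_1d_sincos_pos_embed` and `get_2d_sincos_pos_embed_rect` are illustrative assumptions. Half of the channels encode the row index and the other half encode the column index.

```python
import numpy as np


def get_1d_sincos_pos_embed(embed_dim, pos):
    """Sin-cos embedding of positions `pos` (any shape) -> (M, embed_dim)."""
    assert embed_dim % 2 == 0
    omega = np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)
    omega = 1.0 / 10000 ** omega               # (D/2,) frequencies
    pos = np.asarray(pos, dtype=np.float64).reshape(-1)  # (M,)
    out = np.einsum("m,d->md", pos, omega)     # (M, D/2) outer product
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)  # (M, D)


def get_2d_sincos_pos_embed_rect(embed_dim, grid_h, grid_w):
    """Hypothetical rectangular version: returns (grid_h * grid_w, embed_dim).

    Patches are ordered row-major (row h, then column w), which should match
    the order in which patches are flattened from the image.
    """
    assert embed_dim % 4 == 0
    rows = np.arange(grid_h, dtype=np.float64)
    cols = np.arange(grid_w, dtype=np.float64)
    hh, ww = np.meshgrid(rows, cols, indexing="ij")  # both (grid_h, grid_w)
    emb_h = get_1d_sincos_pos_embed(embed_dim // 2, hh)  # encodes the row
    emb_w = get_1d_sincos_pos_embed(embed_dim // 2, ww)  # encodes the column
    return np.concatenate([emb_h, emb_w], axis=1)


# Example: a 224 x 384 image with 16 x 16 patches gives a 14 x 24 grid.
pos_embed = get_2d_sincos_pos_embed_rect(768, 14, 24)
print(pos_embed.shape)  # (336, 768)
```

Because the embedding is computed rather than learned, a model configured this way could in principle be rebuilt for another grid size; whether accuracy holds at a different resolution is a separate, empirical question.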
@daisukelab You mean that after training on a fixed image size, the model can accept a free input size, right? In addition, can your model accept a free mask ratio? That is, if the model is trained with a 75% mask ratio, can it accept a mask ratio other than 75%? Thanks.
@UdonDa Why don't you visit https://github.com/nttcslab/msm-mae and check what is done there yourself? ;) Our work handles non-square inputs and a free mask ratio.
@daisukelab Sorry, I just wanted to check whether my understanding is correct. Thanks.
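On the mask-ratio question: in MAE-style random masking, the ratio only decides how many shuffled tokens are kept, so nothing in the mechanism ties it to the 75% used in training. The NumPy sketch below, assuming the shuffle-by-noise approach (argsort of per-token random noise), shows that the same function works for any `mask_ratio`; the function name and arguments are illustrative, not the exact code from either repo. Whether a model trained at 75% still reconstructs well at other ratios is an empirical matter this sketch does not address.

```python
import numpy as np


def random_masking(x, mask_ratio, rng):
    """Per-sample random masking by shuffling (sketch).

    x: (N, L, D) patch tokens; mask_ratio: any float in [0, 1).
    Returns the kept tokens (N, L_keep, D), a binary mask (N, L) with
    1 = masked and 0 = kept, and ids_restore (N, L) to undo the shuffle.
    """
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = rng.random((N, L))                      # one random score per token
    ids_shuffle = np.argsort(noise, axis=1)         # lowest scores are kept
    ids_restore = np.argsort(ids_shuffle, axis=1)   # inverse permutation
    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = np.take_along_axis(x, ids_keep[:, :, None], axis=1)
    mask = np.ones((N, L))
    mask[:, :len_keep] = 0                          # in shuffled order
    mask = np.take_along_axis(mask, ids_restore, axis=1)  # back to original order
    return x_kept, mask, ids_restore


rng = np.random.default_rng(0)
tokens = np.arange(2 * 196 * 4, dtype=np.float64).reshape(2, 196, 4)
kept75, mask75, _ = random_masking(tokens, 0.75, rng)  # 49 tokens kept
kept50, mask50, _ = random_masking(tokens, 0.50, rng)  # 98 tokens kept
```

The encoder then simply sees a different sequence length (49 vs. 98 tokens here), which a Transformer handles without architectural changes.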
Thanks for the excellent work.
I tried to use a non-square input image size, since my data contains only people. However, judging from patchify(), it seems that the input is limited to square images?
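The square restriction appears to come from patchify() assuming a single side length (and from unpatchify() recovering it as the square root of the number of patches). Below is a minimal NumPy sketch, assuming `(N, C, H, W)` inputs where `H` and `W` are each divisible by the patch size `p`, of patchify/unpatchify that allows `H != W` and any channel count `C`. It mirrors the reshape-and-einsum pattern used in MAE but is an illustration, not the code from either repository.

```python
import numpy as np


def patchify(imgs, p):
    """(N, C, H, W) -> (N, h*w, p*p*C); H != W is allowed.

    Patches are ordered row-major over the (h, w) patch grid.
    """
    N, C, H, W = imgs.shape
    assert H % p == 0 and W % p == 0, "H and W must be multiples of p"
    h, w = H // p, W // p
    x = imgs.reshape(N, C, h, p, w, p)
    x = np.einsum("nchpwq->nhwpqc", x)   # group pixels by patch
    return x.reshape(N, h * w, p * p * C)


def unpatchify(x, p, h, w, C):
    """Inverse of patchify: (N, h*w, p*p*C) -> (N, C, h*p, w*p).

    h and w must be passed explicitly: for a non-square grid they can no
    longer be recovered as sqrt(number of patches).
    """
    N, L, _ = x.shape
    assert L == h * w
    x = x.reshape(N, h, w, p, p, C)
    x = np.einsum("nhwpqc->nchpwq", x)
    return x.reshape(N, C, h * p, w * p)


rng = np.random.default_rng(0)
imgs = rng.random((2, 3, 256, 128))      # tall RGB images
patches = patchify(imgs, 16)             # (2, 128, 768): a 16 x 8 grid
restored = unpatchify(patches, 16, 16, 8, 3)
```

The main design change is that the grid shape `(h, w)` becomes state the model has to keep (for example, computed once from the configured image size) rather than something inferred from the token count. The patch embedding layer itself (a strided convolution in MAE) already works on rectangular inputs.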