Open YadiHe opened 1 week ago
Hi, yes, the whole body will be the foreground.
Thanks. Could you tell me which GPUs were used for training? I'm worried that structural attention, which has to compute attention over such a large area, is computationally heavy.
Using the settings given in the README, with a batch size of 32, we use around 30 GiB (for that we used an RTX 6000 Ada). With a batch size of around 12, the model uses around 12 GiB (I can use an RTX 3090).
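For anyone sizing a different GPU, here is a rough sketch that interpolates between the two figures reported above (batch size 12 at about 12 GiB, batch size 32 at about 30 GiB). It assumes memory grows roughly linearly with batch size, which is only an assumption and was not measured in this thread; the function names are hypothetical and not part of the repository.

```python
# Two (batch_size, memory_in_GiB) points reported in this thread.
POINTS = [(12, 12.0), (32, 30.0)]


def fit_line(points):
    """Fit memory = slope * batch_size + intercept through two points."""
    (b0, m0), (b1, m1) = points
    slope = (m1 - m0) / (b1 - b0)
    intercept = m0 - slope * b0
    return slope, intercept


def estimate_memory_gib(batch_size, points=POINTS):
    """Estimate memory use in GiB, ASSUMING linear scaling with batch size."""
    slope, intercept = fit_line(points)
    return slope * batch_size + intercept


def max_batch_size(budget_gib, points=POINTS):
    """Largest batch size whose estimated memory fits in budget_gib."""
    slope, intercept = fit_line(points)
    return max(0, int((budget_gib - intercept) // slope))


if __name__ == "__main__":
    # Example: a card with 24 GiB in total; leave headroom in practice.
    print(max_batch_size(24.0))
```

Treat the result as a starting point only: actual usage also depends on image size, precision, and framework overhead, so it is worth checking the real peak memory on the first few training steps.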
Thanks a lot.
Hi, I'm new to this, and during training I'm seeing a lot of regular grid-like textures. Is this due to the Vision Transformer's patch embedding and reconstruction process? Could you give me some suggestions? Thank you very much.
Hi, can you tell me about the mask generated by the SAM segmentation: is the whole body the foreground, with only the treatment bed removed?