IcarusWizard / MAE

PyTorch implementation of Masked Autoencoder
MIT License

Expected dtype int64 for index #3

Closed · meokey closed this 2 years ago

meokey commented 2 years ago

Sorry, maybe it's a stupid question - I'm new to torch. I hit the error below when running the first step; please help. Thanks.

```
C:\Python\MAE>python mae_pretrain.py
Files already downloaded and verified
Files already downloaded and verified
Adjusting learning rate of group 0 to 1.2000e-05.
  0%|          | 0/98 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Python\MAE\mae_pretrain.py", line 54, in <module>
    predicted_img, mask = model(img)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Python\MAE\model.py", line 141, in forward
    features, backward_indexes = self.encoder(img)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Python\MAE\model.py", line 70, in forward
    patches, forward_indexes, backward_indexes = self.shuffle(patches)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Python\MAE\model.py", line 33, in forward
    patches = take_indexes(patches, forward_indexes)
  File "C:\Python\MAE\model.py", line 18, in take_indexes
    return torch.gather(sequences, 0, repeat(indexes, 't b -> t b c', c=sequences.shape[-1]))
RuntimeError: gather(): Expected dtype int64 for index
```

Environment: Windows 11

```
C:\Python\MAE>python --version
Python 3.9.9
```

IcarusWizard commented 2 years ago

Hi @meokey. Sorry about the problem. I have only tested the code on Linux. Your error appears to be caused by the platform-dependent default dtype of np.arange (int64 on Linux but int32 on Windows).
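For reference, a minimal sketch that reproduces the mismatch. The toy shapes and the indexes construction are illustrative; only the gather call mirrors the one in model.py:

```python
import numpy as np
import torch
from einops import repeat

sequences = torch.randn(10, 2, 8)           # (tokens, batch, channels), toy shapes
indexes = torch.as_tensor(np.arange(10))    # int64 on Linux, int32 on Windows
indexes = repeat(indexes, 't -> t b', b=2)  # mimic the (t, b) index layout
# On Windows this raises: RuntimeError: gather(): Expected dtype int64 for index
patches = torch.gather(sequences, 0, repeat(indexes, 't b -> t b c', c=sequences.shape[-1]))
```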

I have updated the code to convert the data type explicitly. That should resolve your issue.
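The change boils down to pinning the index dtype at creation time; one way to write it (a sketch, not necessarily the exact commit):

```python
import numpy as np
import torch

# Cast explicitly so torch.gather always receives an int64 (torch.long) index,
# regardless of the platform default that np.arange picks.
indexes = torch.as_tensor(np.arange(196), dtype=torch.long)
```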

meokey commented 2 years ago

> Hi @meokey. Sorry about the problem. I have only tested the code on Linux. Your error appears to be caused by the platform-dependent default dtype of np.arange (int64 on Linux but int32 on Windows).
>
> I have updated the code to convert the data type explicitly. That should resolve your issue.

It works perfectly! Thanks a lot.

However, when I run the second step, it throws a memory error. It seems the default number of workers and the batch size are too large for my available memory. Is it possible to choose num_workers and batch_size dynamically based on the available memory? Thanks.

```
C:\Python\MAE>python train_classifier.py
Files already downloaded and verified
Files already downloaded and verified
C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\utils\data\dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Adjusting learning rate of group 0 to 1.0000e-04.
  0%|          | 0/391 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_classifier.py", line 67, in <module>
    logits = model(img)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bill/python/MAE/model.py", line 161, in forward
    features = self.layer_norm(self.transformer(patches))
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\timm\models\vision_transformer.py", line 213, in forward
    x = x + self.drop_path(self.attn(self.norm1(x)))
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\AppData\Roaming\Python\Python39\site-packages\timm\models\vision_transformer.py", line 189, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: [enforce fail at CPUAllocator.cpp:68] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 101451264 bytes. Error code 12 (Cannot allocate memory)
```

IcarusWizard commented 2 years ago

You can just decrease the value of max_device_batch_size until it fits your device.
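For anyone landing here with the same error: if the script follows the usual pattern behind a max_device_batch_size knob (gradient accumulation over smaller per-step batches), shrinking it does not change the effective batch size. A sketch under that assumption, with illustrative values, that also caps num_workers per the DataLoader warning above:

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 128              # effective batch size, preserved via gradient accumulation
max_device_batch_size = 32    # lower this until one step fits in memory
load_batch_size = min(batch_size, max_device_batch_size)
steps_per_update = batch_size // load_batch_size  # accumulate this many steps per optimizer update

# Toy CIFAR-shaped dataset just to make the sketch runnable.
train_dataset = TensorDataset(torch.randn(512, 3, 32, 32),
                              torch.zeros(512, dtype=torch.long))
dataloader = DataLoader(
    train_dataset,
    batch_size=load_batch_size,
    shuffle=True,
    num_workers=min(2, os.cpu_count() or 1),  # stay under the system's suggested max
)
```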