Open WJDNJSDJ opened 5 months ago
Hi,
Thanks for your interest. We train our model with a single A100 GPU, as mentioned in our paper. You could consider reducing some of the hyper-parameters, such as the number of Transformer blocks in each stage; that way you might be able to train your own model. Hope this is helpful.
Thank you very much for your reply despite your busy schedule.
With your help, that problem is solved. However, at run time len(self.data_info)==0 in the Dataset, which means there is no data in the dataset passed to the dataloader. So far I have 1) checked the paths in the dataset and found no problem; 2) added print("**", len(train_loader)) in the while loop in HINT.py, and the output is 0. The running log is as follows:
Cuda is available
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
/home/liu/anaconda3/envs/HINT/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/home/liu/anaconda3/envs/HINT/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
/home/liu/anaconda3/envs/HINT/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /home/liu/anaconda3/envs/HINT/lib/python3.8/site-packages/lpips/weights/v0.1/vgg.pth
module 'numpy' has no attribute 'str'.
np.str was a deprecated alias for the builtin str. To avoid this error in existing code, use str by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.str_ here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Model configurations:
MODE: 1 # 1: train, 2: test
MODEL: 2 # 2: inpaint model
MASK: 3 # 0: no mask, 1: random block, 2: center mask, 3: external, 4: 50% external, 50% random block, 5: (50% no mask, 25% ramdom block, 25% external) 6: external non-random
SEED: 10 # random seed
GPU: [0] # list of gpu ids
AUGMENTATION_TRAIN: 0 # 1: use augmentation to train landmark predictor 0:not use
TRAIN_INPAINT_IMAGE_FLIST: /home/liu/ZZB/HINT-main/script/Train_GT/Train_GT.txt
TEST_INPAINT_IMAGE_FLIST:
TRAIN_MASK_FLIST: /home/liu/ZZB/HINT-main/script/Mask.txt
TEST_MASK_FLIST:
LR: 0.0001 # learning rate
D2G_LR: 0.1 # discriminator/generator learning rate ratio
BETA1: 0.9 # adam optimizer beta1
BETA2: 0.999 # adam optimizer beta2
WD: 0
LR_Decay: 1
BATCH_SIZE: 4 # input batch size for training
INPUT_SIZE: 256 # input image size for training 0 for original size
MAX_ITERS: 300001 # maximum number of iterations to train the model
MAX_ITERS: 10000 # maximum number of iterations to train the model
L1_LOSS_WEIGHT: 1 # l1 loss weight
STYLE_LOSS_WEIGHT: 250 # style loss weight
CONTENT_LOSS_WEIGHT: 0.1 # perceptual loss weight
INPAINT_ADV_LOSS_WEIGHT: 0.01 # adversarial loss weight
GAN_LOSS: lsgan # nsgan | lsgan | hinge
GAN_POOL_SIZE: 0 # fake images pool size
SAVE_INTERVAL: 1000 # how many iterations to wait before saving model (0: never)
EVAL_INTERVAL: 0 # how many iterations to wait before model evaluation (0: never)
LOG_INTERVAL: 100 # how many iterations to wait before logging training status (0: never)
start training...
Training epoch: 1
**** 0
Training epoch: 2
**** 0
Training epoch: 3
**** 0
Training epoch: 4
**** 0
Training epoch: 5
**** 0
/home/liu/ZZB/HINT-main/src/dataset.py:164: FutureWarning: In the future np.str will be defined as the corresponding NumPy scalar.
return np.genfromtxt(flist, dtype=np.str, encoding='utf-8')
Training epoch: 6
**** 0
Training epoch: 7
**** 0
Training epoch: 8
**** 0
Training epoch: 9
**** 0
Training epoch: 10
**** 0
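A note on the log above: the text "module 'numpy' has no attribute 'str'" is printed without a traceback, and the warning points at np.genfromtxt(flist, dtype=np.str, ...) in src/dataset.py. This suggests (an assumption, not confirmed from the repository source) that reading the file list raises an exception that is caught, leaving the dataset empty. Following the advice in the NumPy message itself, a minimal sketch of reading a file list with the builtin str could look like this. The function name load_flist and the demo file are hypothetical, used only for illustration:

```python
import os
import tempfile

import numpy as np


def load_flist(flist_path):
    """Read one file path per line from a text file list (hypothetical helper)."""
    # dtype=str replaces the removed alias np.str, as the NumPy message advises.
    paths = np.genfromtxt(flist_path, dtype=str, encoding="utf-8")
    # A single-line file yields a 0-d array; atleast_1d keeps the result a list.
    return np.atleast_1d(paths).tolist()


# Small self-contained demo with a temporary file list (illustrative paths).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo.flist")
    with open(demo_file, "w", encoding="utf-8") as f:
        f.write("/data/a.png\n/data/b.png\n")
    demo_paths = load_flist(demo_file)

print(len(demo_paths))
```

If the length printed for the real Train_GT.txt is still 0 after such a change, the paths listed inside the file would be the next thing to check.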
Hi,
Does the path refer to the .flist file?
Hello, thank you very much for your contribution. I encountered the following problem while running this code: a CUDA out-of-memory error. I have set the training batch size to the minimum, and the out-of-memory error still occurs. Please do not hesitate to give me your advice. Thank you very much, and I wish you a happy life and all the best!!
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.08 GiB (GPU 0; 15.70 GiB total capacity; 3.28 GiB already allocated; 6.08 GiB free; 7.45 GiB reserved in total by PyTorch) If reserved memory is >>allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
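The error message itself suggests trying max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of setting it from Python is shown here; the value 128 is an arbitrary illustrative choice, not a recommendation from the authors, and the variable has to be set before CUDA is initialized, so the sketch sets it before torch would be imported:

```python
import os

# Set the allocator option before torch initializes CUDA.
# 128 is an illustrative value only; tune it for your own setup.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch would follow here in a real training script.

alloc_conf = os.environ["PYTORCH_CUDA_ALLOC_CONF"]
print(alloc_conf)
```

This setting only addresses fragmentation of memory already reserved by PyTorch. It does not reduce what the model itself needs, so on a 16 GiB GPU a smaller model or input size may still be necessary.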