666DZY666 / micronet

micronet, a model compression and deployment library.
Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group convolutional channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization.
Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
MIT License

Bug when training with multiple GPUs #64

Open coderhss opened 3 years ago

coderhss commented 3 years ago
    self.scale = torch.max(self.scale, self.eps)    # processing for very small scale

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! When training IAO, this error is raised as soon as I use multiple GPUs. Has anyone else run into this?

666DZY666 commented 3 years ago

For IAO, please use a single GPU for now.

coderhss commented 3 years ago

That's unfortunate; the model doesn't fit in memory on a single GPU.

666DZY666 commented 3 years ago

Try reducing --eval_batch_size.

xiaoguoer commented 3 years ago

For IAO, I added the following right before the line

    output = (torch.clamp(self.round(input / self.scale - self.zero_point), self.quant_min_val, self.quant_max_val) + self.zero_point) * self.scale

namely:

    if self.scale.device != input.device:
        self.scale = self.scale.to(input.device)
    if self.zero_point.device != input.device:
        self.zero_point = self.zero_point.to(input.device)
    if self.quant_min_val.device != input.device:
        self.quant_min_val = self.quant_min_val.to(input.device)
    if self.quant_max_val.device != input.device:
        self.quant_max_val = self.quant_max_val.to(input.device)

and multi-GPU training works. In short: always resolve devices from input.device, not from the observer's device.
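The workaround above can be sketched as a minimal, self-contained fake-quantize module. This is not the repo's actual IAO class; the class name, the constant scale/zero-point values, and the use of `torch.round` instead of the repo's `self.round` are illustrative assumptions. The key idea is the same: with `torch.nn.DataParallel`, plain tensor attributes stay on cuda:0 while replicas receive inputs on other devices, so each forward pass moves every quantization tensor to `input.device` before using it.

```python
import torch
import torch.nn as nn

class FakeQuantize(nn.Module):
    """Toy uniform quantizer (hypothetical; names mirror the thread)."""
    def __init__(self, bits=8):
        super().__init__()
        # Plain attributes, as in the issue, are NOT replicated by
        # DataParallel; registering them with self.register_buffer(...)
        # would avoid the manual moves below.
        self.scale = torch.tensor(0.1)
        self.zero_point = torch.tensor(0.0)
        self.quant_min_val = torch.tensor(0.0)
        self.quant_max_val = torch.tensor(2.0 ** bits - 1)

    def forward(self, input):
        # Workaround from the thread: align every tensor with input.device
        # rather than trusting whichever device the observer lives on.
        for name in ("scale", "zero_point", "quant_min_val", "quant_max_val"):
            t = getattr(self, name)
            if t.device != input.device:
                setattr(self, name, t.to(input.device))
        # quantize -> clamp -> dequantize, matching the formula in the issue
        return (torch.clamp(torch.round(input / self.scale - self.zero_point),
                            self.quant_min_val, self.quant_max_val)
                + self.zero_point) * self.scale

x = torch.randn(4, 3)
y = FakeQuantize()(x)
```

A cleaner long-term fix is to register `scale`, `zero_point`, and the clamp bounds with `register_buffer`, since DataParallel and `Module.to()` handle buffers automatically; the per-forward `.to(input.device)` calls above simply make the existing attribute-based code device-safe.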