Closed tybxiaobao closed 8 years ago
Please make sure the GPU id set at https://github.com/tornadomeet/mxnet/blob/seg/example/fcn-xs/image_segmentaion.py#L31 is correct for your machine. You can print some information before and after that line to see whether it breaks there.
@tornadomeet It breaks at the line https://github.com/tornadomeet/mxnet/blob/seg/example/fcn-xs/image_segmentaion.py#L46. L31 gets gpu(0), and since I have only one GPU, gpu(0) is the correct id.
I cannot reproduce the error here. From your description, it breaks when loading the model, so which model do you use: your own trained model or the one I provided?
@tornadomeet I use the model you provided.
@tybxiaobao The error message points to ./mshadow/mshadow/./tensor_gpu-inl.h:35, and the code at tensor_gpu-inl.h:35 is:
template<>
inline void SetDevice<gpu>(int devid) {
  MSHADOW_CUDA_CALL(cudaSetDevice(devid));
}
It says your device number does not exist, that is, invalid device ordinal,
so please check your code and other settings carefully.
One way to test is to use the CPU, but segmentation will then take about one minute.
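To illustrate the check suggested here, the following is a minimal sketch that falls back to the CPU when the requested GPU id is outside the range of visible devices. The function name `pick_device` and the `visible_gpu_count` argument are hypothetical and not part of MXNet; the device count would have to come from your own environment.

```python
def pick_device(requested_gpu_id, visible_gpu_count):
    """Return ("gpu", id) if the id is a valid ordinal, else ("cpu", 0).

    Hypothetical helper: CUDA reports "invalid device ordinal" when the
    requested id is not in the range [0, number of visible devices).
    """
    if 0 <= requested_gpu_id < visible_gpu_count:
        return ("gpu", requested_gpu_id)
    return ("cpu", 0)


# One GPU visible: id 0 is valid, id 1 is not.
print(pick_device(0, 1))  # ('gpu', 0)
print(pick_device(1, 1))  # ('cpu', 0)
```

The returned pair could then be mapped to mx.gpu(id) or mx.cpu() before the model is loaded.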
@tornadomeet Thanks. It works when I use my own trained model, so maybe the pre-trained model was corrupted when I downloaded it from the URL you provided.
Hi, by debugging, I found that the problem comes from this line:
... = mx.model.load_checkpoint(args.prefix, args.epoch)
The reason must be that the pretrained model FCN8s_VGG16-0019.params or VGG_FC_ILSVRC_16_layers-0074.params contains GPU device information that does not match the client side. Please correct me if necessary.
@zmonoid Thanks for pointing this out. I have updated FCN8s_VGG16-0019.params
with the CPU device, so you can test it.
Thanks very much for updating. May I ask whether there is any way to manually change the parameters in a .params file?
@zmonoid As far as I know, there is no direct way to change a .params file. I just load the .params file and save it with ctx=cpu using a Python script. I think it is a bug that the model is saved with GPU device information.
@tornadomeet Thank you. Could you share the script with me? I need to change the GPU device info to gpu(0).
Also, I think it would be more reasonable to remove the GPU device check from the load_checkpoint function, which directly causes this error.
Just a simple script like this:
import argparse
import mxnet as mx
import symbol_fcnxs  # from example/fcn-xs

workspace = 1536
ctx = mx.cpu()  # context the re-saved parameters are copied to


def load_checkpoint(prefix, epoch):
    """Load '<prefix>-<epoch>.params' and copy every array to ctx."""
    save_dict = mx.nd.load('%s-%04d.params' % (prefix, epoch))
    arg_params = {}
    aux_params = {}
    for k, v in save_dict.items():
        # Keys look like 'arg:<name>' or 'aux:<name>'.
        tp, name = k.split(':', 1)
        if tp == 'arg':
            arg_params[name] = mx.nd.array(v.asnumpy(), ctx)
        if tp == 'aux':
            aux_params[name] = mx.nd.array(v.asnumpy(), ctx)
    return (arg_params, aux_params)


def main():
    fcn8s = symbol_fcnxs.get_fcn8s_symbol(21, workspace)
    fcn8s_args, fcn8s_auxs = load_checkpoint(args.prefix, args.epoch)
    save_callback = mx.callback.do_checkpoint("FCN8s_VGG16-new")
    # The callback saves under iteration number + 1, hence epoch - 1.
    save_callback(args.epoch - 1, fcn8s, fcn8s_args, fcn8s_auxs)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Re-save the FCN-8s parameters with a CPU context.')
    # nargs='?' makes the positional arguments optional so the defaults apply.
    parser.add_argument('prefix', nargs='?', default='FCN8s_VGG16',
                        help='The prefix (including path) of the model in mxnet format.')
    parser.add_argument('epoch', nargs='?', type=int, default=19,
                        help='The epoch number of the model.')
    args = parser.parse_args()
    main()
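The key handling in load_checkpoint above can be tried without MXNet. The sketch below uses a plain dict with made-up parameter names and list values in place of NDArrays, so the names `split_params` and `demo` and the keys are illustrative assumptions only; it shows how the 'arg:' and 'aux:' prefixes are separated from the parameter names.

```python
def split_params(save_dict):
    """Split a {'arg:<name>' / 'aux:<name>': value} dict into two dicts."""
    arg_params = {}
    aux_params = {}
    for key, value in save_dict.items():
        # Split only on the first ':' so names containing ':' stay intact.
        kind, name = key.split(':', 1)
        if kind == 'arg':
            arg_params[name] = value
        elif kind == 'aux':
            aux_params[name] = value
    return arg_params, aux_params


# Made-up keys and values standing in for the contents of a .params file.
demo = {
    'arg:conv1_weight': [0.1, 0.2],
    'arg:score_bias': [0.0],
    'aux:bn_moving_mean': [1.0],
}
arg_params, aux_params = split_params(demo)
print(sorted(arg_params))  # ['conv1_weight', 'score_bias']
print(sorted(aux_params))  # ['bn_moving_mean']
```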
After following the installation guide of fcn-xs (https://github.com/tornadomeet/mxnet/tree/seg/example/fcn-xs), I have successfully used the code for training. But when I use the pre-trained model for an image segmentation test, an error is reported: “./dmlc-core/include/dmlc/logging.h:208: [12:04:59] ./mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal”. mshadow has been updated to the latest version. The test is performed on a PC with one GPU card with 12 GB of memory. Does anyone know what's going on?