Closed HuichuanLiu closed 6 years ago
Hi @HuichuanLiu, I think your problem is more of a question. Please submit it on the MXNet discussion forum (https://discuss.mxnet.io), where it will get a wider audience and allow others to learn as well. @nswamy can you add the 'Question' tag to this issue?
@lanking520 Well, it is not a HOW TO question, but more like a potential problem in the checkpoint or the scoring process. I added very little extra code; I basically just cloned the project and ran the given scripts.
@HuichuanLiu Sorry about that. I see you are using Anaconda with MXNet 1.1. Have you tried building from source? Which Ubuntu version are you using? I will try to reproduce your problem on a Linux instance.
@lanking520 Right, I've tried it with mxnet-cu80 1.2.0 from pip:
[12:41:56] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: /data3/liuhuichuan/Data/imagenet/imagenet1k-val.rec, use 4 threads for decoding..
[12:42:01] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.9.4. Attempting to upgrade...
[12:42:01] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[12:42:06] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Finished with 160.141619 images per second
INFO:root:('accuracy', 0.48345588235294118)
INFO:root:('top_k_accuracy_5', 0.6939338235294118)
Will build from source later, but it takes some time
Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
Thanks for your help : )
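For readers unfamiliar with the two metrics in the log above, here is a minimal NumPy sketch of how top-1 and top-k accuracy are usually computed. It is not MXNet's metric implementation; the helper name `top_k_accuracy` and the toy scores are made up for illustration.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Hypothetical helper: fraction of rows whose true label is among the k highest scores."""
    # argsort is ascending, so the last k columns hold the k best class indices
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels.reshape(-1, 1)).any(axis=1)
    return float(hits.mean())

# Toy example: 3 samples, 3 classes (illustrative numbers only)
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

With k=1 this reduces to plain accuracy, which is why 'accuracy' can never exceed 'top_k_accuracy_5' in the log.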
Thanks for testing. Something tricky is going on here. Do you think the provided model has some issues? As you said, all the configuration LGTM, but some models work very well and some behave very strangely. It sounds like using a dog-cat classifier to classify rockets. I will reach out to somebody who is working on MXNet Model Server so we can test it over there, sounds good?
Agreed. It might involve not only the symbol definition, the pretrained params and the training process, but also a mismatch between the model and the scoring procedure, such as the preprocessing.
FYI, I implemented a Gluon version of resnext-101-64x4d based on the provided symbol file and restored the param values from the provided checkpoint. It ended at around acc=0.47+ with preprocessing like:
import mxnet as mx

def mx_preprocess(img, ctx):
    # img: HWC image array
    # img_cvt = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_nd = mx.nd.array(img)
    # Resize the shorter side to 256, then center-crop to 224x224
    img_resize = mx.image.resize_short(src=img_nd, size=256)
    img_crop, _ = mx.image.center_crop(src=img_resize, size=(224, 224))
    # HWC -> CHW, then add a batch dimension
    img_trans = mx.nd.transpose(img_crop, [2, 0, 1])
    img_reshape = img_trans.reshape((1, 3, 224, 224))
    return mx.nd.array(img_reshape, ctx=ctx)
Since resNeXt is quite new in the model_zoo, a re-evaluation could be very helpful.
Hi @HuichuanLiu, can you give me the link where you downloaded the model? There are too many model zoos in MXNet now.
_base_model_url = 'http://data.mxnet.io/models/'
_default_model_info = {
    'imagenet1k-inception-bn': {'symbol':_base_model_url+'imagenet/inception-bn/Inception-BN-symbol.json',
                                'params':_base_model_url+'imagenet/inception-bn/Inception-BN-0126.params'},
    'imagenet1k-resnet-18': {'symbol':_base_model_url+'imagenet/resnet/18-layers/resnet-18-symbol.json',
                             'params':_base_model_url+'imagenet/resnet/18-layers/resnet-18-0000.params'},
    'imagenet1k-resnet-34': {'symbol':_base_model_url+'imagenet/resnet/34-layers/resnet-34-symbol.json',
                             'params':_base_model_url+'imagenet/resnet/34-layers/resnet-34-0000.params'},
    'imagenet1k-resnet-50': {'symbol':_base_model_url+'imagenet/resnet/50-layers/resnet-50-symbol.json',
                             'params':_base_model_url+'imagenet/resnet/50-layers/resnet-50-0000.params'},
    'imagenet1k-resnet-101': {'symbol':_base_model_url+'imagenet/resnet/101-layers/resnet-101-symbol.json',
                              'params':_base_model_url+'imagenet/resnet/101-layers/resnet-101-0000.params'},
    'imagenet1k-resnet-152': {'symbol':_base_model_url+'imagenet/resnet/152-layers/resnet-152-symbol.json',
                              'params':_base_model_url+'imagenet/resnet/152-layers/resnet-152-0000.params'},
    'imagenet1k-resnext-50': {'symbol':_base_model_url+'imagenet/resnext/50-layers/resnext-50-symbol.json',
                              'params':_base_model_url+'imagenet/resnext/50-layers/resnext-50-0000.params'},
    'imagenet1k-resnext-101': {'symbol':_base_model_url+'imagenet/resnext/101-layers/resnext-101-symbol.json',
                               'params':_base_model_url+'imagenet/resnext/101-layers/resnext-101-0000.params'},
    'imagenet1k-resnext-101-64x4d': {'symbol':_base_model_url+'imagenet/resnext/101-layers/resnext-101-64x4d-symbol.json',
                                     'params':_base_model_url+'imagenet/resnext/101-layers/resnext-101-64x4d-0000.params'},
    'imagenet11k-resnet-152': {'symbol':_base_model_url+'imagenet-11k/resnet-152/resnet-152-symbol.json',
                               'params':_base_model_url+'imagenet-11k/resnet-152/resnet-152-0000.params'},
    'imagenet11k-place365ch-resnet-152': {'symbol':_base_model_url+'imagenet-11k-place365-ch/resnet-152-symbol.json',
                                          'params':_base_model_url+'imagenet-11k-place365-ch/resnet-152-0000.params'},
    'imagenet11k-place365ch-resnet-50': {'symbol':_base_model_url+'imagenet-11k-place365-ch/resnet-50-symbol.json',
                                         'params':_base_model_url+'imagenet-11k-place365-ch/resnet-50-0000.params'},
}
In short, it's http://data.mxnet.io/models/imagenet/resnext/101-layers/resnext-101-64x4d-symbol.json and http://data.mxnet.io/models/imagenet/resnext/101-layers/resnext-101-64x4d-0000.params
2. I also tried building a Gluon model from the params http://data.dmlc.ml/models/imagenet/resnext/101-layers/resnext-101-64x4d-0000.params and the symbol.json http://data.dmlc.ml/models/imagenet/resnext/101-layers/resnext-101-64x4d-symbol.json listed in the homepage model_zoo, which also ended at approximately 47%, for both the converted Gluon model and the original Module.
Hi @HuichuanLiu thanks for your input and testing. I am not an expert working on these models, but I will definitely find one for you: @szha can you take over from here as it lives in MXNet Model zoo?
Thanks @lanking520 And here're some updates:
My experiments show that resnet-152 restored from the Gluon model_zoo and resnet-152 restored from the Module symbol files require different preprocessing. I didn't find any clear description of this in the MXNet docs. It would be nice if you could add one, since it's quite confusing for newcomers like me.
I got a higher accuracy from the Gluon model compared to these statistics. Is this another inconsistency between the Module and Gluon models? Or is it perhaps about the ResNet version?
Details: I replaced resnext-101 with resnet-152 in score.py and got acc=~0.765, exactly what the docs show.
Then I repeated the same procedure, i.e. the same data and the same mx.io.ImageRecordIter settings, but loaded the resnet-152 model with the Gluon API instead of the default Module symbol files.
from mxnet.gluon.model_zoo.vision.resnet import get_resnet
net = get_resnet(version=2, num_layers=152, pretrained=True, root='./', ctx=ctx[1])
This led to broken predictions: argmax gives 916 for all samples because the input is not normalized.
Next I added the standard preprocessing described for the Gluon models:
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferrably happen at preprocessing
This brings the model to acc=0.773, about 0.012 higher than the docs claim.
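The normalization quoted from the Gluon docs can be sketched in NumPy as follows. This is only an illustration of the arithmetic, not the code used in the experiment; the helper name `gluon_normalize` and the dummy batch are hypothetical, and only the mean and std values come from the quoted text.

```python
import numpy as np

# Mean and std taken from the Gluon model zoo note quoted above (RGB order)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def gluon_normalize(batch_uint8):
    """Hypothetical helper: (N, 3, H, W) RGB pixels in 0..255 -> normalized float32."""
    # Scale to [0, 1] first, then standardize each channel
    x = batch_uint8.astype(np.float32) / 255.0
    return (x - MEAN.reshape(1, 3, 1, 1)) / STD.reshape(1, 3, 1, 1)

# Dummy mid-gray batch, for illustration only
batch = np.full((2, 3, 224, 224), 128, dtype=np.uint8)
out = gluon_normalize(batch)
print(out.shape, out.dtype)
```

Feeding raw 0..255 pixels into a network that expects inputs on this scale is the kind of mismatch that can collapse every prediction into a single class.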
@HuichuanLiu Many thanks for your hard work. I have checked with the MXNet Gluon team. The model zoo you are using has not been maintained for a long time (more than a year). Please try to use the Gluon models, which are supported. Unfortunately, I didn't see ResNeXt there. I will keep track of this.
@lanking520 OK, I've switched to the ResNets and they work perfectly. Thanks for your help. Feel free to close this issue, or keep it open if your team has further plans.
OK, my colleague solved this problem. Just apply the Inception preprocessing, i.e. img/255-rgb_mean with --rgb-mean=[123.68,116.779,103.939]. Then the imagenet1k-resnext-101-64x4d model delivers acc~=0.79, as good as your docs describe.
I would strongly recommend modifying or deleting the description on the MXNet image-classification example page, which says:
our Resnet does not need to specify the RGB mean due the data batch normalization layer. While the inception models needs --rgb-mean 123.68,116.779,103.939
Although ResNeXt is a hybrid variant of ResNet and Inception, it has a data batch normalization as its first layer, so in my understanding it shouldn't need the RGB mean subtracted.
However, according to the experiment, it does.
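As a rough illustration of per-channel mean subtraction, here is a NumPy sketch. It assumes the means are subtracted channel by channel from raw 0..255 RGB pixel values, which is my reading of how --rgb-mean is applied and is not confirmed in this thread; the helper name `subtract_rgb_mean` is hypothetical.

```python
import numpy as np

# Per-channel means from the --rgb-mean value quoted above (R, G, B)
RGB_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def subtract_rgb_mean(batch, rgb_mean=RGB_MEAN):
    """Hypothetical helper: subtract a per-channel mean from an (N, 3, H, W) RGB batch."""
    # Assumption: pixel values are on the same 0..255 scale as the means
    return batch.astype(np.float32) - rgb_mean.reshape(1, 3, 1, 1)

# Dummy mid-gray batch, for illustration only
batch = np.full((1, 3, 224, 224), 128, dtype=np.uint8)
centered = subtract_rgb_mean(batch)
print(centered[0, :, 0, 0])
```

Under that assumption, a first-layer batch normalization with frozen statistics would still see shifted inputs if the checkpoint was trained on mean-subtracted data, which would be consistent with the accuracy gap reported above.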
Description
I used incubator-mxnet/example/image-classification/score.py to evaluate resnext-50, resnext-101 and resnext-101-64x4d, but none of them reached a reasonable result.
However, the resnet-101 model works perfectly well.
It seems that either the ResNeXt models are not properly trained, or something (the preprocessing?) in score.py does not fit the models.
Environment info (Required)
Package used (Python/R/Scala/Julia): (I'm using ...)
Build info (Required if built from source)
Installed from pip of python3.6
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash: a48480b706763203a294cb76eb8916517ff214c1
Build config: (Paste the content of config.mk, or the build command.)
Error Message:
(Paste the complete error message, including stack trace.)
Minimum reproducible example
(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
Steps to reproduce
(Paste the commands you ran that produced the error.)
CUR_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
MX_DIR=${CUR_DIR}/../../../
python ${CUR_DIR}/../../../tools/im2rec.py --resize 256 --quality 90 --num-thread 16 imagenet1k-val my_path_to_store_imagenet_val_data/
rm -rf val
python score.py --model imagenet1k-resnext-101-64x4d --gpus 2 --data-val /data3/liuhuichuan/Data/imagenet/imagenet1k-val.rec