Closed: zachmayer closed this issue 8 years ago
that's absolutely awesome! we can also set up a website so others can try it easily
see also http://www.deepart.io/
exactly. i think we can speed up the prediction time a lot compared to them :)
Agreed! I also want to try it out locally =D
sure. please ping me anytime if you have problems getting it to run
For web-service, related to #629
@mli I haven't even started yet, so any suggestion you have would be appreciated. I'm trying to figure out how to begin: should I start with a pre-trained network, or should I somehow train a network on a pair of images alone?
i guess we can first reuse the pretrained caffe models https://github.com/jcjohnson/neural-style/tree/master/models (mxnet can load caffe models)
then try to translate their Lua code to mxnet
we may consider training the network later. we already have vgg model training code there https://github.com/dmlc/mxnet/tree/master/example/image-classification
Training the model takes a lot of time and tuning. I have tried the Lua version of https://github.com/jcjohnson/neural-style/tree/master/models for producing neural-style pictures, and it worked nicely. The Lua version loads the caffe model, so mxnet can load the same model (via model.sh), and we can simply port the generation script from Lua to Python with mxnet.
P.S. since mxnet's unique advantage is memory saving, if later on we can have an mxnet version with a slightly simplified network for a speed boost, we may be able to shorten the generation time to within 1 min. Currently it takes about 2-3 min to produce a 512px image on a GTX 960 using 3 GB of memory; at 1024px, my poor GTX 960 dies. If we can beat the Lua version on memory, we can at least produce larger images.
The example looks cool. I think we can do it on Inception instead of VGG
is inception faster than vgg? the vgg model is as slow as 2 min on a titan x, which prevents it from being used in production
I would like to do it, but the priority is not that high so far.
@winstywang I'm currently taking a painting class, and one of the projects is to apply a specific artistic style to a photograph. If you implement this, I'll show it to the instructor! Haha! :laughing:
I am working on it... First I will repeat torch experiment, then I will replace VGG.
@antinucleon wondering why people use VGG so much despite it being much bigger and slower than google's model. The recently announced neuraltalk2 also uses VGG for example. Would be very interesting if we could show here that the huge VGG model could be replaced.
I have done most parts, but it seems there are some minor issues that need to be fixed.
superb, what is the speed? i can start drafting a blog post advertising it
Someone made a demo for tensorflow too: https://github.com/woodrush/neural-art-tf
Saw the TensorFlow demo too. They use the exact same caffe model as the Lua version. It seems the number of iterations is not large enough to capture the style; it should be around 800-1000 per the Lua version, which is why it takes so long (2 min with a Titan X). If mxnet can trim this iteration count when searching for optimal parameters, it can save a lot of time. The Lua version mentions its -optimizer parameter in the README https://github.com/jcjohnson/neural-style
I'm excited to see the results! Also, there's already another one: https://github.com/anishathalye/neural-style
TensorFlow has some great marketing...
we can almost reproduce the results, thanks to @antinucleon
but the major problem is the speed. it needs thousands of iterations, which takes minutes even on a decent gpu card.
@mli Is there code somewhere to reproduce the results I could try out myself?
@zachmayer I will commit it soon :)
@antinucleon Really looking forward to it!
so cool, how is the speed? let me write a blog post advertising it.
not done yet. @antinucleon is still training a smaller model to make it faster
I tried the example, and got `Please compile with CUDA enabled`. Is it silly to try to run this example without CUDA? I don't need a ton of speed and want to try it on a device without a GPU.
use `--gpu -1` to disable the gpu
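Presumably the example maps this flag onto a device context. A hedged sketch of the pattern (only the `--gpu` flag comes from the thread; the `pick_device` helper is hypothetical, not the example's actual code):

```python
import argparse

def pick_device(gpu_id):
    # Hypothetical helper: a negative id means run on CPU,
    # otherwise use the GPU with that id.
    return "cpu" if gpu_id < 0 else "gpu(%d)" % gpu_id

parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=int, default=0,
                    help="GPU device id; pass -1 to disable the GPU")
args = parser.parse_args(["--gpu", "-1"])
print(pick_device(args.gpu))  # cpu
```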
Just an estimate: the Lua version took about 40-60 minutes on CPU and 2 min on GPU, so we may expect 20-30 min on CPU with mxnet
just had some quick benchmarks plus a comparison with the Lua version, as well as some tricks for better results. `remove-noise` is too large and makes the picture look blurry; a better value is about 0.15 instead of 0.2. The `stop-eps` value needs to be a little smaller for a stronger style, e.g. 0.004 or 0.003, but it may take longer to converge. Setting `content-weight` to 20 instead of 10 keeps more of the content's shape; otherwise the style is so strong that it distorts the original content. Here is the same cat pogo with Van Gogh:
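Why raising content-weight from 10 to 20 preserves more content shape can be seen in a one-dimensional caricature of a weighted two-term objective (the quadratic stand-in below is my illustration, not the example's actual loss):

```python
def blended_optimum(content_weight, style_weight=1.0,
                    content_target=0.0, style_target=1.0):
    # Minimizing cw*(x - c)^2 + sw*(x - s)^2 over x has the closed form
    # x* = (cw*c + sw*s) / (cw + sw): a weighted average of both targets.
    return (content_weight * content_target + style_weight * style_target) \
        / (content_weight + style_weight)

# The heavier the content weight, the closer the optimum sits to the
# content target, i.e. the more of the original shape survives.
print(blended_optimum(10))
print(blended_optimum(20))
```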
@phunterlau Guess we can make a cool blog post about your findings and post it on reddit or elsewhere. For the cuDNN bug, I still don't have any idea yet.
If we switch to cuDNN, we can make it at least 40% faster, so I think speed won't be an issue.
most people don't have cuDNN and just want to give it a try without the pain of installing Lua, so for a blog post (if the mxnet committee agrees to a blog, I can start drafting one), cuDNN is not needed. As for the ultimate solution, cuDNN is worth doing: considering a GTX 980 can reach 50 s, could a Titan X with cuDNN make it 20 s? That would be a significant improvement.
If a blog post is OK now, I can start writing.
For people who want a CPU-only benchmark, here it is: with clang-omp, CPU only, on 2 cores of an i7 MacBook Pro, producing the same result takes 39 mins total. Would a faster learning rate (`lr`) help? Currently it is 0.1. I am going to experiment more with GPU anyway.
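On the `lr` question: on a toy stand-in objective (a quadratic, not the real style-transfer loss, so take the numbers as illustration only), a larger learning rate converges in fewer steps right up until it overshoots and diverges:

```python
def steps_to_reach(lr, target=1e-6):
    # Toy stand-in: minimize f(x) = x^2 from x = 10 by gradient descent.
    x = 10.0
    for step in range(1, 100001):
        x -= lr * 2 * x  # gradient of x^2 is 2x
        if x * x < target:
            return step
    return None  # never got below target: diverged (or far too slow)

print(steps_to_reach(0.1))   # converges in a few dozen steps
print(steps_to_reach(0.01))  # a smaller rate needs many more steps
print(steps_to_reach(1.1))   # None: the step overshoots and diverges
```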
@phunterlau i found the trick is to use a figure as large as possible. for example, i updated the sample output by changing to `--max-long-edge 900`, and it looks much better
@mli then how much memory does it take? I only have a poor man's 4gb
3.7gb ;)
sounds like my poor GTX 960 problem: when I set max-long-edge greater than 700, it gives:
INFO:root:load the content image, size = (1280, 960)
INFO:root:resize the content image to (780, 585)
[22:42:54] ./dmlc-core/include/dmlc/logging.h:208: [22:42:54] src/operator/./convolution-inl.h:258: Check failed: (param_.workspace) >= (required_size)
Minimum workspace size: 1168128000 Bytes
Given: 1073741824 Bytes
[22:42:54] ./dmlc-core/include/dmlc/logging.h:208: [22:42:54] src/engine/./threaded_engine.h:295: [22:42:54] src/operator/./convolution-inl.h:258: Check failed: (param_.workspace) >= (required_size)
Minimum workspace size: 1168128000 Bytes
Given: 1073741824 Bytes
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPEto NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
what(): [22:42:54] src/engine/./threaded_engine.h:295: [22:42:54] src/operator/./convolution-inl.h:258: Check failed: (param_.workspace) >= (required_size)
Minimum workspace size: 1168128000 Bytes
Given: 1073741824 Bytes
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPEto NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Command terminated by signal 6
you can replace workspace=1024 in model_vgg19.py with workspace=2048
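A quick sanity check on the numbers in the log above: the `Given: 1073741824 Bytes` for workspace=1024 is exactly 1024 MiB, so the workspace parameter appears to be in MiB, and 2048 comfortably covers the 1,168,128,000-byte minimum:

```python
required = 1_168_128_000       # "Minimum workspace size" from the log
mib = 1024 ** 2

print(1024 * mib)              # 1073741824, matching "Given" in the log
print(required <= 1024 * mib)  # False: workspace=1024 is too small
print(required <= 2048 * mib)  # True: workspace=2048 fits
```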
That works. I can maximize the edge to 850 by changing workspace to 2048. Does it have a special meaning?
[image: pogo-vangogh-30-2-0 003-0 3-out-850] https://cloud.githubusercontent.com/assets/1690881/11582168/e4e1caca-99ff-11e5-8a30-7090b1b8e1b7.jpg
it increases the temp buffer size, or the working space size
i'm going to close this issue now since the demo is already available at https://github.com/dmlc/mxnet/tree/master/example/neural-style
further work is definitely required to improve both the speed and the results.
I was wondering if anyone would be interested in helping me replicate the images from this paper? http://arxiv.org/abs/1508.06576
It looks like it's just a bunch of convnets, so we could possibly start with the pre-trained models and then try to reverse engineer the authors' approach for combining images.
It'd make for a really, really cool mxnet demo =D
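The paper boils down to matching deep-feature statistics: a content loss on raw conv feature maps and a style loss on their Gram matrices. A minimal NumPy sketch of the two losses, framework-free (the shapes and normalization follow the paper; the function names are mine):

```python
import numpy as np

def content_loss(feat, content_feat):
    # Squared error between feature maps of the generated and content images.
    return 0.5 * np.sum((feat - content_feat) ** 2)

def gram(feat):
    # feat: (channels, height, width) activations from one conv layer.
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T  # (channels, channels) feature correlations

def style_loss(feat, style_feat):
    c, h, w = feat.shape
    diff = gram(feat) - gram(style_feat)
    # Normalization 1 / (4 * N^2 * M^2) as in the paper
    # (N = channels, M = spatial size).
    return np.sum(diff ** 2) / (4.0 * c ** 2 * (h * w) ** 2)

# The image is then optimized to minimize
# content_weight * content_loss + style_weight * style_loss,
# summed over the chosen layers.
```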