apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

A Neural Algorithm of Artistic Style #664

Closed zachmayer closed 8 years ago

zachmayer commented 8 years ago

I was wondering if anyone would be interested in helping me replicate the images from this paper? http://arxiv.org/abs/1508.06576

It looks like it's just a bunch of convnets, so we could possibly start with the pre-trained models and then try to reverse engineer the authors' approach for combining images.
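
From a skim of the paper, the recipe seems to be: take a pretrained convnet (they use VGG-19), define a content loss on the activations of one layer and a style loss on the Gram matrices of several layers, then run gradient descent on the pixels of a generated image. A rough numpy sketch of the two losses, just to show the idea (not real mxnet code):

```python
import numpy as np

def gram_matrix(feature):
    # feature: activations of one conv layer, shape (channels, height, width)
    c, h, w = feature.shape
    f = feature.reshape(c, h * w)
    return f.dot(f.T)  # (channels, channels) correlations between feature maps

def style_loss(gen_feat, style_feat):
    # squared distance between Gram matrices, normalized as in the paper
    c, h, w = gen_feat.shape
    diff = gram_matrix(gen_feat) - gram_matrix(style_feat)
    return np.sum(diff ** 2) / (4.0 * c ** 2 * (h * w) ** 2)

def content_loss(gen_feat, content_feat):
    # plain squared error between activations at a chosen layer
    return 0.5 * np.sum((gen_feat - content_feat) ** 2)

# total objective: alpha * content_loss + beta * (weighted sum of style_loss
# over several layers), minimized by gradient descent on the image's pixels
```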

It'd make for a really, really cool mxnet demo =D

mli commented 8 years ago

that's absolutely awesome! we can also set up a website so others can try it easily

zachmayer commented 8 years ago

see also http://www.deepart.io/

mli commented 8 years ago

exactly. i think we can speed up the prediction time a lot compared to them :)

zachmayer commented 8 years ago

Agreed! I also want to try it out locally =D

mli commented 8 years ago

sure. please ping me anytime if you have problems getting it to run

tqchen commented 8 years ago

For web-service, related to #629

zachmayer commented 8 years ago

@mli I haven't even started yet, so any suggestion you have would be appreciated. I'm trying to figure out how to begin: should I start with a pre-trained network, or should I somehow train a network on a pair of images alone?

mli commented 8 years ago

i guess we can first reuse the pretrained caffe models https://github.com/jcjohnson/neural-style/tree/master/models (mxnet can load caffe models)

then try to translate their Lua code to mxnet

we may consider training the network ourselves later. we already have vgg model training code here: https://github.com/dmlc/mxnet/tree/master/example/image-classification
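
roughly, once the caffe weights are converted into mxnet's format, using them from the python api could look like the sketch below (`get_vgg19_symbol` and `vgg19.params` are placeholders for whatever we end up naming things):

```python
import mxnet as mx

# placeholder: the VGG-19 feature extractor rewritten as an mxnet symbol
sym = get_vgg19_symbol()
# placeholder: the converted Caffe weights, saved as a dict of name -> NDArray
pretrained = mx.nd.load('vgg19.params')

ctx = mx.gpu(0)
# bind an executor for a fixed input size; the image itself is the 'data' input,
# and we request gradients so the image can later be optimized
executor = sym.simple_bind(ctx=ctx, data=(1, 3, 512, 512), grad_req='write')

# copy the pretrained weights into the executor's argument arrays
for name, arr in executor.arg_dict.items():
    if name in pretrained:
        pretrained[name].copyto(arr)
```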

phunterlau commented 8 years ago

Training the model needs a lot of time and tuning. I have tried the Lua version at https://github.com/jcjohnson/neural-style/tree/master/models for producing neural-style pictures, and it worked nicely. The Lua version loads the Caffe model, so mxnet can load the same model from model.sh, and we can simply port the generation script from Lua to Python with mxnet.

P.S. since mxnet's unique advantage is memory saving, if we later build an mxnet version with a slightly simplified network for a speed boost, we may be able to get the generation time under 1 min. Currently it takes about 2-3 min to produce a 512px image on a GTX 960 using 3 GB of memory; at 1024px my poor GTX 960 dies. If we can beat the Lua version on memory, we can at least produce larger images.

Pogo with oil painting style

antinucleon commented 8 years ago

The example looks cool. I think we can do it on Inception instead of VGG

phunterlau commented 8 years ago

is Inception faster than VGG? the VGG model is so slow (about 2 min on a Titan X) that it's hard to put into production

winstywang commented 8 years ago

I would like to do it, but the priority is not that high so far.

pluskid commented 8 years ago

@winstywang I'm currently taking a painting class, and one of the projects is to apply a specific artistic style when painting a photograph. If you implement this, I'll show it to the instructor! Haha! :laughing:

antinucleon commented 8 years ago

I am working on it... First I will repeat torch experiment, then I will replace VGG.

pluskid commented 8 years ago

@antinucleon I wonder why people use VGG so much despite it being much bigger and slower than Google's model. The recently announced neuraltalk2 also uses VGG, for example. It would be very interesting if we could show here that the huge VGG model can be replaced.

antinucleon commented 8 years ago

download

I have done most of it, but it seems there are a few minor issues that need to be fixed.

phunterlau commented 8 years ago

superb, what is the speed? i can start drafting a blog post to advertise it

zachmayer commented 8 years ago

Someone made a demo for tensorflow too: https://github.com/woodrush/neural-art-tf

phunterlau commented 8 years ago

Saw the TensorFlow demo too. They use the exact same Caffe model as the Lua version. It seems their number of iterations is not large enough to really capture the style; it should be around 800-1000 judging from the Lua version, which is why it takes so long (2 min with a Titan X). If mxnet can cut down the number of iterations needed to find good parameters, it can save a lot of time. The Lua version mentions its -optimizer parameter in the README: https://github.com/jcjohnson/neural-style
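
For context, the inner loop in all of these implementations is just gradient descent on the image itself, so the iteration count dominates the runtime. A schematic mxnet version (assuming an `executor` bound as in the earlier sketch, with `content_target` precomputed from the content image; only the content term is shown):

```python
import mxnet as mx

ctx = mx.gpu(0)
lr = 0.1  # learning rate; Adam or L-BFGS would converge in fewer iterations

# the variable being optimized is the generated image itself
img = mx.nd.zeros((1, 3, 512, 512), ctx=ctx)  # in practice, init from noise or the content image

for it in range(1000):
    img.copyto(executor.arg_dict['data'])
    executor.forward(is_train=True)
    # gradient of 0.5 * ||F - P||^2 w.r.t. F is just (F - P); the style term
    # adds analogous gradients through the Gram matrices at several layers
    content_grad = executor.outputs[0] - content_target
    executor.backward([content_grad])
    img -= executor.grad_dict['data'] * lr  # plain gradient descent step
```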

zachmayer commented 8 years ago

I'm excited to see the results! Also, there's already another one: https://github.com/anishathalye/neural-style

TensorFlow has some great marketing...

mli commented 8 years ago

out16

we can almost reproduce the results, thanks to @antinucleon

but the major problem is the speed. it needs thousands of iterations, which takes minutes even on a decent gpu card.

zachmayer commented 8 years ago

@mli Is there code somewhere to reproduce the results I could try out myself?

antinucleon commented 8 years ago

@zachmayer I will commit it soon :)

zachmayer commented 8 years ago

@antinucleon Really looking forward to it!

antinucleon commented 8 years ago

https://github.com/dmlc/mxnet/tree/master/example/neural-style

phunterlau commented 8 years ago

so cool, how is the speed? let me write a blog post to advertise it.

mli commented 8 years ago

not quite done yet. @antinucleon is still training a smaller model to make it faster

zachmayer commented 8 years ago

I tried the example, and got `Please compile with CUDA enabled`. Is it silly to try to run this example without CUDA? I don't need a ton of speed and want to try it on a device without a GPU.

mli commented 8 years ago

use `--gpu -1` to disable the gpu
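
(For anyone following along: the flag presumably just selects the device context inside the script, along these lines; the argparse bit is illustrative, not the exact code.)

```python
import argparse
import mxnet as mx

parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=int, default=0,
                    help='GPU id to use, or -1 to run on the CPU')
args = parser.parse_args()

# --gpu -1 maps to the CPU context; any other value picks that GPU id
dev = mx.cpu() if args.gpu < 0 else mx.gpu(args.gpu)
```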

phunterlau commented 8 years ago

Just an estimate: the Lua version took about 40-60 minutes on CPU and 2 min on GPU, so we might expect 20-30 min on CPU with mxnet

phunterlau commented 8 years ago

just ran some quick benchmarks and a comparison with the Lua version, plus some tricks for better results.

  1. Memory: with the same 512px image size, the Lua version takes 2868MB of GPU memory while mxnet takes 1498MB. This gives mxnet a unique advantage: laptop users can try neural art, since most laptops have <2GB of GPU memory, and mxnet can also generate larger pictures on my poor man's 4GB GTX 960, where 800px easily core-dumps the Lua version.
  2. Speed: the Lua version took 232 seconds on my poor man's GTX 960 (1000 iterations), while mxnet takes 93 seconds to converge to a similar result in 200 iterations. Not sure whether a simplified model could cut the time further. Also, mxnet loads a much smaller model file, although it is the same Caffe model.
  3. Tricks for better outputs: the default denoise factor remove-noise is too large and makes the picture look blurry; a better value is about 0.15 instead of 0.2. The stop-eps value needs to be a little smaller for a stronger style, e.g. 0.004 or 0.003, but it may take longer to converge. Setting content-weight to 20 instead of 10 keeps more of the content's shape, otherwise the style is so strong that it distorts the original content. An example command with these settings is below.
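
For reference, an invocation with these tweaks looks roughly like this (the script name and image paths are placeholders for whatever the example ships with; the flags are the ones discussed above):

```
python nstyle.py --content-image input/pogo.jpg --style-image input/vangogh.jpg \
    --remove-noise 0.15 --stop-eps 0.004 --content-weight 20 --max-long-edge 512
```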

Here is the same cat Pogo with Van Gogh's style:

pogo-vangogh-mxnet

antinucleon commented 8 years ago

@phunterlau Guess we can write a cool blog post about your findings and post it on reddit or elsewhere. As for the cuDNN bug, I still don't have any idea yet.

antinucleon commented 8 years ago

If we switch to cuDNN, we can get at least 40% faster, so I think speed won't be an issue.

phunterlau commented 8 years ago

most people don't have cuDNN and just want to give it a try without the pain of installing Lua, so for a blog post (if the mxnet committee agrees to a blog, I can start drafting one) cuDNN is not needed. As the ultimate solution, cuDNN is worth doing: considering a GTX 980 can reach 50s, a Titan X with cuDNN might make it 20s? That would be a significant improvement.

If a blog post is OK now, I can start writing.

phunterlau commented 8 years ago

For people who want a CPU-only benchmark, here it is: running CPU-only (clang-omp, 2 cores of an i7 MacBook Pro), it takes 39 minutes total to reach the same result. Would a faster learning rate help? Currently lr is 0.1. I am going to experiment more with the GPU anyway.

mli commented 8 years ago

@phunterlau i found the trick is to use as large an image as possible. for example, i updated the sample output by changing to --max-long-edge 900, and it looks much better

phunterlau commented 8 years ago

@mli then how much memory does it take? I only have a poor man's 4GB

mli commented 8 years ago

3.7gb;)

phunterlau commented 8 years ago

sounds like my poor GTX 960's problem: when I set max-long-edge greater than 700, it gives:

INFO:root:load the content image, size = (1280, 960)
INFO:root:resize the content image to (780, 585)
[22:42:54] ./dmlc-core/include/dmlc/logging.h:208: [22:42:54] src/operator/./convolution-inl.h:258: Check failed: (param_.workspace) >= (required_size)
Minimum workspace size: 1168128000 Bytes
Given: 1073741824 Bytes
[22:42:54] ./dmlc-core/include/dmlc/logging.h:208: [22:42:54] src/engine/./threaded_engine.h:295: [22:42:54] src/operator/./convolution-inl.h:258: Check failed: (param_.workspace) >= (required_size)
Minimum workspace size: 1168128000 Bytes
Given: 1073741824 Bytes
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPEto NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
  what():  [22:42:54] src/engine/./threaded_engine.h:295: [22:42:54] src/operator/./convolution-inl.h:258: Check failed: (param_.workspace) >= (required_size)
Minimum workspace size: 1168128000 Bytes
Given: 1073741824 Bytes
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPEto NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Command terminated by signal 6

mli commented 8 years ago

you can change workspace=1024 in model_vgg19.py to workspace=2048
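
for reference, `workspace` caps the temporary buffer (in MB) each convolution may use; the error above asks for 1168128000 bytes (about 1114 MB), so 1024 falls just short and 2048 is plenty. the layer definitions in model_vgg19.py look roughly like this (a sketch, only the first conv shown):

```python
import mxnet as mx

data = mx.symbol.Variable('data')
# bump workspace from 1024 to 2048 (MB) so the convolution has enough temp space
conv1_1 = mx.symbol.Convolution(data=data, num_filter=64, kernel=(3, 3),
                                pad=(1, 1), name='conv1_1', workspace=2048)
relu1_1 = mx.symbol.Activation(data=conv1_1, act_type='relu')
# ...and likewise for the remaining VGG-19 convolution layers
```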

phunterlau commented 8 years ago

That works. I can push the max edge up to 850 by changing workspace to 2048. Does that value have a special meaning?

pogo-vangogh-30-2-0 003-0 3-out-850

mli commented 8 years ago

it increases the temp buffer size, i.e. the working space available to the convolutions

mli commented 8 years ago

i'm going to close this issue now since the demo is already available at https://github.com/dmlc/mxnet/tree/master/example/neural-style

further work is definitely required to improve both the speed and the results.