gabrielelanaro / ml-prototypes

A repository of machine learning prototypes
MIT License

Implement style transfer on video #37

Open FraPochetti opened 5 years ago

FraPochetti commented 5 years ago

Deepart.io Paper GitHub repo

How can we get this done in a reasonable time/effort?

Clear issues:

  1. it takes 50 minutes to process 17 seconds of video, i.e. 540 frames (50 iterations of the optimization algo). Quite slow. Running this kind of model in production could be costly even on relatively cheap GPUs.
  2. a 6 MB original video turned into 120 MB

Easy potential fixes:

  1. Use L-BFGS instead of SGD. This has nothing to do with the video challenge per se, but it could help us get better results at the frame level.
  2. Use VGG16, as it is lighter and faster than VGG19.
  3. We could change the loss function to have a third term (in addition to the content and style terms already there). This term would be the MSE between frame i, which we are updating, and frame i-1, which we have already style-transferred (NOT the content-only frame i-1!). This would encourage the model to produce pixel values in frame i that are similar to the previous stylized frame. See the sketch after this list.
  4. We could apply the same concept to a moving average, i.e. the third term mentioned in point 3 does not need to involve frame i-1 only. At 30 fps, we could take the average pixel values over a 15-frame window of already style-transferred frames and use that as the target instead.
  5. Depending on how hard it is to compute the optical flow (unclear to me from the paper), we could try implementing that, maybe just section 4.1 from the paper. Section 4.2 already seems too complex to me. A rough sketch of the warping step is below as well.
  6. The rest of the paper seems like a moonshot to me. I think the marginal benefit we would get from implementing all of it from scratch would be minimal compared to these more baseline methods. People don't really care. This is IMO, of course.
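A minimal sketch of what the extra loss term from points 3 and 4 could look like, assuming a PyTorch optimization-based pipeline like the one we run per frame; `content_loss`, `style_loss`, the weights, and the buffer are all hypothetical names standing in for whatever we already have, not code from the repo:

```python
import torch
import torch.nn.functional as F
from collections import deque

# Hypothetical weights; they would need tuning.
CONTENT_W, STYLE_W, TEMPORAL_W = 1.0, 1e3, 1e2

# Buffer of already style-transferred frames (15 frames = half a second at 30 fps, point 4).
stylized_history = deque(maxlen=15)

def content_loss(current, content_img):
    # Stand-in for the existing content term.
    return F.mse_loss(current, content_img)

def style_loss(current, style_img):
    # Stand-in for the existing Gram-matrix style term.
    return F.mse_loss(current, style_img)

def temporal_loss(current, history):
    """MSE between the frame being optimized and the already
    style-transferred previous frame(s) (NOT the raw content frames)."""
    if not history:
        return torch.tensor(0.0, device=current.device)
    # Point 3 would use only the previous stylized frame, i.e. history[-1];
    # point 4 uses the moving average over the whole window.
    target = torch.stack(list(history)).mean(dim=0)
    return F.mse_loss(current, target)

def total_loss(current, content_img, style_img):
    return (CONTENT_W * content_loss(current, content_img)
            + STYLE_W * style_loss(current, style_img)
            + TEMPORAL_W * temporal_loss(current, stylized_history))

# After the optimization for frame i has converged:
# stylized_history.append(stylized_frame.detach())
```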
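And a rough sketch of the section 4.1 idea from point 5, using OpenCV's Farneback flow to warp the previous stylized frame onto the current frame before comparing them; how exactly the warped frame would feed into the temporal term above is an assumption on my side, and it skips the occlusion handling of section 4.2:

```python
import cv2
import numpy as np

def warp_previous_stylized(prev_stylized, prev_gray, curr_gray):
    """Warp the previous stylized frame towards the current frame so it
    can be used as the target of the temporal term (paper, section 4.1)."""
    # Backward flow: for each pixel of the current frame, where it came
    # from in the previous frame.
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_stylized, map_x, map_y, cv2.INTER_LINEAR)
```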
gabrielelanaro commented 5 years ago

I took a look at the comments and at the paper, and the approach seems pretty hard to implement to me (just look at the number of options these guys have). So spending that much time on it, compared with other approaches, may not be justified, and I favor something simpler.

But I think it's worth researching whether a better way to do it exists anyway; for example, these may be more feasible:

https://medium.com/coinmonks/real-time-video-style-transfer-fast-accurate-and-temporally-consistent-863a175e06dc

https://www.pyimagesearch.com/2018/08/27/neural-style-transfer-with-opencv/

FraPochetti commented 5 years ago

The PyImageSearch post (using these guys' work: https://github.com/jcjohnson/fast-neural-style) is quite insane. It is fantastic at the frame level, but at the video level it is just a for loop, so nothing better than what we get, just faster.

Haven't seen the Medium post yet.

gabrielelanaro commented 5 years ago

For the pyimagesearch one, it looks to me like the frames are quite stable (it's the same technique as the videogame demo, I think?), probably because there is no per-frame optimization being performed (the optimization is what makes it random). Which makes me think that initializing the optimization from the previous frame could technically help (and so could the loss term).
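If that intuition is right, the change would be tiny. A sketch assuming an L-BFGS optimization loop like ours, where `total_loss` is whatever combination of content/style (and maybe temporal) terms we end up with; the names are made up:

```python
import torch

def stylize_frame(content_img, style_img, prev_stylized=None, steps=50):
    """Per-frame optimization, but initialized from the previous
    stylized frame instead of the content frame / random noise."""
    init = prev_stylized if prev_stylized is not None else content_img
    image = init.clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([image])

    for _ in range(steps):
        # Note: each LBFGS step itself runs several inner iterations.
        def closure():
            optimizer.zero_grad()
            loss = total_loss(image, content_img, style_img)
            loss.backward()
            return loss
        optimizer.step(closure)

    return image.detach()
```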
