NaNoGenMo / 2016

National Novel Generation Month, 2016 edition.
https://nanogenmo.github.io

AI AI #44

Open barnoid opened 7 years ago

barnoid commented 7 years ago

I have an idea to feed 5000 frames of the movie AI to an image captioning neural net and see what comes out. I think 5000 should give at least 50000 words. I may put in some randomish paragraph breaks and simulate chapters somehow. This might take longer than a month to run though.
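The word-count estimate and the "randomish" paragraph breaks can be sketched roughly as below. This is only an illustration: the figure of 10 words per caption is an assumption implied by the 5000-frame / 50000-word estimate, and the function names and paragraph-length bounds are hypothetical.

```python
import random


def estimate_words(num_frames, words_per_caption=10):
    """Rough word count, assuming one caption per frame.

    words_per_caption=10 is an assumed average, not a measured value.
    """
    return num_frames * words_per_caption


def break_into_paragraphs(captions, min_len=3, max_len=8, seed=0):
    """Group captions into paragraphs of random length.

    Each paragraph holds between min_len and max_len captions, except
    possibly the last one, which takes whatever is left over.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    paragraphs = []
    i = 0
    while i < len(captions):
        size = rng.randint(min_len, max_len)
        paragraphs.append(" ".join(captions[i:i + size]))
        i += size
    return paragraphs


if __name__ == "__main__":
    print(estimate_words(5000))
    demo = [f"Caption number {n}." for n in range(20)]
    print("\n\n".join(break_into_paragraphs(demo)))
```

Chapters could be simulated the same way one level up, by grouping paragraphs instead of captions.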

superMDguy commented 7 years ago

Wow that sounds cool. Good idea.

barnoid commented 7 years ago

I've ripped the DVD and extracted 5036 frames (0.6 frames per second). I've started a run through the neural net (I'm using https://github.com/karpathy/neuraltalk2); it looks like it will take three or four hours. The output is looking quite unvaried, so I may have to intervene a lot to make it not extremely boring.
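For reference, 5036 frames at 0.6 frames per second covers about 140 minutes of film. The sketch below shows that arithmetic, one possible way to sample frames at that rate, and one simple intervention for unvaried output (collapsing runs of identical captions). It is not necessarily what was done here: the file names are hypothetical, and the `ffmpeg` invocation (using its `fps` video filter) is an assumed tool choice.

```python
from itertools import groupby


def covered_minutes(num_frames, fps):
    """Minutes of footage spanned by num_frames sampled at fps."""
    return num_frames / fps / 60


def ffmpeg_sample_command(source, fps, pattern="frame_%05d.png"):
    """Build an ffmpeg command that writes sampled frames as numbered images.

    The source path and output pattern are placeholders.
    """
    return ["ffmpeg", "-i", source, "-vf", f"fps={fps}", pattern]


def collapse_repeats(captions):
    """Drop consecutive duplicate captions, keeping the first of each run."""
    return [caption for caption, _run in groupby(captions)]


if __name__ == "__main__":
    print(round(covered_minutes(5036, 0.6)))
    print(" ".join(ffmpeg_sample_command("movie.mkv", 0.6)))
    print(collapse_repeats(["a room", "a room", "a sign", "a room"]))
```

Collapsing repeats shortens the text, so it trades word count for variety.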

hugovk commented 7 years ago

I saw this demo https://twitter.com/kcimc/status/668094003791929344 of https://github.com/karpathy/neuraltalk2 when it was released last November and thought it'd be great for NaNoGenMo!

pointyointment commented 7 years ago

Perhaps segmenting each frame according to motion (by comparing with adjacent frames) prior to running the neural net would help with getting more interesting output.
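A minimal sketch of that idea, assuming simple frame differencing: mark the pixels whose brightness changed between adjacent frames, take the bounding box of the changed region, and crop the frame to it before captioning. Frames are modelled here as plain lists of grayscale rows to keep the example dependency-free; the threshold value and function names are hypothetical.

```python
def motion_mask(prev, curr, threshold=30):
    """Per-pixel mask: True where brightness changed by more than threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def bounding_box(mask):
    """Smallest (top, left, bottom, right) box, inclusive, around all True
    pixels. Returns None when nothing moved."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, moved in enumerate(row) if moved]
    return (min(rows), min(cols), max(rows), max(cols))


def crop(frame, box):
    """Cut the frame down to the given inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


if __name__ == "__main__":
    prev = [[0] * 4 for _ in range(4)]
    curr = [[0, 0, 0, 0],
            [0, 0, 200, 0],
            [0, 0, 0, 200],
            [0, 0, 0, 0]]
    box = bounding_box(motion_mask(prev, curr))
    print(box, crop(curr, box))
```

A single bounding box is the crudest possible segmentation; camera moves and cuts would mark the whole frame as changed.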

superMDguy commented 7 years ago

This would be a lot of work, but I wonder if you could modify the neural net, or maybe just write some extra code to compare two descriptions and focus on the differences between frames, so you could talk more about movement.
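The "extra code" version could start as small as the sketch below, which lists the words that appear in a caption but not in the previous one. It is only a word-level comparison under assumed names, not a change to the network, and the sample captions are made up for illustration.

```python
import re


def words(caption):
    """Lowercased words of a caption, punctuation stripped."""
    return re.findall(r"[a-z']+", caption.lower())


def new_words(prev_caption, curr_caption):
    """Words in curr_caption that did not occur in prev_caption.

    Order of first appearance is kept and duplicates are dropped.
    """
    seen = set(words(prev_caption))
    fresh = []
    for word in words(curr_caption):
        if word not in seen:
            seen.add(word)
            fresh.append(word)
    return fresh


if __name__ == "__main__":
    # Hypothetical captions for two adjacent frames.
    before = "A room with a bed and a table."
    after = "A room with a window and a chair."
    print(new_words(before, after))
```

The new words could then be used to decide whether a frame is worth mentioning at all, or to phrase a sentence about what changed.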

barnoid commented 7 years ago

It is done.

https://github.com/barnoid/AIAI/blob/master/aiai.pdf

Write up here: https://github.com/barnoid/AIAI

hugovk commented 7 years ago

A preview:

A regular view of a nighttime city landscape. A picture of a stately couple in the open distance, and a view of a room with a window and a window. A room with a bed and a table. A view of a building with a large clock on the side of it; a close up of a banana on a table. A stop sign with a sticker on it; a red and white sign with a sky background. A large red stop sign sitting in the center of the street.

A red and white sign with a sky background. Clean cars is seen while strange the brown and green jelly next to them.

superMDguy commented 7 years ago

I recently saw this, which is a neural network trained specifically to describe what's happening in videos. I haven't tried it out myself, but it seems pretty interesting. I don't know if it's generalized enough to work in something as broad as a movie.