dariusk / NaNoGenMo-2014

National Novel Generation Month, 2014 edition.

Generated Detective: A NaNoGenMo Comic #70

atduskgreg opened 9 years ago

atduskgreg commented 9 years ago

Presenting Generated Detective #1:

http://gregborenstein.com/comics/generated_detective/1/

For NaNoGenMo this year, I'm working on generating comics. As a starting point, I pulled a bunch of old detective novels off of this list on Project Gutenberg. I wrote a script that searches the text of those books for short sentences that match a series of words. These words act as a kind of script to shape the story. For this first comic, here's the script I used:

[:question, :murderer, :witness, :saw, :scene, :killer, :weapon, :clue, :accuse, :reveal]

So, I end up with a random short-ish sentence that matches each of those words.
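
In outline, that matching step might look something like this. A minimal sketch in Ruby; the file name, the sentence splitting, and the 20-word cutoff are assumptions, not the actual script:

corpus = File.read("detective_novels.txt") # placeholder: the concatenated Gutenberg texts
sentences = corpus.split(/(?<=[.!?])\s+/)  # naive sentence split

script = [:question, :murderer, :witness, :saw, :scene,
          :killer, :weapon, :clue, :accuse, :reveal]

panels = script.map do |word|
  short = sentences.select { |s| s.split.size < 20 && s =~ /\b#{word}\b/i }
  short.sample # a random short-ish match; nil if nothing matched
end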

Then I pull out interesting phrases from each of these sentences (currently by-hand). I ran these sentences through Flickr search to pull down one for each. Finally I run each of these images through a Processing sketch I wrote that attempts to reproduce the look of one of those "manga camera" apps. It breaks the image down into three colors: white, black, and gray. The gray areas it fills in with 45 degree lines. It also uses my OpenCV for Processing library to find contours in the image and add those on top for a hand-drawn line look. The sketch also renders the original sentence in a comic font and automatically scales a box around it as a caption.
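
The three-tone step is essentially a double threshold. Here's a minimal sketch of just that quantization (the threshold values are guesses; the real version is a Processing sketch using OpenCV for Processing):

# Map each grayscale pixel (0-255) to :black, :gray, or :white.
# The :gray regions are what later get filled with 45-degree lines.
def three_tone(gray_rows, dark = 85, light = 170)
  gray_rows.map do |row|
    row.map do |v|
      if    v < dark  then :black
      elsif v < light then :gray
      else                 :white
      end
    end
  end
end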

Then I put these all together into a webpage, and you get the result you see above.

Moving forward, I'm thinking about how to automate more of this (right now it's a somewhat hand-held pile of hacks). I'm also wondering if I should stick with the detective theme or maybe experiment with some different genres. I think the noir quality of the detective theme fits well with the way these images look. So maybe I'll stick with that and see how much variety/continuity/etc. I can wedge into this process.

One feature I plan on adding is speech balloons. Since I'm already using OpenCV, it should be pretty easy to do face detection and put the lines from the books into balloons when I detect a face. I'd also like to experiment with panel layouts other than this totally linear flow (though that's tricky, because it's hard to control the dimensions of the images that come back from Flickr search).

Finally, a question here is: what counts as 50,000 words in a comic? Surely the images count for something too. A standard issue of a comic is 20-24 pages long. How many images does a web comic need to be equivalently long? Is there a panel count I'm going for?

atduskgreg commented 9 years ago

Just stuck the code here: https://github.com/atduskgreg/GeneratedDetective

It's a big ole mess at the moment.

lee2sman commented 9 years ago

This is really cool! Nice work.

hugovk commented 9 years ago

Wow, looks great!

I think text panels at the bottom work really well with the style.

Another idea is to specifically pull images from Flickr Commons. Besides being copyright-free, most are older photos, so they may suit a noir style better. For example, here are results for policeman and veiled woman.
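
For reference, restricting a search to the Commons is a single parameter on flickr.photos.search. A minimal sketch with Ruby's standard library (the API key is a placeholder and error handling is omitted):

require "net/http"
require "json"
require "uri"

# Search Flickr Commons for a phrase and return the photo records.
def commons_search(text, api_key)
  params = {
    method:         "flickr.photos.search",
    api_key:        api_key,
    text:           text,
    is_commons:     "true", # restrict results to the Flickr Commons
    format:         "json",
    nojsoncallback: "1"
  }
  uri = URI("https://api.flickr.com/services/rest/")
  uri.query = URI.encode_www_form(params)
  JSON.parse(Net::HTTP.get(uri))["photos"]["photo"]
end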

Finally, a question here is: what counts as 50,000 words in a comic? Surely the images count for something too. A standard issue of a comic is 20-24 pages long. How many images does a web comic need to be equivalently long? Is there a panel count I'm going for?

This has been asked elsewhere in some other issues. A jokey answer is: if a picture tells a thousand words, 50 images should do it. The more serious answer is: success and failure are really up to you, the participant.

lee2sman commented 9 years ago

Not the best answer, but I've been experimenting with converting images to ASCII, so you'd be able to run a word count.
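
The core of an image-to-ASCII mapping is just a brightness ramp; a minimal sketch, assuming you already have grayscale rows (the ramp characters are arbitrary):

# Map grayscale values (0-255) to a dark-to-light character ramp,
# producing text you can run a word count on.
RAMP = %w[@ # % * + = - : .] # darkest to lightest

def ascii_art(gray_rows)
  gray_rows.map { |row|
    row.map { |v| RAMP[v * RAMP.size / 256] }.join
  }.join("\n")
end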

enkiv2 commented 9 years ago

This is really cool!

atduskgreg commented 9 years ago

Thanks, everybody! @hugovk Flickr Commons is a good call, thanks for suggesting it.

atduskgreg commented 9 years ago

@lee2sman: I don't think that's quite the aesthetic I want to go for. However, I did once help someone use my OpenCV for Processing library to do exactly that: https://twitter.com/guan/status/451879480282603522/photo/1

MichaelPaulukonis commented 9 years ago

Detectives work well with comics. But you know what was also big?

Romance comics.

Although I shudder to think what an automated engine would produce from today's internet....

On a serious note, science fiction would work well. Not sure how the image-sourcing would fly.


The generated comic is amazing. There's a cartoonist this reminds me of, but the name is escaping me right now.

atduskgreg commented 9 years ago

@MichaelPaulukonis: thanks! Yeah, I've thought of doing some other genres. Maybe I should mash up the detective text with space or romance texts: Generated Detective: To the Moon, Generated Detective: In Love, etc. @dariusk pointed out that the results kind of look like Eddie Campbell's work from From Hell: http://www.comicsreporter.com/images/uploads/From_Hell_Chapter_2_Page_13.jpg I've tried to emulate that style even more exactly in the past: https://www.flickr.com/photos/unavoidablegrain/8501953252/in/photolist-dXhJuU-2Si24j Maybe I should try that here...

ikarth commented 9 years ago

I have no idea how you'd manage to algorithmically digest them for input, but there are large databases of public domain comic books (including lots of romance and detective ones) available online: http://digitalcomicmuseum.com/

Probably wouldn't work with the really neat photo processing you have going on, but might provide some inspiration.

lee2sman commented 9 years ago

@atduskgreg I'd love to see the code (I couldn't find Guan's repository/account on GitHub). Here's my video-to-ASCII: it's a bit simpler, but only outputs images as PNGs, not txt files. I'd love to see an example of how to do that. I have a stranger version here: https://github.com/lee2sman/Selfie-Textual-Poem-Cam

atduskgreg commented 9 years ago

This was my tweak to Guan's code that generated text:

https://gist.github.com/atduskgreg/c0b901d5c9c4201b6985

I'm pretty sure all I changed there was some of the OpenCV image processing to get a better result, rather than any of the text generation, which was all Guan's.

His code fed the generated text to an actual typewriter over serial (and presumably some amount of gadgetry on the other end to do the actual typing), but you can see how he's putting together the text. You'd just want to output it to a file instead.

lee2sman commented 9 years ago

Thanks, I'll see if I can figure it out. Appreciated. I got the (crazy?) idea of outputting a "novel" of ASCII selfies for another NaNoGenMo contribution!

atduskgreg commented 9 years ago

Ok, generated a second "issue" of this:

http://gregborenstein.com/comics/generated_detective/2/

I tried what @hugovk suggested and used the Flickr Commons search for these. I feel like the result was better tied together and more period-appropriate, but maybe a little bit more boring? There's just less dynamic and compositional range in the images.

Also, I'm curious to hear what people think about the length. Are these maybe a little too long?

A friend compared the result to the Wild Palms comic strip, which I'd never seen before but is totally a good comparison, I think (at least visually): http://www.spd.org/images/blog/Details%20Wild%20Palms.jpg

hugovk commented 9 years ago

I tried what @hugovk suggested and used the Flickr Commons search for these. I feel like the result was better tied together and more period-appropriate, but maybe a little bit more boring? There's just less dynamic and compositional range in the images.

I agree with both points.

enkiv2 commented 9 years ago

You could do a lot worse than emulating Wild Palms. (Next step: generate the TV show instead of the comic)

atduskgreg commented 9 years ago

Just generated Issue 3: http://gregborenstein.com/comics/generated_detective/3/

For this one I downloaded a bunch of sci-fi books from Gutenberg and gave my script generator a 50/50 chance of picking a line from a sci-fi book or a detective book for each panel. I'm pretty excited about how this came out. It doesn't register as sci-fi at all, but it added some more macabre elements that picked up some striking imagery.

Also, I decided to make the comic a little shorter in response to some feedback I'd gotten from showing it to people in person. What do you guys think of this length? It feels a little thin and short to me, but maybe it's good to leave people wanting more.

atduskgreg commented 9 years ago

Oh! I almost forgot. I also added speech bubble generation for lines that come back as quotes. If a line is a quote, the code looks for faces in the image that comes back and positions a speech bubble with the line in the right-ish spot relative to the detected face. So that's something new as well.
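
A rough sketch of that logic, with the face detection and drawing stubbed out (Face, bubble_anchor, and the offsets are illustrative, not the actual Processing/OpenCV code):

# Given a caption line and face rectangles from a detector, treat
# quoted lines as speech and anchor a bubble beside/above the face.
Face = Struct.new(:x, :y, :w, :h)

def bubble_anchor(line, faces)
  return nil unless line.include?('"') # only lines that read as quotes
  return nil if faces.empty?
  f = faces.first
  { text: line, x: f.x + f.w + 10, y: [f.y - 20, 0].max }
end

p bubble_anchor('"Who killed him?"', [Face.new(40, 60, 80, 80)])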

enkiv2 commented 9 years ago

This is really cool. In fact, I'd read a webcomic like this even not knowing it was computer-generated.

atduskgreg commented 9 years ago

Issue 4:

http://gregborenstein.com/comics/generated_detective/4/

Experimenting with a more sophisticated layout. This is a challenge as it involves cropping the images to fit the right width. It was pretty hand-held this go 'round but I have some thoughts about how to calculate it per-row in the future...

MichaelPaulukonis commented 9 years ago

wow. wow.

I also finally remembered -- this is reminding me of a black-and-white Glen Baxter.

Only more coherent. With a hint of Edward Gorey.

atduskgreg commented 9 years ago

Oh, Glen Baxter's stuff is great. Thanks for pointing it out to me. I hadn't seen it before.

Just uploaded Issue 5:

http://gregborenstein.com/comics/generated_detective/5/

I'm starting to automate the panel layout process. But I probably won't have it fully done until next issue. The way I'm doing it is to make a tool where I give it a set of panel dimensions and a page width (and a margin size) and it figures out what height to scale all of the panels to so that they'll fit properly on one row of the page. I think I know how to do it in code so it'll just spit out the dimensions for each panel.
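
The arithmetic reduces to one formula: panels with aspect ratios r_i = w_i / h_i sharing a row of usable width W all get height h = W / (r_1 + ... + r_n). A minimal sketch (the gutter handling is a guess at how the tool works):

# Given panel (width, height) pairs, a page width, and a margin,
# compute the common row height and the scaled panel widths.
def fit_row(panels, page_width, margin)
  usable = page_width - margin * (panels.size + 1) # margins around and between panels
  ratios = panels.map { |w, h| w.to_f / h }
  row_h  = usable / ratios.sum
  widths = ratios.map { |r| (r * row_h).round }
  [row_h.round, widths]
end

p fit_row([[640, 480], [300, 500], [800, 600]], 900, 10)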

Anyway, I thought this one had a really nice mood. Look forward to hearing thoughts.

dariusk commented 9 years ago

What happens if you pick a subject for a whole page? So like, grab sentences that reference "knife" and then generate the page based on those. I think that might bring coherency to both the text and the images.

atduskgreg commented 9 years ago

@dariusk Oh that's an interesting thought. I'll try it for the next one. My instinct is that I'd have to work hard to make the images non-repetitive. Maybe if I used that same search term for all of the sentences but then varied the image search term somewhat from panel to panel...

atduskgreg commented 9 years ago

Ok issue 6 is up. This one has a title: The Knife

http://gregborenstein.com/comics/generated_detective/6/

I tried out @dariusk's idea of using a single word as the seed for each panel. In this case "knife", as suggested. I think the result is much more coherent, certainly.

Also, I wrote a Processing sketch to automate the process of scaling images to fill a row. You can see some screenshots from it here:

https://www.flickr.com/photos/unavoidablegrain/15581458427/

https://www.flickr.com/photos/unavoidablegrain/15766621855/

I load in the images that will share a row and it figures out the height to scale them to so they fit on the row. This saves a lot of time and inches me closer towards full automation.

enkiv2 commented 9 years ago

Do you yet have keyword isolation code? It seems like you could use TF-IDF on captions to determine what keyword to use for image search -- and, since captions are quite short, you can save time by just sorting by term frequency ascending and choosing the term in each caption that occurs least often in the corpus.
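
A minimal sketch of that shortcut, with plain corpus frequency standing in for full TF-IDF (the tokenizer is a guess):

# For each caption, pick the word that is rarest across the whole
# corpus of captions -- a cheap stand-in for TF-IDF on short texts.
def rarest_terms(captions)
  freq = Hash.new(0)
  captions.each { |c| c.downcase.scan(/[a-z']+/).each { |w| freq[w] += 1 } }
  captions.map { |c| c.downcase.scan(/[a-z']+/).min_by { |w| freq[w] } }
end

p rarest_terms(["The man held the knife", "The man ran into the night"])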

atduskgreg commented 9 years ago

Yeah, I haven't bothered with that stuff at this point and am doing it by hand. Just grabbing standout phrases or words from each line, putting them into Flickr CC search, and then picking the most striking of the first few results.

The technical bits aren't what's stopping me. Judging from the search results, if I just took the first result of most searches I'd get something totally random and often really generic. It's important to me to apply a little bit of my aesthetic judgment at that point.

Rather than a 100% generative process, I like to think of this as a generative exosuit that carries me through the aesthetic process, presents me with algorithmically gathered material, and asks me to make certain decisions at a few key points and then amplifies those decisions into the final work.

MichaelPaulukonis commented 9 years ago

Rather than a 100% generative process, I like to think of this as a generative exosuit that carries me through the aesthetic process,

I'm sorry, but NaNoMechaMo is December.

....

I think I like the less-coherent pages better; then again, I like Glen Baxter.

If this becomes automated, let it have the ability to switch strategies, so there are different types of coherence.

atduskgreg commented 9 years ago

I'm sorry, but NaNoMechaMo is December.

Ha. But if their authors were truthful, all but the most extreme and conceptual of NaNoGenMo projects mix in a series of specific human aesthetic choices throughout. Does it make a really big difference if those choices congeal as specific if-statements in code that exclude or select for particular situations rather than explicit selections made by a person interactively? As in most things, I think it's important to be skeptical of purity narratives...

The question of coherence is a little bit of a different one. All of these pages have begun with a script, which is a series of terms to grep for in the source books. Here are the relevant lines of code (with the scripts from previous days commented out and the current day's in the variable "keys"):

# issue 1 keys = [:question, :murderer, :witness, :saw, :scene, :killer, :weapon, :clue, :accuse, :reveal]
# issue 2 keys = [:detective, :woman, :lips, :shot, :chase, :fight, :body, :victim, :blood, :detective]
# issue 3 keys = [:moon, :murder, :discover, :she, :kill, :body]
# issue 4 keys = [:chill, :shadow, :body, :blood, :woman, :kill, :detective]
# issue 5 keys = [:hunt, :monster, :night, :woman, :murder, :body, :flee]

keys = [:knife, :knife, :knife, :knife, :knife] 

On previous days, I'd been trying to create a kind of arc through the story by selecting search terms that would tell a story. @dariusk suggested sticking with one word to increase coherence and I think that worked here somewhat.

The thing I'm thinking about now is whether there's an iterative way to do this: start with one search term, see what sentence you get back, process it, then use that to select the search term for the next sentence, and so on, so that there would be a connected thread through all the panels without them being quite so single-minded.
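
A sketch of what that chaining might look like; the stopword list and the random choice of the next term are placeholders for the real "process it" step:

# Each panel's sentence seeds the search term for the next panel.
STOPWORDS = %w[the a an and of to in on was he she it his her that]

def chain_script(sentences, seed, panel_count = 6)
  term   = seed
  script = []
  panel_count.times do
    match = sentences.select { |s| s =~ /\b#{term}\b/i }.sample
    break unless match
    script << match
    words = match.downcase.scan(/[a-z]+/) - STOPWORDS
    term  = words.sample || seed # carry a thread into the next panel
  end
  script
end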

MichaelPaulukonis commented 9 years ago

Hrm. Somebody in here did a Google search and kept using its auto-complete suggestions: https://github.com/edsu/google-the-poem/blob/master/run.js

Wordnik synonyms?

jugglinmike commented 9 years ago

@atduskgreg Darius just tipped me off to your work--it's really great! I like the work you've done to add variety to the framing. One thing that comes to mind that might make these pages even more dynamic is to extend one of the frames as the page background, with others overlaid. That will likely require identifying the key areas of interest within source images, but it might be worth the effort. Either way, nice work so far.

atduskgreg commented 9 years ago

@jugglinmike Thanks for the nice words about this project! The idea of doing layouts with a background image (or a panel that bleeds out of its box) is really interesting. I've done work with identifying the free/busy parts of an image before (a good starting place is to calculate the gradient of the image and look for places with low-energy gradients). Thinking about this, I also just tried setting the background color of one of the pages to black (or dark gray) and it looks pretty good that way too; it increased the mood. Seems like that would be necessary for a full-bleed type image too, because otherwise you'd get this awkward edge to it.
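
The low-energy-gradient idea in miniature, in pure Ruby over a grayscale array (a real version would use OpenCV; the band size and scoring are illustrative):

# Score each cell by its horizontal + vertical gradient magnitude,
# then find the band of rows with the least total gradient energy --
# a crude proxy for a "free" region where a caption could sit.
def gradient_energy(g)
  h, w = g.size, g.first.size
  (0...h).map do |y|
    (0...w).map do |x|
      gx = x + 1 < w ? (g[y][x + 1] - g[y][x]).abs : 0
      gy = y + 1 < h ? (g[y + 1][x] - g[y][x]).abs : 0
      gx + gy
    end
  end
end

def quietest_row_band(gray, band = 3)
  row_energy = gradient_energy(gray).map(&:sum)
  (0..row_energy.size - band).min_by { |y| row_energy[y, band].sum } # top row of the band
end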

Anyway, thanks for the ideas...

ikarth commented 9 years ago

Ha. But if their authors were truthful, all but the most extreme and conceptual of NaNoGenMo projects mix in a series of specific human aesthetic choices throughout. Does it make a really big difference if those choices congeal as specific if-statements in code that exclude or select for particular situations rather than explicit selections made by a person interactively? As in most things, I think it's important to be skeptical of purity narratives...

My position is that all of these projects involve human aesthetic choices. Sometimes that's in the selection of input. Sometimes it's in arranging the output. Sometimes it is in how the system in the middle is designed. Usually a mixture of all three. Even some kind of hypothetical genetic-learning system that teaches itself to write involves an aesthetic choice on the part of its creator.

The valuable thing, to my mind, of having a completely algorithmic process is that it's easy to recreate the process exactly. The active expression gets pushed up the chain of causation, as it were, moving from the finished material result to the formal system. This lets variations be produced automatically, which in turn allows the generative process to be connected with other generative processes. Which also starts to get into the aesthetics of systems, a territory that I'm personally interested in.

That said, there is a long history of art generated through following algorithmic steps, with more or less individual variation from the artist. So I think it fits here just fine. The code just has a squishy step that involves inserting a human as glue. (And I'd encourage you to document that step thoroughly, kind of like comments for your own actions for when you look back at it later.)

Hrm. Somebody in here did a Google search and kept using its auto-complete suggestions: https://github.com/edsu/google-the-poem/blob/master/run.js

Wordnik synonyms?

What about Wikipedia article links? Crime-novel summaries? 19th-century true-crime tabloid headlines? Pulp magazine tables of contents? Randomly sampling from a corpus of thematic words (possibly the most distinctive words culled from your source novels)?

cpressey commented 9 years ago

Interesting topic re automation. My own position is not fully formed (and it is late where I am so I shall not try to form it more fully tonight) but I can refer to an earlier comment I posted for some of my thoughts on it.

ikarth commented 9 years ago

It occurs to me that you can write an unremarkable generator which generates a remarkable novel, or a remarkable generator which generates an unremarkable novel. Or both or neither of course, but what I'm getting at is, you can concentrate your efforts on either side.

Is a program that generates a single novel still a novel-generator? If, as the rulebook suggests, a script that downloads a novel from Project Gutenberg and spits it out is a novel generator, then, yes it is.

@cpressey And, of course, you can swing the other way and have a generator that spits out a writing prompt. The machine provides the structure, while you provide the actual text. Which is a very Oulipian approach. That's actually an illuminating way to approach Generated Detective, comparing it to Life A User's Manual and other forms of constrained writing (constrained imaging?).

Another idea I don't have time for at the moment: A plot/idea generator patterned after Italo Calvino's "Prose and Anticombinatorics" or Georges Perec's "machine for inspiring stories", generating a dense pattern for the human writer to follow.

cpressey commented 9 years ago

Ha. But if their authors were truthful, all but the most extreme and conceptual of NaNoGenMo projects mix in a series of specific human aesthetic choices throughout. Does it make a really big difference if those choices congeal as specific if-statements in code that exclude or select for particular situations rather than explicit selections made by a person interactively? As in most things, I think it's important to be skeptical of purity narratives...

Don't want to be untruthfully layin' a heavy purity-narrative trip on ya @atduskgreg, but I think you're conflating two things here, namely "specific aesthetic choices" and "explicit interactive selections". (I take "interactive" to be essentially synonymous with "manual" -- correct me if that is not how you meant it.)

The two concepts seem orthogonal to me: I can make a specific aesthetic choice and automate it (write code for it). Or I can write code without any specific aesthetic basis in mind. Or I can make a specific aesthetic choice and execute it manually. Or I can perform some manual transformation with no specific aesthetic basis in mind.

Maybe, as an illustration, say...

Artist A makes a bunch of specific aesthetic decisions, spends 4 hours writing code for them, and pushes a button. Program runs, out comes an HTML file. They then open it in a web browser and print it to PDF (a specific aesthetic choice, but not a greatly significant one.)

Artist B runs shuf /usr/share/dict/words | head -n 50000 > novel.txt (a specific aesthetic choice, but not a greatly significant one), then opens novel.txt in their text editor. They then spend the next 4 hours rearranging the words, making many specific aesthetic choices (and pushing many buttons) as they go.

Meanwhile, artists C to Y (inclusive) are doing things along the spectrum in between A and B, and artist Z, who thinks they're all nuts, has just handed a rifle to his friend, who promptly shoots Z in the arm.

Where does NaNoGenMo end and NaNoWriMo (or anything else) begin? I won't try to give an answer to that; that's like trying to give an answer to "What is art?"

enkiv2 commented 9 years ago

Certainly there's a range of control we can apply to the aesthetics of the result, and that control can be exerted at a number of places in the process. Most historical uses of writing machines (Oulipo-style constraints, Burroughs/Gysin cutups) apply the mechanism first, and the aesthetics of the author are applied to the result (in the case of Oulipo prompts, the author might choose individual words or word order, depending upon the constraint; in the case of cutups, the author acts as editor by choosing which parts of the cutup to relegate to the slush pile). Some (exquisite corpse, for instance) make use of the writing machine concurrently with aesthetic choices. But even the most automated entry here still takes advantage of author aesthetics in a final step -- we run the script, look at the output, and decide whether to keep it or run it again. Additionally, the choice of the machinery of constraint is an aesthetic choice that occurs before the use of any machinery (to echo Chris Pressey's point) -- a cutup will not resemble exquisite corpse, regardless of subsequent editorial work.

It seems like a big question hanging over NaNoGenMo is: how automated can you make a thing while still keeping it interesting? This comic is one of the most genuinely readable entries, in my opinion, and I suspect it could be made more automated without sacrificing readability (although I could certainly be wrong). Another big question this year seems to be: what kinds of media can you mix in and automate without sacrificing whatever coherence and interest you could otherwise produce? And, in terms of being a mix of media with some automation involved, this is also a success. We could quantify how much of the aesthetic coherence of the work is the product of the process and how much is the product of the author by removing the author from the loop.

atduskgreg commented 9 years ago

Thanks for the great discussion everyone. @cpressey I think we're mainly on the same page. Rather than only focusing on code, I'd look at it as places where the system constrains the human's aesthetic choices vs. where it executes them. So, for example, the system I'm using in Generated Detective works a bit like this (explained in a general form):

There are a series of specific aesthetic choices in my having formulated the system in this way: the core of comics storytelling is 1) the parallel juxtaposition of words and images and 2) the sequential juxtaposition of panels in an order. What's important to me is that I've turned both of these processes over (to a greater or lesser degree) to the system, not necessarily to my code.

My code consists both of systems that constrain my aesthetic choices and systems that execute them. An example of the former would be the process for building the script by selecting sentences out of the corpus of texts. That is the system by which the sequence of the comic is created. So far I've experimented with two systems for it: inputting a script consisting of a series of search terms, and executing one search term over and over again. I'm excited to experiment with more. This is a place where my choices are significantly constrained by the system -- I can try to wrestle it into something that approaches a more linear narrative, or I could make the selection fully random and see what emerges -- and it has the capacity to surprise me significantly with its results.

On the other hand, I've also written a bunch of code (probably more lines of code than the previous system) that takes the script lines and formats them as a caption or speech bubble and then does the design and typography to make sure they fit in the panel, have the right spacing, etc. That code is an example of executing a set of choices that I made. I decided what I wanted the captions and bubbles to look like and then I wrote code that achieves that look from whatever input is given to it. And, as the code is still imperfect and the problem consists of zillions of edge cases, there are areas where I have to intervene: say, to make sure a speech bubble doesn't end up half off-panel because of the location of a detected face. To me, this kind of intervention doesn't deviate from the system, it just implements it. It's a choice about how I'm going to spend my programming time on this project rather than a deeply aesthetic (or philosophical) one. I could automate that process to ensure speech balloons always showed up on-panel, but it would likely take a lot more time than simply tweaking things manually from time to time when they go wrong. And I'm pretty positive that no one (myself included) would be able to tell the difference in the output. There's an important sense here in which the system is making the decisions, and it doesn't matter whether I execute them by rote or the code does.

Now the selection of images from each script line is a more complex and interesting case. The point that started this discussion was the selection of search terms from the script lines using TF-IDF. Right now I do that "manually" but the "algorithm" I use to pick the terms is almost exactly that. I've used TF-IDF before and seen the output of it a ton and I'm mentally using it to select the search terms. So, to me, that is pretty similar to the balloon layout code.

But selecting which image to use from the image search is definitely a place where I'm intervening more. I look at the first few images returned by the search and choose the one that I think will look the coolest. Now, is this something I could automate? Certainly. In fact, I've done professional work before using computer vision to rate photographs aesthetically by analyzing their exposure qualities, their composition, their focus, etc. And that's a direction I could imagine going with this project: writing a bunch of image processing code that would select images from the search results based on those criteria.

Part of why I haven't done that yet is that I'm also applying an additional criterion here: how the image will juxtapose with the line of the script that triggered the search. And I don't have a clear idea how to automate that choice. The natural way for an algorithm to do it would be purely randomly. And maybe that's worth trying. I'm curious how different the results would be if I did that; I should try it for one of the upcoming issues. Given the search results I've seen thus far, though, I'm pretty sure that this approach would lead to something very generic. It would make this selection process a reflection of Flickr's search algorithm more than anything else. And all the things I can think of to do in code that are more sophisticated than this are long-term research projects: analyze the composition of each candidate and score it based on how it matches the already-selected images, use some combination of object detectors and scene classifiers to produce a semantic representation of the image and then select based on the relationship of that representation to the script line, build up a collage by extracting individual elements from multiple images based on different words in the panel, etc. Those are things I'd love to do (and may well work on in the future) but they were beyond the scope of this project.

And I've learned a ton about the possibilities and constraints of those ideas already by having hand-selected the images thus far. If I had just gone with random selection (being the easiest to implement) I would never have understood the problem this well. And if I'd waited until I had the ability to algorithmically implement a richer idea, I'd never have made the project. In fact, I was hesitating to start it because I was paralyzed by these kinds of decisions before @dariusk encouraged me to just plunge ahead with the simplest thing that could possibly work.

Anyway, I hope that explains some of my thinking here. I'm really enjoying this discussion and look forward to continuing it.

atduskgreg commented 9 years ago

Here's issue 7:

http://gregborenstein.com/comics/generated_detective/7/

For this one, I tried out some of what we've been discussing here. It's the most "fully generative" issue so far. The search term for all the panels is "woman" and I wrote code that searches flickr and automatically downloads a random image from the search results to act as the source.

Maybe a coincidence, but I think this is the weakest issue so far by a significant margin. And I think a big reason for that is the quality of the images returned. An element there may be the search terms themselves (i.e., the sentences that came back lacked words like "knife" and "blood" and "chase" that return striking images), but I can't help but think that it also has to do with the random image selection.

But to test this hypothesis, I think I'm going to focus on script generation for the next issue. The key issues to balance seem to be: 1) including vivid words 2) creating some sense of connection or contrast between the sentences.

MichaelPaulukonis commented 9 years ago

It may not be as "striking" but it's far from weak. An entire book where every page is cranked up to 11 would be exhausting.

cymatiste commented 9 years ago

This project could be a big hit at comic cons; have a table set up with a printer, and each person can get their own completely custom comic on a theme of their choosing! It could even be a kiosk.

hugovk commented 9 years ago

For this one, I tried out some of what we've been discussing here. It's the most "fully generative" issue so far. The search term for all the panels is "woman" and I wrote code that searches flickr and automatically downloads a random image from the search results to act as the source.

If you're not already, you could have Flickr return photos sorted by "interestingness". Then pick the top one or a random one from the top selection. See the sort argument. And you could always search for more than one term (e.g. knife man).
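
In API terms that's just the sort argument on flickr.photos.search; a sketch extending the Commons example above (the license IDs are the CC / no-known-restrictions ones, as far as I recall from the Flickr API docs):

# Same search as before, but sorted by interestingness and limited
# to remix-friendly licenses. api_key is a placeholder.
api_key = "YOUR_FLICKR_KEY"
params = {
  method:         "flickr.photos.search",
  api_key:        api_key,
  text:           "knife man",
  license:        "4,5,7",                # CC BY, CC BY-SA, no known copyright restrictions
  sort:           "interestingness-desc", # most "interesting" first
  format:         "json",
  nojsoncallback: "1"
}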

atduskgreg commented 9 years ago

Issue 8 is up:

http://gregborenstein.com/comics/generated_detective/8/

Change to the algorithm this time: I pulled out the sentences for the script fully randomly. So now this is a version that's completely algorithmic, without any intervention on my side that couldn't be automated with more (very straightforward) programming.

One interesting thing that happened this time was that it picked a famous public domain image (that Muhammad Ali punch).

I definitely think this baseline fully automated version produces results that feel more diffuse and less narrative than the earlier ones where I gave it a script.

Now that I've reached this baseline, though, I want to start experimenting with ways of producing an interesting script to drive this process.

@hugovk The problem with interestingness is twofold: 1) I'm specifically limiting my search to photos with a permissive license for remixing (i.e. CC and public domain), and nearly none of those show up in interestingness (which tends to favor professional photographers), so when I add that criterion to the search I get back 0 results for a lot of my queries. 2) The photos that show up under interestingness tend to be individual objects or people isolated from a background by a nice shallow depth of field. Their titles are always either the name of the object/person or some kind of very poetic thing. So when they do match a query they're either the dullest thing in the world (just the named object sitting there) or almost completely random-seeming.

But yeah, I could try it as a sort rather than a filter. It might bring the aesthetically interesting results to the top a bit more (and my weird queries especially would be hard to reduce to the genericness of a lot of the top interestingness photos).

hugovk commented 9 years ago

Yes, I definitely meant a sort rather than a filter.

Interestingness isn't a subset of all photos but an algorithmic way of measuring a photo, based on things like views and favourites. So the results can be sorted by this value rather than, say, by recency. And that shouldn't affect the number of results returned, just their ordering.

(Flickr's Explore section is powered by interestingness, and the ones there are usually more pro/unremixable. I guess you were somehow getting results from this section.)

lilinx commented 9 years ago

@atduskgreg this generator is awesome; I'm jealous. I've been trying to do comic generation through image processing and have repeatedly failed to make anything consistent. This one is so great and so inspiring.

atduskgreg commented 9 years ago

@lilinx Thanks!

atduskgreg commented 9 years ago

Issue 9 is up:

http://gregborenstein.com/comics/generated_detective/9/

Script here was "death, death, death, death, death". Mostly warming back up after a break due to travel over the weekend.

atduskgreg commented 9 years ago

Issue 10 is up:

http://gregborenstein.com/comics/generated_detective/10/

Script was "love, love, love, love, love" and the first panel has a really funny accidental joke about zebras.

MichaelPaulukonis commented 9 years ago

These are fantastic!

lilinx commented 9 years ago

Dude, your work is so awesome it's reshaping my inner world to some extent.
