Predict The Sequence of Dialog in Comic Books

ghost commented 4 years ago

@alexmonti19 Thanks for your hard work!

Take this example: I want to predict the order of the dialog in a comic book, that is, to tell which dialog box is 1st, which is 2nd, which is 3rd, etc.

Can your code be trained to solve such a task? Won't the dialog boxes all look the same to it?
Can it be trained to predict the order of 40+ dialog boxes on a single page, and at what computational cost?
Can it generalize to unseen images? Will it really understand the visual context, since most dialog boxes look the same?

Note that I already have the dialog boxes' locations and coordinates detected; now I only want to predict their reading order. (demo)

alexmonti19 commented 4 years ago

Hi @deepseek, thanks for your interest in our work :)

Correct me if I'm wrong, but the task you're proposing seems conceptually different from the aim of our architecture. You want to assign a numeric label to each box given the boxes' coordinates, their content and maybe some visual features, while our architecture learns the distribution underlying the data in order to generate plausible, brand-new examples in the same space as the input.

Your task could likely be solved with RNNs and, possibly, attention-based or graph neural networks: it seems closer to Natural Language Processing, where those approaches have been used extensively. DAG-Net, however, especially its generative part (the recurrent VAE), doesn't suit your purpose.

Alex
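For the graph-neural-network route mentioned above, the detected boxes first have to be turned into a graph. The sketch below is one possible construction, under stated assumptions, and is not part of DAG-Net: each box becomes a node with a normalized center/size feature vector, and edges link every box to its k nearest neighbours by center distance. The box format `(x_min, y_min, x_max, y_max)`, the choice of k, and the helper names `box_center`, `node_features` and `knn_edges` are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in page pixels.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Center point of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def node_features(boxes: List[Box], page_w: float, page_h: float) -> List[List[float]]:
    """One feature vector per box: center and size, normalized by page size."""
    feats = []
    for x0, y0, x1, y1 in boxes:
        cx, cy = box_center((x0, y0, x1, y1))
        feats.append([cx / page_w, cy / page_h, (x1 - x0) / page_w, (y1 - y0) / page_h])
    return feats


def knn_edges(boxes: List[Box], k: int = 3) -> List[Tuple[int, int]]:
    """Directed edges (i, j) from each box to its k nearest boxes by center distance."""
    centers = [box_center(b) for b in boxes]
    edges = []
    for i, (xi, yi) in enumerate(centers):
        # Rank all other boxes by Euclidean distance between centers.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        edges.extend((i, j) for _, j in others[:k])
    return edges


boxes = [(10, 10, 110, 60), (200, 20, 300, 70), (15, 300, 120, 350)]
print(knn_edges(boxes, k=1))  # [(0, 1), (1, 0), (2, 0)]
```

A k-nearest-neighbour graph keeps the edge count linear in the number of boxes, which may matter for pages with 40+ balloons; a fully connected graph is the simpler alternative. The feature list and edge list would still have to be converted into whatever tensors the chosen GNN library expects (for example an `edge_index` tensor in PyTorch Geometric).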

ghost commented 4 years ago

The task might seem unrelated at first, but if you think about it, the task is basically to tell which dialog box comes next in the order. With trajectory prediction, one could draw a line from the first dialog box to the last and train the task as trajectory prediction.
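To make the trajectory framing concrete, here is a minimal sketch, assuming ground-truth reading orders are available for training pages: the box centers visited in reading order form a sequence of normalized (x, y) points, which is the kind of input a trajectory model consumes. At inference time, a predicted next position would still have to be mapped back to an actual box, here by snapping to the nearest box not yet read. The box format and all function names are hypothetical, and whether a model trained this way learns anything useful is exactly the open question in this thread.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in page pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def normalized_center(box: Box, page_w: float, page_h: float) -> Point:
    """Box center scaled to [0, 1] by the page size."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0 / page_w, (y0 + y1) / 2.0 / page_h)


def order_to_trajectory(boxes: Sequence[Box], order: Iterable[int],
                        page_w: float, page_h: float) -> List[Point]:
    """Visit the boxes in ground-truth reading order and emit their centers,
    i.e. the line from the first to the last dialog as a point sequence."""
    return [normalized_center(boxes[i], page_w, page_h) for i in order]


def to_displacements(traj: List[Point]) -> List[Point]:
    """Step vectors between consecutive points of the trajectory."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(traj, traj[1:])]


def snap_to_next_box(pred: Point, boxes: Sequence[Box], visited: Set[int],
                     page_w: float, page_h: float) -> int:
    """Map a predicted next position to the closest unread box (-1 if none is left)."""
    best, best_d = -1, float("inf")
    for i, box in enumerate(boxes):
        if i in visited:
            continue
        cx, cy = normalized_center(box, page_w, page_h)
        d = (cx - pred[0]) ** 2 + (cy - pred[1]) ** 2
        if d < best_d:
            best, best_d = i, d
    return best


boxes = [(10, 10, 110, 60), (200, 20, 300, 70), (15, 300, 120, 350)]
traj = order_to_trajectory(boxes, [0, 1, 2], page_w=400, page_h=400)
print(to_displacements(traj))
```

Snapping to unread boxes guarantees that every box is assigned exactly once, which a raw regressed trajectory does not.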

Do you recommend any specific implementation of the networks you suggested? Note that this is NOT a Natural Language Processing task; it's predicting the structure or order of things.

alexmonti19 commented 4 years ago

> Note that this is NOT a Natural Language Processing task; it's predicting the structure or order of things.

Sorry, I took for granted that you would also give the network the boxes' contents and extract their order from that information. That's why I brought up NLP :)

> The task might seem unrelated at first, but if you think about it, the task is basically to tell which dialog box comes next in the order. With trajectory prediction, one could draw a line from the first dialog box to the last and train the task as trajectory prediction.

If you restrict the input to the locations (x, y coordinates) of the boxes within the page, the network might produce some results: it all depends on whether the box locations are well characterized by some distribution, which is hard to tell a priori. Without specific experience on this, I would guess that even if the network managed to extract meaningful information, it would only cope with simple orderings (left to right, top to bottom, as we would normally read the boxes across consecutive panels) and would struggle with more complex layouts.
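The simple left-to-right, top-to-bottom order described above can be written down directly without any learning, which makes it a useful baseline: if a trained model does no better than this rule on your own pages, it has probably only learned the naive ordering. The sketch below assumes the hypothetical `(x_min, y_min, x_max, y_max)` box format; the row-grouping rule and its `row_tol` parameter are illustrative choices, not something proposed in this thread.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


def naive_reading_order(boxes: Sequence[Box], row_tol: float = 0.5) -> List[int]:
    """Top-to-bottom, then left-to-right reading order of box indices.

    Two boxes share a row when their vertical centers differ by at most
    row_tol times the smaller of the two heights (row_tol is a made-up knob)."""
    def cx(b: Box) -> float:
        return (b[0] + b[2]) / 2.0

    def cy(b: Box) -> float:
        return (b[1] + b[3]) / 2.0

    def height(b: Box) -> float:
        return b[3] - b[1]

    rows: List[List[int]] = []
    # Scan boxes from the top of the page downwards.
    for i in sorted(range(len(boxes)), key=lambda j: cy(boxes[j])):
        if rows:
            anchor = boxes[rows[-1][0]]  # box that opened the current row
            tol = row_tol * min(height(anchor), height(boxes[i]))
            if abs(cy(boxes[i]) - cy(anchor)) <= tol:
                rows[-1].append(i)
                continue
        rows.append([i])

    # Within each row, read left to right.
    order: List[int] = []
    for row in rows:
        order.extend(sorted(row, key=lambda j: cx(boxes[j])))
    return order


page = [(200, 20, 300, 70), (15, 300, 120, 350), (10, 10, 110, 60)]
print(naive_reading_order(page))  # [2, 0, 1]
```

Staggered balloons are where such a rule breaks: depending on `row_tol`, a slightly lower box on the left is read either before or after its neighbour on the right, which is the kind of complex layout where a learned model would have to do better.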

alexmonti19/dagnet, issue #1