IVRL / AI4VA

This is the GitHub repository for AI for Visual Arts Workshop and Challenges (AI4VA) in conjunction with ECCV 2024, Milano, Italy.

Meaning of "Interdepth" and "Intradepth" #4

Closed atharvpawar closed 1 month ago

atharvpawar commented 1 month ago

Hi,

For the depth challenge, I was wondering what "interdepth" and "intradepth" mean in the dataset annotations. Looking at the examples, I inferred that "intradepth" denotes the relative depth of different segments of the image, with brighter segments being closer to the viewer. Below is an example image and the corresponding intradepth image:

Example image (depth/data/images/val/Vaillant_0456_1954_02_07-01.png): Vaillant_0456_1954_02_07-01

Intradepth image: Intradepth

However, the "interdepth" mask did not make sense to me. It appears similar to the intradepth mask, except that some foreground objects/characters are not visible or are merged into the background.

Interdepth image: Interdepth

What are the meanings of these two annotations?

deblinaml commented 1 month ago

Thank you for your question. Before I explain the two aspects of ordinal depth as asked, please allow me to begin with some basic definitions which you might already know. Ordinal depth involves a qualitative ranking of objects by their distance from the observer, categorising them as either "closer" or "further away" without quantifying the exact distances. This contrasts with metric depth, which quantifies the exact distances between the observer and objects within a scene, typically in units such as metres or centimetres. Human perception is generally poor at accurately estimating metric depth or the three-dimensional metric structure from a single viewpoint. This is because metric depth in a single image is inherently ambiguous; for example, a tree positioned behind a house might appear larger yet be further away, or smaller and closer, making it impossible to determine the absolute depth difference between the two objects uniquely. Moreover, even when humans can gauge metric depth to some extent, extracting precise numerical values from such estimations remains problematic.

Given these challenges, humans are more adept at assessing relative depth, finding it easier to answer questions like "Is point A closer than point B?". Based on this premise, we opt to manually annotate the AI4VA dataset for ordinal depth, recognising that accurate metric depth annotations are particularly difficult to obtain in monocular settings, such as with comic images. Specifically, we use ordinal and integer-based *inter-* and *intra-*depth planes. The inter-depth planes, denoted as $[0,\infty)$, are utilised to represent broader distance intervals between distinct ordinal levels. To then capture subtler variations within a given inter-depth plane, we utilise intra-depth planes, defined within the range $[1,9]$. This dual-plane system facilitates the delineation of occlusion relationships and the sequential arrangement of each segmented component within a comic panel, including detailed elements like faces and hands. Hope this helps!
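To illustrate how such a dual-plane scheme could be used in practice, here is a minimal sketch that derives a single ordinal occlusion ordering from paired inter-/intra-depth labels. The segment dictionaries, their field names (`inter_depth`, `intra_depth`), and the assumption that a lower inter-depth plane means closer to the viewer are all my own illustrative choices, not the actual AI4VA annotation format:

```python
# Hypothetical segment annotations: field names and values are illustrative
# only, not the real AI4VA format.
segments = [
    {"name": "background",       "inter_depth": 2, "intra_depth": 1},
    {"name": "character_A",      "inter_depth": 0, "intra_depth": 1},
    {"name": "character_A_hand", "inter_depth": 0, "intra_depth": 2},
    {"name": "house",            "inter_depth": 1, "intra_depth": 1},
]

# Sort lexicographically: the inter-depth plane fixes the coarse ordinal
# level, and the intra-depth plane resolves the subtler ordering of
# segments that share the same inter-depth plane.
ordering = sorted(segments, key=lambda s: (s["inter_depth"], s["intra_depth"]))
print([s["name"] for s in ordering])
# → ['character_A', 'character_A_hand', 'house', 'background']
```

The lexicographic sort reflects the idea that inter-depth captures broad distance intervals while intra-depth orders fine-grained parts (e.g. a hand relative to its character) within a single plane.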

deblinaml commented 1 month ago

The dataset employs an ordinal depth scale, predominantly positioning objects within the zero planes of both inter-depth and intra-depth. Each panel invariably includes at least one object in these planes, serving as a reference point for determining the relative depth of subsequent objects. This arrangement continues iteratively for each depth plane beyond the first, resulting in a geometric distribution pattern for depth across the dataset, with distinct spikes representing the inter-depth progression. Please note that there is something interesting about the AI4VA inter-depth. Your observation is along the right lines and also points to an inherent bias of deep learning models. We cannot disclose further information about this until after the competition is over. Cheers!

deblinaml commented 1 month ago

Please do not hesitate to re-open this issue should you have further questions. There are some details in one of our works: https://openaccess.thecvf.com/content/WACV2022/papers/Bhattacharjee_Estimating_Image_Depth_in_the_Comics_Domain_WACV_2022_paper.pdf. Cheers!