Shared-Reality-Lab / IMAGE-server

IMAGE project server components

Hierarchical information for the photo handler with segment output #310

Open JRegimbal opened 2 years ago

JRegimbal commented 2 years ago

The photo handler currently produces segments with specificity ranging from the whole rendering down to individual semantic segments or objects. However, these are currently just presented in order as one large list. Ideally we would have:

1. intermediate segments containing each data type (e.g., a segment containing the semantic segment renderings); and
2. an interface capable of displaying hierarchical information for these segments (e.g., the full rendering contains the semantic segmentation part, which in turn contains the sky segment).
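To make the idea concrete, here is a minimal sketch of what such a nested segment structure could look like. The `AudioSegment` shape and all field names are illustrative assumptions, not the actual IMAGE-server schema:

```typescript
// Hypothetical nested segment structure (names are illustrative only).
interface AudioSegment {
  name: string;             // e.g. "Full rendering", "Sky"
  offset: number;           // start time within the audio file, in seconds
  duration: number;         // length in seconds
  children: AudioSegment[]; // nested sub-segments
}

// Example hierarchy matching the description above:
// full rendering > semantic segmentation > sky segment.
const rendering: AudioSegment = {
  name: "Full rendering",
  offset: 0,
  duration: 30,
  children: [
    {
      name: "Semantic segmentation",
      offset: 5,
      duration: 20,
      children: [
        { name: "Sky", offset: 6, duration: 4, children: [] },
      ],
    },
  ],
};
```

An interface could then walk `children` recursively to let users drill down from the whole rendering into a single region.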

This would ideally aid in navigating between the various parts, especially if more kinds of data (e.g., OCR) become bundled in the rendering as well. It will likely also require changes to the extension so the segment audio renderer can detect which audio segments are nested (based on time offset and duration) and then display them in a sensible way.
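The nesting detection mentioned above could be sketched as follows: a segment contains another if the inner segment's time span falls entirely within the outer one's. This is an assumed approach, not the extension's actual implementation, and the `Segment` shape and function names are hypothetical:

```typescript
// Hypothetical sketch: recover nesting from a flat list of segments
// using only time offset and duration.
interface Segment { label: string; offset: number; duration: number; }
interface SegmentNode extends Segment { children: SegmentNode[]; }

// True when `inner` lies entirely within `outer`'s time span.
function contains(outer: Segment, inner: Segment): boolean {
  return inner.offset >= outer.offset &&
         inner.offset + inner.duration <= outer.offset + outer.duration;
}

function buildTree(segments: Segment[]): SegmentNode[] {
  // Sort by start time, longer segments first on ties,
  // so parents always precede their children.
  const sorted = [...segments].sort(
    (a, b) => a.offset - b.offset || b.duration - a.duration
  );
  const roots: SegmentNode[] = [];
  const stack: SegmentNode[] = []; // chain of currently open ancestors
  for (const s of sorted) {
    const node: SegmentNode = { ...s, children: [] };
    // Pop ancestors that do not contain this segment.
    while (stack.length > 0 && !contains(stack[stack.length - 1], node)) {
      stack.pop();
    }
    if (stack.length > 0) {
      stack[stack.length - 1].children.push(node);
    } else {
      roots.push(node);
    }
    stack.push(node);
  }
  return roots;
}
```

Because the list is processed in start-time order with a stack of open ancestors, this runs in O(n log n) for the sort plus a linear pass, and it degrades gracefully: segments that overlap without full containment simply become siblings.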

Cybernide commented 2 years ago

By "segments" you mean "sections of the rendering" unless it's preceded by the word "semantic", yes? For the sake of clarity, should we settle the terminology? I am all for this hierarchy of information, by the way.

As of now, should we assume that the "types" of information we get back are:

  1. semantic segments
  2. objects
  3. OCR

Anything else?

JRegimbal commented 2 years ago

Yes, "segments" on its own is used to refer to relevant parts of the audio file and any associated metadata. I'm all for a brief, unambiguous name myself (although I do personally prefer referring to semantic segments as "regions" :wink:). At this point, the types of information are those extracted for photographs, which would be semantic segments/regions, objects/small regions, and text. That should be fine for creating links between data in the handler, but I do strongly think that whatever output format/renderer we make should be agnostic to the types of information returned.