Shared-Reality-Lab / IMAGE-server

IMAGE project server components

Collage preprocessor does not properly check for non-graphic inputs, likely comes too early in priority grouping #625

Open JRegimbal opened 1 year ago

JRegimbal commented 1 year ago

The preprocessor assumes that a request will always have a raster graphic embedded in the graphic key of the request. This is not true. It is also positioned to run in priority group 1, which may be too early unless this is truly a preprocessor intended for all collages of graphics (e.g., collages of charts, diagrams).
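A minimal sketch of the missing guard, assuming the request carries the graphic as a base64 data URI under the `graphic` key (the function name and request shape here are illustrative, not the actual server schema). It checks magic bytes rather than trusting the declared MIME type, so SVG or other non-raster inputs are rejected early:

```python
import base64
import re

# Hypothetical guard: accept only known raster formats by magic bytes.
RASTER_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
)

def has_raster_graphic(request: dict) -> bool:
    """Return True only if request['graphic'] decodes to a known raster format."""
    uri = request.get("graphic")
    if not isinstance(uri, str):
        return False
    match = re.match(r"data:(image/[\w.+-]+);base64,(.*)", uri, re.DOTALL)
    if not match:
        return False
    try:
        payload = base64.b64decode(match.group(2), validate=True)
    except ValueError:
        return False
    # Verify the payload's magic bytes, not just the stated MIME type.
    return any(payload.startswith(sig) for sig in RASTER_SIGNATURES)
```

With a check like this the preprocessor could return an empty response for non-raster inputs instead of assuming a decodable image.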

jeffbl commented 1 year ago

@rianadutta Moving to next sprint, but if not realistic to complete, let's move to backlog?

jeffbl commented 11 months ago

Moving to backlog as this does not currently impact user experience. It should be fixed for non-photo collages just because this would set expectations appropriately for this preprocessor, but I'm not sure what it currently does with non-photo collages. Will need some testing before deciding.

jeffbl commented 4 months ago

@AndyBaiMQC The debate here spans several issues, which makes it unclear what we should do overall. Two options, maybe there are others:

  1. Move collage-detector to run group 2, so that it can look at the output of content-categorizer to see if the graphic is a photo, and only run if it is, since it likely will not do anything useful on other graphic types (e.g., chart). However, it appears that content-categorizer is putting actual photo collages into "chart/other" category (#829) for collages like this one, so the collage-detector doesn't even try. This is obviously not tenable unless the content-categorizer is improved to recognize graphic types more reliably.
  2. Look at the entire front end of the ML pipeline, across content-categorizer, collage-detector, graphic-tagger, and modify/replace them as a group, perhaps adding more categories that would be of use, since the current set/breakdown seems pretty arbitrary and random, with many mis-identifications.
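Option 1 above amounts to a small gate at the start of the collage detector. A sketch of that gate, assuming prior preprocessor outputs arrive under a `preprocessors` key indexed by preprocessor name, and that the categorizer reports a `category` field (the name and field are assumptions, not the real schema):

```python
# Hypothetical group-2 gate: only run the collage detector when an earlier
# content-categorizer pass labelled the graphic a photograph.
CATEGORIZER = "ca.mcgill.a11y.image.preprocessor.contentCategoriser"  # assumed name

def should_run_collage_detector(request: dict) -> bool:
    """Check the content-categorizer's output before doing any pixel work."""
    category = (
        request.get("preprocessors", {})
        .get(CATEGORIZER, {})
        .get("category")
    )
    return category == "photograph"
```

As noted, this gate is only as good as the categorizer feeding it: if photo collages land in "chart/other" (#829), the detector never fires.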

Open to other possibilities. @AndyBaiMQC let me know if need to discuss, but this tangle seems like a great way to get up-to-speed on the preprocessor pipeline.

AndyBaiMQC commented 3 months ago

2 aspects of investigation: model + category definitions. For both collage detector and content categorizer, here are some updates from local tests:

  1. Content categorizer:

    • Manually adding dummy categories (more classes) produces some difference in the classification results for the 4 existing classes, but I cannot explain why. Investigating the predicted probabilities for the new category as well as the existing ones, but since multi-class classification relies on argmax, this is only for understanding.
    • Changing the learning rate (lr) to smaller values (1e-4, 5e-5) doesn't seem to affect performance much; training is slightly slower but more stable, which is expected. This is a likely sign that this small model is saturated and satisfactory as-is. Happy to keep it as a baseline eval tool.
  2. Collage detector:

    • The model relies heavily on pixel-based processing.
    • Should move towards an object detection model capable of detecting long yet uninterrupted edges. This may be doable with the current pixel-based model, and in parallel we should try llava and other NN-based solutions.
    • Tweaking parameters of the existing model.
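The "long yet uninterrupted edge" idea can be illustrated with a toy pixel-based pass: scan a grayscale grid for columns where the horizontal intensity jump stays above a threshold across the full image height, as a collage seam would. This is a stdlib sketch only; a real detector would use something like `cv2.HoughLinesP` or a learned model, and the threshold and grid format here are illustrative assumptions:

```python
def seam_columns(gray, threshold=50):
    """Return x positions where the jump |gray[y][x+1] - gray[y][x]| stays
    at or above `threshold` for every row, i.e. a full-height vertical seam."""
    height, width = len(gray), len(gray[0])
    seams = []
    for x in range(width - 1):
        if all(abs(gray[y][x + 1] - gray[y][x]) >= threshold for y in range(height)):
            seams.append(x)
    return seams
```

The same scan transposed would find horizontal seams; a collage candidate is a graphic with at least one full-length seam in either direction.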