Shared-Reality-Lab / IMAGE-server

IMAGE project server components

collage detector: decide whether to move to production (with list of incorrect detections) #826

Closed by jeffbl 1 month ago

jeffbl commented 4 months ago

The collage detector is in testing on unicorn. When the preprocessor is running, the photo-audio handler uses its output to tell the user that the photo looks like a collage, so results might be weird. I've seen no performance problems or code errors, but detection is of course imperfect, including false positives where it says a photo is a collage when it is not. This issue tracks these errors so we can decide whether the detector is reliable enough to deploy on pegasus.
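As a rough illustration of the handler behaviour described above, here is a minimal sketch. The key name `COLLAGE_KEY`, the `"collage"` field, and the warning text are all hypothetical placeholders, not the actual IMAGE schema or handler wording.

```python
from typing import Optional

# Hypothetical key and field names; the real preprocessor output may differ.
COLLAGE_KEY = "hypothetical.preprocessor.collageDetector"
COLLAGE_WARNING = "This photo looks like a collage, so results might be weird."


def collage_warning(preprocessors: dict) -> Optional[str]:
    """Return a warning string if the collage detector flagged the photo."""
    result = preprocessors.get(COLLAGE_KEY)
    if result is None:
        # Preprocessor not running: say nothing rather than guess.
        return None
    if result.get("collage"):
        return COLLAGE_WARNING
    return None
```

Under these assumptions a false positive only adds a spurious caveat to the audio output rather than suppressing it, which is why a list of misdetections is enough to judge whether the detector is acceptable.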

False positives: https://image.a11y.mcgill.ca/pages/imgs/pexels-mikhail-nilov-6981101.jpg (from the IMAGE homepage!); Bookshelves; Museum; Building with vertical pillars

False negatives: Marilyn face collage [@jeffbl opinion: this could reasonably be construed as not a collage, so I don't think this is a clear false negative.]

jeffbl commented 2 months ago

This issue is specific to the ML aspect of the collage-detector. Reassign to Yifan.

AndyBaiMQC commented 2 months ago

Some VERY interesting findings from playing with prompt engineering on LLaVA:

Image

My strategy is to ask it to classify the image using the labels defined in content-categoriser, and then, independently, to ask whether it thinks the image is a collage. The first question usually gets sensible answers, and the second question eliminates all the FPs. Its verbal explanation for the Marilyn image is somewhat understandable, but I stand with Jeff that this is NOT a collage.
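A minimal sketch of the two-question strategy, for illustration only. The `ask` callable stands in for whatever sends an image plus a prompt to LLaVA, and the labels and prompt wording are hypothetical placeholders rather than the ones actually used or defined in content-categoriser.

```python
from typing import Callable, Dict, List

# Placeholder labels; the real list is defined in content-categoriser.
CATEGORY_LABELS: List[str] = ["photograph", "chart", "text", "other"]

# Placeholder wording for the second, independent question.
COLLAGE_PROMPT = (
    "Is this image a collage of several separate images? "
    "Answer yes or no first, then explain briefly."
)


def build_category_prompt(labels: List[str]) -> str:
    """Build the first question: pick exactly one of the given labels."""
    return (
        "Classify this image using exactly one of these labels: "
        + ", ".join(labels)
        + ". Answer with the label only."
    )


def parse_yes_no(answer: str) -> bool:
    """Treat the answer as 'yes' only if its first word is 'yes'."""
    words = answer.strip().lower().split()
    return bool(words) and words[0].strip(".,!:;") == "yes"


def classify_image(
    image_path: str,
    ask: Callable[[str, str], str],
    labels: List[str] = CATEGORY_LABELS,
) -> Dict[str, object]:
    """Ask the two questions in separate calls so neither biases the other."""
    raw_category = ask(image_path, build_category_prompt(labels))
    category = raw_category.strip().strip(".!").lower()
    if category not in labels:
        category = "unknown"

    collage_answer = ask(image_path, COLLAGE_PROMPT)
    return {
        "category": category,
        "collage": parse_yes_no(collage_answer),
        "explanation": collage_answer,
    }
```

Keeping the two questions in separate calls mirrors the "independently" part of the strategy: the collage answer is not conditioned on the category answer, and the free-text explanation is kept so that borderline cases can be reviewed by hand.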

Basically, LLaVA is able to segment and pool away from images (rather than detecting continuous borders alone, as I imagined from CNN feature extraction), but it fails when images with dense, closed borders are clustered together.

AndyBaiMQC commented 2 months ago

@jeffbl This could have ramifications for textbook diagrams, where lumps of 'sub-images' with clearly defined borders may appear. I'm continuing with prompt engineering and looking for ways to fine-tune if possible (or to use a RAG-style setup).

AndyBaiMQC commented 2 months ago

Image

jeffbl commented 1 month ago

@JRegimbal please tag for production

JRegimbal commented 1 month ago

tagged and updated