Closed jeffbl closed 1 month ago
This issue is specific to the ML aspect of the collage-detector. Reassign to Yifan.
Some VERY interesting findings playing with prompt engineering on LLaVa
My strategy is to ask it to classify the image using the labels defined in content-categoriser, and then, independently, ask whether it thinks the image is a collage. The first question usually gets sensible answers, and the second question eliminates all FPs. Its verbal explanation is somewhat understandable, but I agree with Jeff that this is NOT a collage.
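The two-question strategy above can be sketched roughly as follows. This is a minimal sketch, not the production code: `ask` is a hypothetical stand-in for a real LLaVa call (e.g. via transformers or llama.cpp), and the label set shown is a placeholder for the actual labels in content-categoriser.

```python
# Hypothetical label set; the real one lives in content-categoriser.
LABELS = ["photograph", "chart", "text", "other"]

def classify(image, ask):
    """First question: pick a content category from the fixed label set."""
    prompt = f"Classify this image as one of: {', '.join(LABELS)}. Answer with one word."
    return ask(image, prompt).strip().lower()

def is_collage(image, ask):
    """Second, independent question: a yes/no collage check."""
    prompt = "Is this image a collage of multiple distinct photos? Answer yes or no."
    return ask(image, prompt).strip().lower().startswith("yes")

def detect(image, ask):
    """Run both questions; only flag a collage if the independent
    second question also says yes."""
    return {"category": classify(image, ask), "collage": is_collage(image, ask)}
```

The key point is that the collage question is asked independently of the classification, rather than being folded into a single prompt.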
Basically, LLaVa is able to segment and pool regions within an image (rather than only detecting continuous borders, as I had imagined from CNN feature extraction), but it fails when densely packed sub-images with closed borders are clustered together.
@jeffbl This could have some ramifications for textbook diagrams, since clumps of 'sub-images' with clearly defined borders may appear there. I'm continuing with prompt engineering and looking for ways to fine-tune if possible (or to set up a RAG-style pipeline).
@JRegimbal please tag for production
tagged and updated
The collage detector is in testing on unicorn. If the preprocessor is running, its output appears in the photo-audio handler, which tells the user that the photo looks like a collage, so results might be weird. I've seen no performance issues or code errors, but its detection is of course imperfect, including false positives where it says a photo is a collage when it is not. This issue tracks these errors so we can decide whether it is reliable enough to deploy on pegasus.
False positives:
- https://image.a11y.mcgill.ca/pages/imgs/pexels-mikhail-nilov-6981101.jpg (from the IMAGE homepage!)
- Bookshelves
- Museum
- Building with vertical pillars
False negatives:
- Marilyn face collage [@jeffbl opinion: this could reasonably be construed as not a collage, so I don't think this is a clear false negative.]