Closed: jeffbl closed this issue 1 year ago
https://images.fineartamerica.com/images/artworkimages/mediumlarge/3/old-montreal-cobblestone-street-scene-john-rizzuto.jpg "This picture of a parking garage contains 3 people 1 traffic light and 1 potted plant." This is not a parking garage; it is a photo of a street in Old Montreal.
The first part of that output comes from the scene recognizer, which is known to be bad and is being removed from the build immediately. So I think we should focus only on the part after the "This picture of a SOMETHING" section, which typically comes from object detection.
Got it. Will ignore the scene recognizer.
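For comparing reports in this thread, it may help to separate the scene guess from the object counts automatically. The following is a minimal sketch, assuming the sentence shapes quoted in this thread ("This picture of a X contains ..." and "This indoor/outdoor picture contains ..."). The helper name `split_description` and both regular expressions are hypothetical and are not part of the preprocessors.

```python
import re

# Assumed sentence shapes, taken from outputs quoted in this thread:
#   "This picture of a parking garage contains 3 people 1 traffic light and 1 potted plant."
#   "This indoor picture contains 2 ovens 1 sink and 1 refrigerator."
SENTENCE = re.compile(
    r"^This (?:(?P<setting>indoor|outdoor) )?picture"
    r"(?: of an? (?P<scene>.+?))? contains (?P<objects>.+?)\.?$"
)
# One "<count> <label>" item; a label ends at the next count, at "and <count>", or at the end.
ITEM = re.compile(r"(\d+) (.+?)(?= \d+ | and \d+ |$)")


def split_description(sentence):
    """Hypothetical helper: return (scene_guess_or_None, {label: count})."""
    match = SENTENCE.match(sentence.strip())
    if match is None:
        raise ValueError(f"unrecognized description: {sentence!r}")
    # The scene guess is the part to ignore; keep it only for reference.
    scene = match.group("scene") or match.group("setting")
    counts = {label: int(n) for n, label in ITEM.findall(match.group("objects"))}
    return scene, counts
```

Calling `split_description` on the first report above gives the scene guess `"parking garage"` separately from the counts, so only the object-detection part needs to be reviewed.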
https://en.wikipedia.org/wiki/Cooking#/media/File:Agdz-rosino-05.jpg This picture of an attic contains 2 oranges 5 bowls 1 dining table and 1 person. EXPECT: No oranges, but finds more kitchen items, e.g., skewers, eggs, kettle, food cans...
after #209 20211213: This picture contains 5 bowls 3 cups 2 oranges 1 spoon 1 dining table and 1 person.
after testing on a bigger model (yolov5s6): 4 bowls, 3 cups, 2 oranges, a spoon, a dining table, and a person.
csail-semantic segmentation output (current model): wall, floor, ceiling, person, table
mseg: 'ceiling', 'food_other', 'table', 'counter_other', 'bottle', 'cup', 'bowl', 'person', 'paper', 'wall'
[Image: mseg output]
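To see at a glance what mseg adds over the current segmentation model, the two label lists can be compared as sets. This is only a sketch: `compare_labels` is a hypothetical helper, and the labels are copied from the two outputs reported above for this image.

```python
def compare_labels(first, second):
    """Hypothetical helper: split two label lists into shared and model-specific labels."""
    first_set, second_set = set(first), set(second)
    return {
        "both": sorted(first_set & second_set),
        "only_first": sorted(first_set - second_set),
        "only_second": sorted(second_set - first_set),
    }


# Labels as reported above for the cooking image.
csail = ["wall", "floor", "ceiling", "person", "table"]
mseg = ["ceiling", "food_other", "table", "counter_other", "bottle",
        "cup", "bowl", "person", "paper", "wall"]

diff = compare_labels(csail, mseg)
print(diff["only_second"])  # labels that only mseg reported
```

For this image, mseg alone reports the kitchen-related labels (bottle, bowl, cup, counter_other, food_other, paper), while the current model alone reports floor.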
https://creativecommons.org/2016/08/02/open_building_institute/kitchen/ This picture of an alcove contains 6 apples 2 chairs 1 orange 1 book and 1 bottle. Expect: No oranges (what is it with oranges, when there is nothing the color orange in the pictures?), no apples, yes eggs, there are like 100 books, not just one. Bottle is so-so. Does not find tables.
after #209 20211213: This indoor picture contains 2 dining tables 5 books 3 chairs and 1 bottle.
after testing on a bigger model (yolov5s6): 7 books, 2 dining tables, 3 chairs, a couch, and a bottle.
csail-semantic segmentation output (current model): wall, floor, table, book, shelf
mseg: 'ceiling', 'floor', 'apple', 'chair_other', 'armchair', 'couch', 'table', 'basket', 'bookshelf', 'door', 'shelf', 'book', 'bottle', 'cup', 'bowl', 'person', 'sports_ball', 'pillow', 'wall'
https://unsplash.com/photos/bT3dHRFAREA This picture of a plaza contains 8 people and 1 traffic light. EXPECT: Flags, people count is off (maybe we should say "about" or since it seems to undercount, "at least"?)
after #209 20211213: This picture contains 3 handbags 11 people and 2 traffic lights.
after testing on a bigger model (yolov5s6): 11 people, 2 traffic lights, and a handbag.
csail-semantic segmentation output (current model): person, floor, building, sky, tree
mseg: 'building', 'road', 'person', 'sky', 'vegetation'
[Image: mseg output]
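The suggestion above to say "at least" when a count is likely to be an undercount could look roughly like the following. This is a sketch under stated assumptions, not the preprocessor's actual sentence builder: `describe`, `pluralize`, the `hedge_from` threshold of 5, and the small plural table are all hypothetical, and the labels are assumed to arrive in singular form.

```python
# Hypothetical table of irregular plurals; extend as needed.
IRREGULAR_PLURALS = {"person": "people"}


def pluralize(label):
    """Naive English plural for a singular detection label."""
    if label in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[label]
    if label.endswith(("s", "x", "ch", "sh")):
        return label + "es"
    return label + "s"


def describe(counts, hedge_from=5):
    """Render {label: count} as a phrase, hedging counts >= hedge_from with 'at least'."""
    parts = []
    # Largest counts first, matching the ordering seen in the newer outputs.
    for label, count in sorted(counts.items(), key=lambda item: -item[1]):
        if count <= 0:
            continue
        if count == 1:
            article = "an" if label[0] in "aeiou" else "a"
            parts.append(f"{article} {label}")
        else:
            hedge = "at least " if count >= hedge_from else ""
            parts.append(f"{hedge}{count} {pluralize(label)}")
    if len(parts) <= 2:
        return " and ".join(parts)
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


print(describe({"person": 11, "traffic light": 2, "handbag": 1}))
```

With the counts reported for this image by the bigger model, this produces "at least 11 people, 2 traffic lights, and a handbag". Small counts are left unhedged on the assumption that they are less likely to be undercounted.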
https://creativecommons.org/2016/08/02/open_building_institute/modular-house/ This picture of a campsite contains 8 people and 1 surfboard. EXPECT: Construction site, board (not a surfboard). Ladder? Hardhats? Bricks?
after #209 20211213: This outdoor picture contains 7 people.
after testing on a bigger model (yolov5s6): 7 people
csail-semantic segmentation output (current model): wall, person, floor, sky, tree
mseg: building, floor, chair_other, armchair, table, stairs, sidewalk_pavement, terrain, person, bench, sky, baseball_bat, skateboard, vegetation, airplane, wall
[Image: mseg output]
https://api.creativecommons.engineering/v1/thumbs/aad5765c-a8a0-4d16-8dfa-d9f8c3e69155 This picture of a restaurant kitchen contains 4 chairs 2 refrigerators 2 ovens 2 microwaves 1 vase 1 bottle 1 remote 1 clock and 1 dining table. EXPECT: Finds multiples of appliances (maybe toaster oven and dishwasher are confusing?) I only see (and hear in the rendering) three chairs, not 4. I don't see any refrigerator. Others very good: remote, table, clock.
after testing on a bigger model (yolov5s6): the link is no longer valid
https://search.creativecommons.org/photos/ca32bd72-5e2b-4949-b641-8a9a13a1b6ad This picture of a restaurant kitchen contains 2 ovens 1 sink and 1 refrigerator. EXPECT: one oven, one dishwasher (yes, dishwashers look like ovens...) Would be nice to hear about cabinets or handles.
after #209 20211213: This indoor picture contains 2 ovens 1 sink and 1 refrigerator.
csail-semantic segmentation output (current model): the request is not getting sent to the server
after #209: 4 people, 2 cars, 2 trucks, a bicycle, and a traffic light.
after testing on a bigger model (yolov5s6): 4 people, 2 cars, 2 trucks, a bicycle, and a traffic light.
csail-semantic segmentation output (current model): building, road
mseg: 'building', 'road', 'sidewalk_pavement', 'person', 'sky', 'car', 'truck'
[Image: mseg output]
I am closing this as the last updates are from a year ago, and are likely no longer relevant.
When #194 is resolved, there will be a more formal way to log issues with preprocessors giving weird results, e.g., consistently wrong tags like "embassy". For now, please use this work item to report weird preprocessor results, with one graphic per comment. Please make sure to include: