DavidMChan / aloha

A new reliable, localizable, and generalizable metric for hallucination detection in image captioning models.

cannot reproduce Table 1 CLIPScore #4

Closed long8v closed 3 months ago

long8v commented 4 months ago

I tried to reproduce the Table 1 CLIPScore results on the HAT dataset. I ran the original CLIPScore repo (slightly modified to also return the foil label for each sample), and it returned this:

[{"image_id": "COCO_val2014_000000016903", "CLIPScore": 0.84375, "foil": true}, {"image_id": "COCO_val2014_000000023709", "CLIPScore": 0.70068359375, "foil": false}, {"image_id": "COCO_val2014_000000553561", "CLIPScore": 0.900390625, "foil": false}, {"image_id": "COCO_val2014_000000090367", "CLIPScore": 0.8642578125, "foil": false}, {"image_id": "COCO_val2014_000000539226", "CLIPScore": 0.70703125, "foil": false}, {"image_id": "COCO_val2014_000000122838", "CLIPScore": 0.72802734375, "foil": false}, {"image_id": "COCO_val2014_000000450577", "CLIPScore": 0.77880859375, "foil": false}, {"image_id": "COCO_val2014_000000196660", "CLIPScore": 0.77880859375, "foil": false}, {"image_id": "COCO_val2014_000000089541", "CLIPScore": 0.931640625, "foil": false}, {"image_id": "COCO_val2014_000000228013", "CLIPScore": 0.81298828125, "foil": false}, {"image_id": "COCO_val2014_000000226579", "CLIPScore": 0.849609375, "foil": false}, {"image_id": "COCO_val2014_000000464689", "CLIPScore": 0.7724609375, "foil": true}, {"image_id": "COCO_val2014_000000536292", "CLIPScore": 0.83544921875, "foil": false}, {"image_id": "COCO_val2014_000000331799", "CLIPScore": 0.69873046875, "foil": true}, {"image_id": "COCO_val2014_000000266491", "CLIPScore": 0.734375, "foil": true}, {"image_id": "COCO_val2014_000000570594", "CLIPScore": 0.646484375, "foil": false}, {"image_id": "COCO_val2014_000000481710", "CLIPScore": 0.7783203125, "foil": false}, {"image_id": "COCO_val2014_000000461953", "CLIPScore": 0.8330078125, "foil": false}, {"image_id": "COCO_val2014_000000206751", "CLIPScore": 0.76220703125, "foil": true}, {"image_id": "COCO_val2014_000000218205", "CLIPScore": 0.69921875, "foil": false}, {"image_id": "COCO_val2014_000000016161", "CLIPScore": 0.8662109375, "foil": false}, {"image_id": "COCO_val2014_000000134103", "CLIPScore": 0.78564453125, "foil": false}, {"image_id": "COCO_val2014_000000103870", "CLIPScore": 0.8623046875, "foil": true}, {"image_id": "COCO_val2014_000000491154", "CLIPScore": 
0.8876953125, "foil": false}, {"image_id": "COCO_val2014_000000538721", "CLIPScore": 0.69580078125, "foil": true}, {"image_id": "COCO_val2014_000000234676", "CLIPScore": 0.6826171875, "foil": false}, {"image_id": "COCO_val2014_000000382512", "CLIPScore": 0.79638671875, "foil": true}, {"image_id": "COCO_val2014_000000006701", "CLIPScore": 0.763671875, "foil": false}, {"image_id": "COCO_val2014_000000333190", "CLIPScore": 0.75244140625, "foil": true}, {"image_id": "COCO_val2014_000000050753", "CLIPScore": 0.796875, "foil": false}, {"image_id": "COCO_val2014_000000345469", "CLIPScore": 0.9052734375, "foil": false}, {"image_id": "COCO_val2014_000000489023", "CLIPScore": 0.6728515625, "foil": false}, {"image_id": "COCO_val2014_000000221725", "CLIPScore": 0.78564453125, "foil": false}, {"image_id": "COCO_val2014_000000535997", "CLIPScore": 0.693359375, "foil": false}, {"image_id": "COCO_val2014_000000367429", "CLIPScore": 0.85400390625, "foil": false}, {"image_id": "COCO_val2014_000000411587", "CLIPScore": 0.84765625, "foil": false}, {"image_id": "COCO_val2014_000000578703", "CLIPScore": 0.73681640625, "foil": true}, {"image_id": "COCO_val2014_000000101280", "CLIPScore": 0.78564453125, "foil": true}, {"image_id": "COCO_val2014_000000577310", "CLIPScore": 0.81640625, "foil": false}, {"image_id": "COCO_val2014_000000167656", "CLIPScore": 0.66162109375, "foil": false}, {"image_id": "COCO_val2014_000000209835", "CLIPScore": 0.7578125, "foil": false}, {"image_id": "COCO_val2014_000000261116", "CLIPScore": 0.87890625, "foil": true}, {"image_id": "COCO_val2014_000000224037", "CLIPScore": 0.6982421875, "foil": false}, {"image_id": "COCO_val2014_000000183407", "CLIPScore": 0.79150390625, "foil": false}, {"image_id": "COCO_val2014_000000347675", "CLIPScore": 0.75927734375, "foil": true}, {"image_id": "COCO_val2014_000000280918", "CLIPScore": 0.90771484375, "foil": false}, {"image_id": "COCO_val2014_000000083113", "CLIPScore": 0.8271484375, "foil": false}, {"image_id": 
"COCO_val2014_000000010432", "CLIPScore": 0.8515625, "foil": true}, {"image_id": "COCO_val2014_000000173574", "CLIPScore": 0.68408203125, "foil": true}, {"image_id": "COCO_val2014_000000561214", "CLIPScore": 0.74755859375, "foil": true}, {"image_id": "COCO_val2014_000000227901", "CLIPScore": 0.6904296875, "foil": true}, {"image_id": "COCO_val2014_000000227960", "CLIPScore": 0.81494140625, "foil": false}, {"image_id": "COCO_val2014_000000466960", "CLIPScore": 0.6591796875, "foil": false}, {"image_id": "COCO_val2014_000000245852", "CLIPScore": 0.794921875, "foil": false}, {"image_id": "COCO_val2014_000000129592", "CLIPScore": 0.87939453125, "foil": false}, {"image_id": "COCO_val2014_000000555648", "CLIPScore": 0.763671875, "foil": false}, {"image_id": "COCO_val2014_000000229599", "CLIPScore": 0.849609375, "foil": false}, {"image_id": "COCO_val2014_000000082465", "CLIPScore": 0.78076171875, "foil": true}, {"image_id": "COCO_val2014_000000249672", "CLIPScore": 0.7880859375, "foil": false}, {"image_id": "COCO_val2014_000000441211", "CLIPScore": 0.740234375, "foil": false}, {"image_id": "COCO_val2014_000000481670", "CLIPScore": 0.7587890625, "foil": false}, {"image_id": "COCO_val2014_000000304741", "CLIPScore": 0.935546875, "foil": true}, {"image_id": "COCO_val2014_000000534045", "CLIPScore": 0.8525390625, "foil": true}, {"image_id": "COCO_val2014_000000514586", "CLIPScore": 0.85009765625, "foil": true}, {"image_id": "COCO_val2014_000000523252", "CLIPScore": 0.73193359375, "foil": true}, {"image_id": "COCO_val2014_000000201301", "CLIPScore": 0.89404296875, "foil": true}, {"image_id": "COCO_val2014_000000191981", "CLIPScore": 0.716796875, "foil": false}, {"image_id": "COCO_val2014_000000179317", "CLIPScore": 0.89453125, "foil": true}, {"image_id": "COCO_val2014_000000492800", "CLIPScore": 0.6767578125, "foil": true}, {"image_id": "COCO_val2014_000000077595", "CLIPScore": 0.8056640625, "foil": true}, {"image_id": "COCO_val2014_000000196594", "CLIPScore": 0.68603515625, 
"foil": true}, {"image_id": "COCO_val2014_000000000139", "CLIPScore": 0.72314453125, "foil": false}, {"image_id": "COCO_val2014_000000377832", "CLIPScore": 0.83056640625, "foil": true}, {"image_id": "COCO_val2014_000000018737", "CLIPScore": 0.83544921875, "foil": false}, {"image_id": "COCO_val2014_000000212470", "CLIPScore": 0.896484375, "foil": true}, {"image_id": "COCO_val2014_000000356261", "CLIPScore": 0.88818359375, "foil": false}, {"image_id": "COCO_val2014_000000128570", "CLIPScore": 0.93017578125, "foil": false}, {"image_id": "COCO_val2014_000000007320", "CLIPScore": 0.7841796875, "foil": false}, {"image_id": "COCO_val2014_000000392928", "CLIPScore": 0.8056640625, "foil": false}, {"image_id": "COCO_val2014_000000066046", "CLIPScore": 0.939453125, "foil": false}, {"image_id": "COCO_val2014_000000253282", "CLIPScore": 0.8125, "foil": false}, {"image_id": "COCO_val2014_000000296303", "CLIPScore": 0.73291015625, "foil": false}, {"image_id": "COCO_val2014_000000574592", "CLIPScore": 0.88134765625, "foil": false}, {"image_id": "COCO_val2014_000000273825", "CLIPScore": 0.77197265625, "foil": false}, {"image_id": "COCO_val2014_000000027805", "CLIPScore": 0.8603515625, "foil": false}, {"image_id": "COCO_val2014_000000236272", "CLIPScore": 0.6689453125, "foil": true}, {"image_id": "COCO_val2014_000000433998", "CLIPScore": 0.85693359375, "foil": false}, {"image_id": "COCO_val2014_000000497141", "CLIPScore": 0.9013671875, "foil": false}, {"image_id": "COCO_val2014_000000518188", "CLIPScore": 0.7177734375, "foil": true}, {"image_id": "COCO_val2014_000000514979", "CLIPScore": 0.78662109375, "foil": false}, {"image_id": "COCO_val2014_000000319687", "CLIPScore": 0.75927734375, "foil": false}, {"image_id": "COCO_val2014_000000261758", "CLIPScore": 0.80517578125, "foil": false}, {"image_id": "COCO_val2014_000000336568", "CLIPScore": 0.8095703125, "foil": false}, {"image_id": "COCO_val2014_000000028864", "CLIPScore": 0.88818359375, "foil": false}, {"image_id": 
"COCO_val2014_000000566049", "CLIPScore": 0.8037109375, "foil": false}, {"image_id": "COCO_val2014_000000117676", "CLIPScore": 0.72314453125, "foil": false}, {"image_id": "COCO_val2014_000000128813", "CLIPScore": 0.875, "foil": false}, {"image_id": "COCO_val2014_000000190432", "CLIPScore": 0.84765625, "foil": false}, {"image_id": "COCO_val2014_000000101660", "CLIPScore": 0.79150390625, "foil": true}, {"image_id": "COCO_val2014_000000463785", "CLIPScore": 0.7646484375, "foil": false}, {"image_id": "COCO_val2014_000000410141", "CLIPScore": 0.6767578125, "foil": false}, {"image_id": "COCO_val2014_000000237041", "CLIPScore": 0.78857421875, "foil": false}, {"image_id": "COCO_val2014_000000443347", "CLIPScore": 0.8173828125, "foil": false}, {"image_id": "COCO_val2014_000000276720", "CLIPScore": 0.669921875, "foil": false}, {"image_id": "COCO_val2014_000000028850", "CLIPScore": 0.70556640625, "foil": false}, {"image_id": "COCO_val2014_000000500940", "CLIPScore": 0.87841796875, "foil": false}, {"image_id": "COCO_val2014_000000314412", "CLIPScore": 0.8486328125, "foil": false}, {"image_id": "COCO_val2014_000000172201", "CLIPScore": 0.791015625, "foil": false}, {"image_id": "COCO_val2014_000000232598", "CLIPScore": 0.8349609375, "foil": false}, {"image_id": "COCO_val2014_000000113113", "CLIPScore": 0.75244140625, "foil": false}, {"image_id": "COCO_val2014_000000483401", "CLIPScore": 0.86376953125, "foil": false}, {"image_id": "COCO_val2014_000000032258", "CLIPScore": 0.75244140625, "foil": false}, {"image_id": "COCO_val2014_000000158887", "CLIPScore": 0.8828125, "foil": false}, {"image_id": "COCO_val2014_000000258523", "CLIPScore": 0.7861328125, "foil": false}, {"image_id": "COCO_val2014_000000439770", "CLIPScore": 0.7978515625, "foil": true}, {"image_id": "COCO_val2014_000000217301", "CLIPScore": 0.88623046875, "foil": true}, {"image_id": "COCO_val2014_000000192905", "CLIPScore": 0.84716796875, "foil": true}, {"image_id": "COCO_val2014_000000363577", "CLIPScore": 
0.86865234375, "foil": true}, {"image_id": "COCO_val2014_000000149568", "CLIPScore": 0.7109375, "foil": true}, {"image_id": "COCO_val2014_000000127660", "CLIPScore": 0.87451171875, "foil": false}, {"image_id": "COCO_val2014_000000299493", "CLIPScore": 0.7919921875, "foil": false}, {"image_id": "COCO_val2014_000000293757", "CLIPScore": 0.8212890625, "foil": true}, {"image_id": "COCO_val2014_000000386912", "CLIPScore": 0.7353515625, "foil": false}, {"image_id": "COCO_val2014_000000451084", "CLIPScore": 0.81103515625, "foil": false}, {"image_id": "COCO_val2014_000000376545", "CLIPScore": 0.7685546875, "foil": false}, {"image_id": "COCO_val2014_000000327401", "CLIPScore": 0.82958984375, "foil": true}, {"image_id": "COCO_val2014_000000562614", "CLIPScore": 0.83447265625, "foil": false}, {"image_id": "COCO_val2014_000000366264", "CLIPScore": 0.67919921875, "foil": false}, {"image_id": "COCO_val2014_000000036450", "CLIPScore": 0.78173828125, "foil": false}, {"image_id": "COCO_val2014_000000202825", "CLIPScore": 0.89306640625, "foil": true}, {"image_id": "COCO_val2014_000000308506", "CLIPScore": 0.72998046875, "foil": false}, {"image_id": "COCO_val2014_000000511469", "CLIPScore": 0.88916015625, "foil": false}, {"image_id": "COCO_val2014_000000264191", "CLIPScore": 0.7529296875, "foil": false}, {"image_id": "COCO_val2014_000000528276", "CLIPScore": 0.87939453125, "foil": true}, {"image_id": "COCO_val2014_000000375415", "CLIPScore": 0.8994140625, "foil": true}, {"image_id": "COCO_val2014_000000095677", "CLIPScore": 0.95751953125, "foil": false}, {"image_id": "COCO_val2014_000000043997", "CLIPScore": 0.88916015625, "foil": false}, {"image_id": "COCO_val2014_000000102577", "CLIPScore": 0.77880859375, "foil": false}, {"image_id": "COCO_val2014_000000139917", "CLIPScore": 0.96875, "foil": false}, {"image_id": "COCO_val2014_000000207561", "CLIPScore": 0.833984375, "foil": false}, {"image_id": "COCO_val2014_000000169331", "CLIPScore": 0.77099609375, "foil": true}, {"image_id": 
"COCO_val2014_000000375415", "CLIPScore": 0.85400390625, "foil": false}, {"image_id": "COCO_val2014_000000538463", "CLIPScore": 0.744140625, "foil": false}, {"image_id": "COCO_val2014_000000323925", "CLIPScore": 0.76611328125, "foil": false}, {"image_id": "COCO_val2014_000000091615", "CLIPScore": 0.8076171875, "foil": false}, {"image_id": "COCO_val2014_000000543692", "CLIPScore": 0.759765625, "foil": false}, {"image_id": "COCO_val2014_000000362023", "CLIPScore": 0.6796875, "foil": true}, {"image_id": "COCO_val2014_000000331250", "CLIPScore": 0.73828125, "foil": true}, {"image_id": "COCO_val2014_000000528786", "CLIPScore": 0.78173828125, "foil": false}, {"image_id": "COCO_val2014_000000134596", "CLIPScore": 0.638671875, "foil": true}, {"image_id": "COCO_val2014_000000455741", "CLIPScore": 0.72509765625, "foil": true}, {"image_id": "COCO_val2014_000000431573", "CLIPScore": 0.9462890625, "foil": false}, {"image_id": "COCO_val2014_000000552901", "CLIPScore": 0.81494140625, "foil": false}, {"image_id": "COCO_val2014_000000050165", "CLIPScore": 0.80029296875, "foil": true}, {"image_id": "COCO_val2014_000000473299", "CLIPScore": 0.7841796875, "foil": true}, {"image_id": "COCO_val2014_000000245145", "CLIPScore": 0.74267578125, "foil": true}, {"image_id": "COCO_val2014_000000004840", "CLIPScore": 0.77197265625, "foil": false}, {"image_id": "COCO_val2014_000000125208", "CLIPScore": 0.8642578125, "foil": false}, {"image_id": "COCO_val2014_000000515585", "CLIPScore": 0.81494140625, "foil": true}, {"image_id": "COCO_val2014_000000322056", "CLIPScore": 0.7763671875, "foil": false}, {"image_id": "COCO_val2014_000000557172", "CLIPScore": 0.8056640625, "foil": false}, {"image_id": "COCO_val2014_000000169226", "CLIPScore": 0.7763671875, "foil": false}, {"image_id": "COCO_val2014_000000290416", "CLIPScore": 0.78369140625, "foil": false}, {"image_id": "COCO_val2014_000000551633", "CLIPScore": 0.7587890625, "foil": true}, {"image_id": "COCO_val2014_000000311789", "CLIPScore": 
0.79345703125, "foil": true}, {"image_id": "COCO_val2014_000000208135", "CLIPScore": 0.7734375, "foil": false}, {"image_id": "COCO_val2014_000000137954", "CLIPScore": 0.75439453125, "foil": false}, {"image_id": "COCO_val2014_000000091267", "CLIPScore": 0.80078125, "foil": false}, {"image_id": "COCO_val2014_000000304741", "CLIPScore": 0.7568359375, "foil": false}, {"image_id": "COCO_val2014_000000460841", "CLIPScore": 0.8203125, "foil": true}, {"image_id": "COCO_val2014_000000291028", "CLIPScore": 0.8037109375, "foil": false}, {"image_id": "COCO_val2014_000000439290", "CLIPScore": 0.7724609375, "foil": false}, {"image_id": "COCO_val2014_000000441468", "CLIPScore": 0.7880859375, "foil": false}, {"image_id": "COCO_val2014_000000543570", "CLIPScore": 0.82421875, "foil": false}, {"image_id": "COCO_val2014_000000472472", "CLIPScore": 0.79296875, "foil": false}, {"image_id": "COCO_val2014_000000094379", "CLIPScore": 0.8466796875, "foil": true}, {"image_id": "COCO_val2014_000000381519", "CLIPScore": 0.8349609375, "foil": false}, {"image_id": "COCO_val2014_000000325958", "CLIPScore": 0.75146484375, "foil": true}, {"image_id": "COCO_val2014_000000109216", "CLIPScore": 0.830078125, "foil": true}, {"image_id": "COCO_val2014_000000199510", "CLIPScore": 0.8037109375, "foil": true}, {"image_id": "COCO_val2014_000000434829", "CLIPScore": 0.6689453125, "foil": false}, {"image_id": "COCO_val2014_000000066263", "CLIPScore": 0.86962890625, "foil": false}, {"image_id": "COCO_val2014_000000190760", "CLIPScore": 0.9462890625, "foil": false}, {"image_id": "COCO_val2014_000000229216", "CLIPScore": 0.8271484375, "foil": true}, {"image_id": "COCO_val2014_000000429598", "CLIPScore": 0.8115234375, "foil": true}, {"image_id": "COCO_val2014_000000021232", "CLIPScore": 0.8056640625, "foil": false}, {"image_id": "COCO_val2014_000000130599", "CLIPScore": 0.70263671875, "foil": true}, {"image_id": "COCO_val2014_000000065306", "CLIPScore": 0.880859375, "foil": false}, {"image_id": 
"COCO_val2014_000000547487", "CLIPScore": 0.84033203125, "foil": false}, {"image_id": "COCO_val2014_000000358149", "CLIPScore": 0.6396484375, "foil": false}, {"image_id": "COCO_val2014_000000017959", "CLIPScore": 0.9111328125, "foil": true}, {"image_id": "COCO_val2014_000000310902", "CLIPScore": 0.7841796875, "foil": false}, {"image_id": "COCO_val2014_000000160004", "CLIPScore": 0.84912109375, "foil": true}, {"image_id": "COCO_val2014_000000538064", "CLIPScore": 0.8095703125, "foil": false}, {"image_id": "COCO_val2014_000000125997", "CLIPScore": 1.0263671875, "foil": false}, {"image_id": "COCO_val2014_000000002255", "CLIPScore": 0.87451171875, "foil": true}, {"image_id": "COCO_val2014_000000153734", "CLIPScore": 0.80517578125, "foil": false}, {"image_id": "COCO_val2014_000000371243", "CLIPScore": 0.89599609375, "foil": false}, {"image_id": "COCO_val2014_000000544237", "CLIPScore": 0.7099609375, "foil": false}, {"image_id": "COCO_val2014_000000002495", "CLIPScore": 0.8349609375, "foil": false}, {"image_id": "COCO_val2014_000000498381", "CLIPScore": 0.890625, "foil": false}, {"image_id": "COCO_val2014_000000541550", "CLIPScore": 0.8505859375, "foil": true}, {"image_id": "COCO_val2014_000000303926", "CLIPScore": 0.8212890625, "foil": true}, {"image_id": "COCO_val2014_000000115776", "CLIPScore": 0.7080078125, "foil": false}, {"image_id": "COCO_val2014_000000388927", "CLIPScore": 0.61865234375, "foil": false}, {"image_id": "COCO_val2014_000000299987", "CLIPScore": 0.94140625, "foil": false}, {"image_id": "COCO_val2014_000000058225", "CLIPScore": 0.7919921875, "foil": false}, {"image_id": "COCO_val2014_000000501494", "CLIPScore": 0.8359375, "foil": false}, {"image_id": "COCO_val2014_000000457453", "CLIPScore": 0.77587890625, "foil": false}, {"image_id": "COCO_val2014_000000114871", "CLIPScore": 0.85009765625, "foil": false}, {"image_id": "COCO_val2014_000000005728", "CLIPScore": 0.9560546875, "foil": true}, {"image_id": "COCO_val2014_000000579602", "CLIPScore": 
0.80078125, "foil": true}, {"image_id": "COCO_val2014_000000322509", "CLIPScore": 0.87646484375, "foil": false}, {"image_id": "COCO_val2014_000000461573", "CLIPScore": 0.775390625, "foil": true}, {"image_id": "COCO_val2014_000000135155", "CLIPScore": 0.787109375, "foil": false}, {"image_id": "COCO_val2014_000000249658", "CLIPScore": 0.7548828125, "foil": false}, {"image_id": "COCO_val2014_000000004678", "CLIPScore": 0.7802734375, "foil": false}, {"image_id": "COCO_val2014_000000079331", "CLIPScore": 0.9189453125, "foil": false}, {"image_id": "COCO_val2014_000000255769", "CLIPScore": 0.7685546875, "foil": false}, {"image_id": "COCO_val2014_000000002495", "CLIPScore": 0.7880859375, "foil": true}, {"image_id": "COCO_val2014_000000342593", "CLIPScore": 0.943359375, "foil": false}, {"image_id": "COCO_val2014_000000257328", "CLIPScore": 0.8134765625, "foil": true}, {"image_id": "COCO_val2014_000000451275", "CLIPScore": 0.79541015625, "foil": false}, {"image_id": "COCO_val2014_000000110265", "CLIPScore": 0.734375, "foil": false}, {"image_id": "COCO_val2014_000000121014", "CLIPScore": 0.70556640625, "foil": true}, {"image_id": "COCO_val2014_000000386032", "CLIPScore": 0.8486328125, "foil": false}, {"image_id": "COCO_val2014_000000138639", "CLIPScore": 0.80517578125, "foil": true}, {"image_id": "COCO_val2014_000000380487", "CLIPScore": 0.72998046875, "foil": false}, {"image_id": "COCO_val2014_000000221571", "CLIPScore": 0.876953125, "foil": false}, {"image_id": "COCO_val2014_000000337984", "CLIPScore": 0.75244140625, "foil": false}, {"image_id": "COCO_val2014_000000012959", "CLIPScore": 0.89111328125, "foil": false}, {"image_id": "COCO_val2014_000000514979", "CLIPScore": 0.7724609375, "foil": false}, {"image_id": "COCO_val2014_000000199688", "CLIPScore": 0.7890625, "foil": false}, {"image_id": "COCO_val2014_000000575174", "CLIPScore": 0.78857421875, "foil": false}, {"image_id": "COCO_val2014_000000440528", "CLIPScore": 0.712890625, "foil": false}, {"image_id": 
"COCO_val2014_000000564355", "CLIPScore": 0.7646484375, "foil": true}, {"image_id": "COCO_val2014_000000351875", "CLIPScore": 0.69580078125, "foil": false}, {"image_id": "COCO_val2014_000000437049", "CLIPScore": 0.78369140625, "foil": false}, {"image_id": "COCO_val2014_000000543409", "CLIPScore": 0.74951171875, "foil": true}, {"image_id": "COCO_val2014_000000198163", "CLIPScore": 0.6669921875, "foil": true}, {"image_id": "COCO_val2014_000000158583", "CLIPScore": 0.61328125, "foil": true}, {"image_id": "COCO_val2014_000000124390", "CLIPScore": 0.80517578125, "foil": false}, {"image_id": "COCO_val2014_000000192192", "CLIPScore": 0.81298828125, "foil": false}, {"image_id": "COCO_val2014_000000155192", "CLIPScore": 0.77685546875, "foil": false}, {"image_id": "COCO_val2014_000000279386", "CLIPScore": 0.87841796875, "foil": false}, {"image_id": "COCO_val2014_000000407826", "CLIPScore": 0.80078125, "foil": false}, {"image_id": "COCO_val2014_000000520273", "CLIPScore": 0.73974609375, "foil": false}, {"image_id": "COCO_val2014_000000538394", "CLIPScore": 0.8642578125, "foil": true}, {"image_id": "COCO_val2014_000000387833", "CLIPScore": 0.74169921875, "foil": false}, {"image_id": "COCO_val2014_000000278321", "CLIPScore": 0.74462890625, "foil": false}, {"image_id": "COCO_val2014_000000412621", "CLIPScore": 0.83251953125, "foil": false}, {"image_id": "COCO_val2014_000000139623", "CLIPScore": 0.72265625, "foil": false}, {"image_id": "COCO_val2014_000000509577", "CLIPScore": 0.66357421875, "foil": true}, {"image_id": "COCO_val2014_000000422017", "CLIPScore": 0.8134765625, "foil": true}, {"image_id": "COCO_val2014_000000110231", "CLIPScore": 0.80126953125, "foil": true}, {"image_id": "COCO_val2014_000000117759", "CLIPScore": 0.865234375, "foil": false}, {"image_id": "COCO_val2014_000000083573", "CLIPScore": 0.8212890625, "foil": true}, {"image_id": "COCO_val2014_000000413043", "CLIPScore": 0.85546875, "foil": false}, {"image_id": "COCO_val2014_000000437564", "CLIPScore": 
0.76416015625, "foil": true}, {"image_id": "COCO_val2014_000000490366", "CLIPScore": 0.81982421875, "foil": false}, {"image_id": "COCO_val2014_000000007207", "CLIPScore": 0.8125, "foil": true}, {"image_id": "COCO_val2014_000000455044", "CLIPScore": 0.9033203125, "foil": false}, {"image_id": "COCO_val2014_000000475043", "CLIPScore": 0.69873046875, "foil": false}, {"image_id": "COCO_val2014_000000041369", "CLIPScore": 0.71728515625, "foil": true}, {"image_id": "COCO_val2014_000000255149", "CLIPScore": 0.81005859375, "foil": true}, {"image_id": "COCO_val2014_000000066046", "CLIPScore": 0.96630859375, "foil": true}, {"image_id": "COCO_val2014_000000184613", "CLIPScore": 0.849609375, "foil": false}, {"image_id": "COCO_val2014_000000489550", "CLIPScore": 0.7783203125, "foil": false}, {"image_id": "COCO_val2014_000000309571", "CLIPScore": 0.662109375, "foil": true}, {"image_id": "COCO_val2014_000000516026", "CLIPScore": 0.88623046875, "foil": true}, {"image_id": "COCO_val2014_000000029444", "CLIPScore": 0.76416015625, "foil": true}, {"image_id": "COCO_val2014_000000019306", "CLIPScore": 0.8955078125, "foil": true}, {"image_id": "COCO_val2014_000000511236", "CLIPScore": 0.8828125, "foil": false}, {"image_id": "COCO_val2014_000000056302", "CLIPScore": 0.74658203125, "foil": true}, {"image_id": "COCO_val2014_000000512416", "CLIPScore": 0.849609375, "foil": true}, {"image_id": "COCO_val2014_000000258905", "CLIPScore": 0.69287109375, "foil": false}, {"image_id": "COCO_val2014_000000073622", "CLIPScore": 0.55126953125, "foil": false}, {"image_id": "COCO_val2014_000000469030", "CLIPScore": 0.79833984375, "foil": false}, {"image_id": "COCO_val2014_000000115069", "CLIPScore": 0.853515625, "foil": false}, {"image_id": "COCO_val2014_000000419560", "CLIPScore": 0.84375, "foil": false}, {"image_id": "COCO_val2014_000000354744", "CLIPScore": 0.81005859375, "foil": false}, {"image_id": "COCO_val2014_000000378244", "CLIPScore": 0.81005859375, "foil": false}, {"image_id": 
"COCO_val2014_000000527022", "CLIPScore": 0.69091796875, "foil": true}, {"image_id": "COCO_val2014_000000016161", "CLIPScore": 0.87451171875, "foil": false}, {"image_id": "COCO_val2014_000000569030", "CLIPScore": 0.80810546875, "foil": true}, {"image_id": "COCO_val2014_000000187852", "CLIPScore": 0.8388671875, "foil": false}, {"image_id": "COCO_val2014_000000100624", "CLIPScore": 0.9208984375, "foil": false}, {"image_id": "COCO_val2014_000000092771", "CLIPScore": 0.849609375, "foil": false}, {"image_id": "COCO_val2014_000000425870", "CLIPScore": 0.666015625, "foil": true}, {"image_id": "COCO_val2014_000000268229", "CLIPScore": 0.802734375, "foil": true}, {"image_id": "COCO_val2014_000000233848", "CLIPScore": 0.7607421875, "foil": false}, {"image_id": "COCO_val2014_000000011760", "CLIPScore": 0.89453125, "foil": false}, {"image_id": "COCO_val2014_000000249227", "CLIPScore": 0.744140625, "foil": false}, {"image_id": "COCO_val2014_000000046345", "CLIPScore": 0.74951171875, "foil": false}, {"image_id": "COCO_val2014_000000033697", "CLIPScore": 0.8173828125, "foil": false}, {"image_id": "COCO_val2014_000000097659", "CLIPScore": 0.87646484375, "foil": false}, {"image_id": "COCO_val2014_000000257137", "CLIPScore": 0.91552734375, "foil": false}, {"image_id": "COCO_val2014_000000413287", "CLIPScore": 0.9375, "foil": false}, {"image_id": "COCO_val2014_000000477750", "CLIPScore": 0.73046875, "foil": false}, {"image_id": "COCO_val2014_000000550432", "CLIPScore": 0.716796875, "foil": false}, {"image_id": "COCO_val2014_000000486905", "CLIPScore": 0.95458984375, "foil": false}, {"image_id": "COCO_val2014_000000352789", "CLIPScore": 0.84521484375, "foil": false}, {"image_id": "COCO_val2014_000000172649", "CLIPScore": 0.75439453125, "foil": false}, {"image_id": "COCO_val2014_000000101828", "CLIPScore": 0.8134765625, "foil": false}, {"image_id": "COCO_val2014_000000172553", "CLIPScore": 0.740234375, "foil": true}, {"image_id": "COCO_val2014_000000139917", "CLIPScore": 0.80126953125, 
"foil": true}, {"image_id": "COCO_val2014_000000395364", "CLIPScore": 0.67431640625, "foil": true}, {"image_id": "COCO_val2014_000000351133", "CLIPScore": 0.8603515625, "foil": false}, {"image_id": "COCO_val2014_000000548500", "CLIPScore": 0.7587890625, "foil": true}, {"image_id": "COCO_val2014_000000372070", "CLIPScore": 0.685546875, "foil": false}, {"image_id": "COCO_val2014_000000360772", "CLIPScore": 0.7001953125, "foil": false}, {"image_id": "COCO_val2014_000000024144", "CLIPScore": 0.86181640625, "foil": false}, {"image_id": "COCO_val2014_000000083573", "CLIPScore": 0.82470703125, "foil": false}, {"image_id": "COCO_val2014_000000318645", "CLIPScore": 0.7763671875, "foil": true}, {"image_id": "COCO_val2014_000000350668", "CLIPScore": 0.783203125, "foil": true}, {"image_id": "COCO_val2014_000000340559", "CLIPScore": 0.99560546875, "foil": true}, {"image_id": "COCO_val2014_000000081782", "CLIPScore": 0.79345703125, "foil": true}, {"image_id": "COCO_val2014_000000296404", "CLIPScore": 0.8134765625, "foil": false}, {"image_id": "COCO_val2014_000000220732", "CLIPScore": 0.72998046875, "foil": true}, {"image_id": "COCO_val2014_000000569415", "CLIPScore": 0.8447265625, "foil": false}, {"image_id": "COCO_val2014_000000117563", "CLIPScore": 0.67431640625, "foil": false}, {"image_id": "COCO_val2014_000000125208", "CLIPScore": 0.81591796875, "foil": true}, {"image_id": "COCO_val2014_000000030012", "CLIPScore": 0.6611328125, "foil": false}, {"image_id": "COCO_val2014_000000395463", "CLIPScore": 0.68115234375, "foil": false}, {"image_id": "COCO_val2014_000000389316", "CLIPScore": 0.74462890625, "foil": true}, {"image_id": "COCO_val2014_000000255769", "CLIPScore": 0.740234375, "foil": false}, {"image_id": "COCO_val2014_000000031748", "CLIPScore": 0.7490234375, "foil": false}, {"image_id": "COCO_val2014_000000297374", "CLIPScore": 0.81787109375, "foil": false}, {"image_id": "COCO_val2014_000000310902", "CLIPScore": 0.74755859375, "foil": false}, {"image_id": 
"COCO_val2014_000000014248", "CLIPScore": 0.8203125, "foil": false}, {"image_id": "COCO_val2014_000000444491", "CLIPScore": 0.77099609375, "foil": true}, {"image_id": "COCO_val2014_000000474465", "CLIPScore": 0.8701171875, "foil": false}, {"image_id": "COCO_val2014_000000049682", "CLIPScore": 0.8095703125, "foil": false}, {"image_id": "COCO_val2014_000000139917", "CLIPScore": 0.8232421875, "foil": false}, {"image_id": "COCO_val2014_000000495376", "CLIPScore": 0.75146484375, "foil": false}, {"image_id": "COCO_val2014_000000559277", "CLIPScore": 0.837890625, "foil": false}, {"image_id": "COCO_val2014_000000360182", "CLIPScore": 0.822265625, "foil": true}, {"image_id": "COCO_val2014_000000120860", "CLIPScore": 0.8369140625, "foil": true}, {"image_id": "COCO_val2014_000000226592", "CLIPScore": 0.8740234375, "foil": false}, {"image_id": "COCO_val2014_000000233005", "CLIPScore": 0.63720703125, "foil": true}, {"image_id": "COCO_val2014_000000468736", "CLIPScore": 0.861328125, "foil": false}, {"image_id": "COCO_val2014_000000034869", "CLIPScore": 0.7509765625, "foil": false}, {"image_id": "COCO_val2014_000000179045", "CLIPScore": 0.6728515625, "foil": false}, {"image_id": "COCO_val2014_000000136846", "CLIPScore": 0.7109375, "foil": false}, {"image_id": "COCO_val2014_000000189213", "CLIPScore": 0.7705078125, "foil": false}, {"image_id": "COCO_val2014_000000435358", "CLIPScore": 0.759765625, "foil": false}, {"image_id": "COCO_val2014_000000207056", "CLIPScore": 0.9130859375, "foil": false}, {"image_id": "COCO_val2014_000000276146", "CLIPScore": 0.82275390625, "foil": false}, {"image_id": "COCO_val2014_000000251627", "CLIPScore": 0.703125, "foil": true}, {"image_id": "COCO_val2014_000000332113", "CLIPScore": 0.7939453125, "foil": true}, {"image_id": "COCO_val2014_000000560993", "CLIPScore": 0.9111328125, "foil": false}, {"image_id": "COCO_val2014_000000217827", "CLIPScore": 0.75244140625, "foil": true}, {"image_id": "COCO_val2014_000000186009", "CLIPScore": 0.85400390625, 
"foil": false}, {"image_id": "COCO_val2014_000000327436", "CLIPScore": 0.7041015625, "foil": false}, {"image_id": "COCO_val2014_000000419309", "CLIPScore": 0.615234375, "foil": false}, {"image_id": "COCO_val2014_000000518914", "CLIPScore": 0.7744140625, "foil": false}, {"image_id": "COCO_val2014_000000226097", "CLIPScore": 0.794921875, "foil": true}, {"image_id": "COCO_val2014_000000004108", "CLIPScore": 0.6064453125, "foil": true}, {"image_id": "COCO_val2014_000000282150", "CLIPScore": 0.87451171875, "foil": false}, {"image_id": "COCO_val2014_000000149197", "CLIPScore": 0.81494140625, "foil": false}, {"image_id": "COCO_val2014_000000232654", "CLIPScore": 0.7763671875, "foil": true}, {"image_id": "COCO_val2014_000000147173", "CLIPScore": 0.78662109375, "foil": true}, {"image_id": "COCO_val2014_000000211743", "CLIPScore": 0.78369140625, "foil": true}, {"image_id": "COCO_val2014_000000455610", "CLIPScore": 0.67578125, "foil": false}, {"image_id": "COCO_val2014_000000358642", "CLIPScore": 0.76904296875, "foil": true}, {"image_id": "COCO_val2014_000000218470", "CLIPScore": 0.716796875, "foil": false}, {"image_id": "COCO_val2014_000000157767", "CLIPScore": 0.6201171875, "foil": true}, {"image_id": "COCO_val2014_000000234676", "CLIPScore": 0.7314453125, "foil": false}, {"image_id": "COCO_val2014_000000239355", "CLIPScore": 0.7470703125, "foil": true}, {"image_id": "COCO_val2014_000000327918", "CLIPScore": 0.68798828125, "foil": false}, {"image_id": "COCO_val2014_000000044621", "CLIPScore": 0.7265625, "foil": true}, {"image_id": "COCO_val2014_000000017655", "CLIPScore": 0.8388671875, "foil": false}, {"image_id": "COCO_val2014_000000005124", "CLIPScore": 0.810546875, "foil": false}, {"image_id": "COCO_val2014_000000029573", "CLIPScore": 0.744140625, "foil": false}, {"image_id": "COCO_val2014_000000258402", "CLIPScore": 0.8251953125, "foil": true}, {"image_id": "COCO_val2014_000000056288", "CLIPScore": 0.92529296875, "foil": false}, {"image_id": "COCO_val2014_000000273825", 
"CLIPScore": 0.7333984375, "foil": false}, {"image_id": "COCO_val2014_000000076619", "CLIPScore": 0.74609375, "foil": true}, {"image_id": "COCO_val2014_000000532481", "CLIPScore": 0.80078125, "foil": false}, {"image_id": "COCO_val2014_000000509867", "CLIPScore": 0.7119140625, "foil": true}, {"image_id": "COCO_val2014_000000255338", "CLIPScore": 0.83056640625, "foil": false}, {"image_id": "COCO_val2014_000000125850", "CLIPScore": 0.734375, "foil": true}, {"image_id": "COCO_val2014_000000131593", "CLIPScore": 0.703125, "foil": true}, {"image_id": "COCO_val2014_000000564629", "CLIPScore": 0.759765625, "foil": false}, {"image_id": "COCO_val2014_000000268092", "CLIPScore": 0.8720703125, "foil": true}, {"image_id": "COCO_val2014_000000441468", "CLIPScore": 0.75634765625, "foil": true}, {"image_id": "COCO_val2014_000000548957", "CLIPScore": 0.654296875, "foil": false}, {"image_id": "COCO_val2014_000000203878", "CLIPScore": 0.7548828125, "foil": false}, {"image_id": "COCO_val2014_000000423256", "CLIPScore": 0.767578125, "foil": true}, {"image_id": "COCO_val2014_000000519094", "CLIPScore": 0.8330078125, "foil": false}, {"image_id": "COCO_val2014_000000061773", "CLIPScore": 0.6796875, "foil": false}, {"image_id": "COCO_val2014_000000466787", "CLIPScore": 0.7998046875, "foil": false}, {"image_id": "COCO_val2014_000000337533", "CLIPScore": 0.833984375, "foil": false}, {"image_id": "COCO_val2014_000000412586", "CLIPScore": 0.8193359375, "foil": false}, {"image_id": "COCO_val2014_000000293071", "CLIPScore": 0.89306640625, "foil": false}, {"image_id": "COCO_val2014_000000304305", "CLIPScore": 0.81298828125, "foil": false}, {"image_id": "COCO_val2014_000000483893", "CLIPScore": 0.76123046875, "foil": true}, {"image_id": "COCO_val2014_000000399164", "CLIPScore": 0.7744140625, "foil": true}, {"image_id": "COCO_val2014_000000435309", "CLIPScore": 0.662109375, "foil": false}, {"image_id": "COCO_val2014_000000142000", "CLIPScore": 0.76171875, "foil": true}]

Then I ran the following snippet, which is not part of this aloha repo.

import json
import pandas as pd
from sklearn.metrics import average_precision_score

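# `output` is the path to the JSON file holding the per-sample results shown above.
with open(output, "r") as f:
    data = json.load(f)
df = pd.DataFrame(data)
# The positive class is "foil"; negate CLIPScore so that a lower score ranks as more likely FOIL.
average_precision_score(df["foil"], -df["CLIPScore"])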

It returns an AP of 38.97, while Table 1 reports a CLIPScore AP of 40.10. Can you help me figure out what I missed?
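
For reference, here is a minimal pure-Python sketch of the step-wise definition that the scikit-learn documentation gives for `average_precision_score`, AP = Σ_n (R_n − R_{n−1}) · P_n. The function name `average_precision` and the toy records are hypothetical, and this is not the paper's evaluation code; it only illustrates how the sign of the score changes the result when "foil" is the positive class.

```python
def average_precision(labels, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n, evaluated at each distinct score threshold."""
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from most to least likely positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, seen, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(ranked):
        # Consume every sample tied at this threshold before evaluating P and R.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += 1 if ranked[i][1] else 0
            seen += 1
            i += 1
        recall = tp / total_pos
        precision = tp / seen
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy records in the same shape as the JSON above (values are made up).
toy = [
    {"CLIPScore": 0.9, "foil": False},
    {"CLIPScore": 0.6, "foil": True},
    {"CLIPScore": 0.8, "foil": True},
    {"CLIPScore": 0.7, "foil": False},
]
labels = [r["foil"] for r in toy]
ap_negated = average_precision(labels, [-r["CLIPScore"] for r in toy])
ap_raw = average_precision(labels, [r["CLIPScore"] for r in toy])
print(f"negated CLIPScore: {ap_negated:.4f}, raw CLIPScore: {ap_raw:.4f}")
```

On the toy data the two orientations give different AP values, so the score direction has to be fixed before comparing against a reported number.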

DavidMChan commented 4 months ago

Interesting - does this change if you compute AP as described in appendix B.1 in the paper? I don't think that sklearn is using the same process. @spetryk - do you have any thoughts on this?

long8v commented 4 months ago

I am confused. If I follow Appendix B.1, it returns 0.0733:

ap = 0
for sample in output:
    if sample["foil"]: # .. positive label (1) to be “hallucination” .. in appendix
        ap += (1 - sample["CLIPScore"])
print(ap / len(output))

0.07338134765625

But the next paragraph seems to describe the standard AP metric, which I assume is the same as the sklearn implementation.


Can you share the snippet used to calculate AP in the paper? I see three possibilities: 1) the AP score I measured is inaccurate; 2) I missed something in the CLIPScore metric, although I compared the CLIPScore repo with aloha/src/aloha/metrics/clipscore.py and found no difference that would change the output; 3) the HAT dataset in this repo is not the same set as the one used in the paper.

DavidMChan commented 4 months ago

The AP calculation above is slightly wrong, since ALOHa is inverted to CLIPScore (i.e. for ALOHa you need 1-, while for CLIPScore it should just be the raw value). That being said, it shouldn't matter that much, since the scores should be similar to sklearn. Are the other measures that you are generating similar to the paper results?

Looking back at our experimental results, it seems like something is different in the code that is being used, since for our code, we got a CLIPScore for image COCO_val2014_000000023709.jpg of 0.73291015625, instead of 0.70068359375.
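
A per-sample comparison like the one above can be automated. The following is a minimal sketch, assuming both runs are stored as lists of records in the same order and in the JSON shape posted in this thread; `compare_runs` and the tolerance are hypothetical names and values. It compares by position because `image_id` alone is not unique in the posted results (the same image can appear with more than one caption).

```python
def compare_runs(run_a, run_b, tol=1e-3):
    """Return (index, image_id, score_a, score_b) for entries whose CLIPScore differs by more than tol."""
    if len(run_a) != len(run_b):
        raise ValueError("runs have different lengths")
    diffs = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        # Guard against the two files being in a different order.
        if a["image_id"] != b["image_id"]:
            raise ValueError(f"entry {i}: image_id mismatch")
        if abs(a["CLIPScore"] - b["CLIPScore"]) > tol:
            diffs.append((i, a["image_id"], a["CLIPScore"], b["CLIPScore"]))
    return diffs


# The two values quoted above for the same image.
paper_run = [{"image_id": "COCO_val2014_000000023709", "CLIPScore": 0.73291015625}]
repro_run = [{"image_id": "COCO_val2014_000000023709", "CLIPScore": 0.70068359375}]
for index, image_id, score_a, score_b in compare_runs(paper_run, repro_run):
    print(index, image_id, score_a, score_b, round(score_a - score_b, 6))
```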

I looked a bit at the committed code, and it looks like this might be the culprit: https://github.com/DavidMChan/aloha/blob/e38d69e0004a044254cef2641985c7ae4e01efd4/src/aloha/metrics/clipscore.py#L175C1-L176C29

Can you try changing this to the "ViT-B/32" version of CLIP and see if you get the higher scores?

long8v commented 4 months ago

> The AP calculation above is slightly wrong, since ALOHa is inverted to CLIPScore (i.e. for ALOHa you need 1-, while for CLIPScore it should just be the raw value)

I believe CLIPScore also needs to be inverted: CLIPScore measures how well the caption and the image are aligned, so a FOIL caption should receive a lower CLIPScore. Can you provide the AP calculation snippet used in the paper so that I can reproduce it exactly?
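
As a side note on the inversion itself: a ranking-based AP such as sklearn's depends only on the order of the scores, so `-CLIPScore` and `1 - CLIPScore` are interchangeable, while the raw CLIPScore ranks the samples in the opposite order. A minimal sketch with made-up scores (the helper `ranking` is hypothetical):

```python
def ranking(scores):
    """Indices of the samples, ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


clip_scores = [0.9, 0.6, 0.8, 0.7]  # toy values, higher = better aligned

negated = [-s for s in clip_scores]       # orientation used in the sklearn snippet
one_minus = [1 - s for s in clip_scores]  # orientation used for ALOHa

# Both inversions put the least-aligned caption first ...
print(ranking(negated), ranking(one_minus))
# ... while the raw score puts the best-aligned caption first.
print(ranking(clip_scores))
```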

> Looking back at our experimental results, it seems like something is different in the code that is being used, since for our code, we got a CLIPScore for image COCO_val2014_000000023709.jpg of 0.73291015625, instead of 0.70068359375.

I used "ViT-B/32", not "RN50x64", since I ran the CLIPScore repo rather than this ALOHa repo. I compared the CLIPScore repo with aloha/src/aloha/metrics/clipscore.py and found no difference that would change the output. Do you have any other guesses? 👀

long8v commented 4 months ago

I think it might come from the environment :/ I changed my environment to the one below, and COCO_val2014_000000023709.jpg now returns 0.7119140625:

torch==1.7.1
torchvision==0.8.2
numpy==1.20.3
scikit-learn==0.23.1

(I referred to the initial commit https://github.com/openai/CLIP/commit/3bee28119e6b28e75b82b811b87b56935314e6a5) However, it still differs from the value you reported (0.73291015625), so it would be very helpful to know the environment (torch, torchvision, numpy) you used.
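
To make environments easy to compare, both sides could dump the installed versions with a small script. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the function name `environment_report` and the default package list are assumptions chosen to match the packages listed above.

```python
import platform
from importlib import metadata


def environment_report(packages=("torch", "torchvision", "numpy", "scikit-learn")):
    """Map each package name to its installed version, or None if it is not installed."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is missing from this environment
    return report


for name, version in environment_report().items():
    print(f"{name}: {version}")
```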


[{"image_id": "COCO_val2014_000000016903", "CLIPScore": 0.8759765625, "foil": true}, {"image_id": "COCO_val2014_000000023709", "CLIPScore": 0.7119140625, "foil": false}, {"image_id": "COCO_val2014_000000553561", "CLIPScore": 0.90380859375, "foil": false}, {"image_id": "COCO_val2014_000000090367", "CLIPScore": 0.8720703125, "foil": false}, {"image_id": "COCO_val2014_000000539226", "CLIPScore": 0.6904296875, "foil": false}, {"image_id": "COCO_val2014_000000122838", "CLIPScore": 0.69921875, "foil": false}, {"image_id": "COCO_val2014_000000450577", "CLIPScore": 0.82470703125, "foil": false}, {"image_id": "COCO_val2014_000000196660", "CLIPScore": 0.7900390625, "foil": false}, {"image_id": "COCO_val2014_000000089541", "CLIPScore": 0.97216796875, "foil": false}, {"image_id": "COCO_val2014_000000228013", "CLIPScore": 0.79541015625, "foil": false}, {"image_id": "COCO_val2014_000000226579", "CLIPScore": 0.8583984375, "foil": false}, {"image_id": "COCO_val2014_000000464689", "CLIPScore": 0.7802734375, "foil": true}, {"image_id": "COCO_val2014_000000536292", "CLIPScore": 0.8564453125, "foil": false}, {"image_id": "COCO_val2014_000000331799", "CLIPScore": 0.71337890625, "foil": true}, {"image_id": "COCO_val2014_000000266491", "CLIPScore": 0.76171875, "foil": true}, {"image_id": "COCO_val2014_000000570594", "CLIPScore": 0.62451171875, "foil": false}, {"image_id": "COCO_val2014_000000481710", "CLIPScore": 0.7861328125, "foil": false}, {"image_id": "COCO_val2014_000000461953", "CLIPScore": 0.837890625, "foil": false}, {"image_id": "COCO_val2014_000000206751", "CLIPScore": 0.80859375, "foil": true}, {"image_id": "COCO_val2014_000000218205", "CLIPScore": 0.7021484375, "foil": false}, {"image_id": "COCO_val2014_000000016161", "CLIPScore": 0.91552734375, "foil": false}, {"image_id": "COCO_val2014_000000134103", "CLIPScore": 0.76708984375, "foil": false}, {"image_id": "COCO_val2014_000000103870", "CLIPScore": 0.87646484375, "foil": true}, {"image_id": "COCO_val2014_000000491154", 
"CLIPScore": 0.9189453125, "foil": false}, {"image_id": "COCO_val2014_000000538721", "CLIPScore": 0.6875, "foil": true}, {"image_id": "COCO_val2014_000000234676", "CLIPScore": 0.70068359375, "foil": false}, {"image_id": "COCO_val2014_000000382512", "CLIPScore": 0.826171875, "foil": true}, {"image_id": "COCO_val2014_000000006701", "CLIPScore": 0.779296875, "foil": false}, {"image_id": "COCO_val2014_000000333190", "CLIPScore": 0.76416015625, "foil": true}, {"image_id": "COCO_val2014_000000050753", "CLIPScore": 0.8134765625, "foil": false}, {"image_id": "COCO_val2014_000000345469", "CLIPScore": 0.8994140625, "foil": false}, {"image_id": "COCO_val2014_000000489023", "CLIPScore": 0.66015625, "foil": false}, {"image_id": "COCO_val2014_000000221725", "CLIPScore": 0.818359375, "foil": false}, {"image_id": "COCO_val2014_000000535997", "CLIPScore": 0.69091796875, "foil": false}, {"image_id": "COCO_val2014_000000367429", "CLIPScore": 0.8798828125, "foil": false}, {"image_id": "COCO_val2014_000000411587", "CLIPScore": 0.86376953125, "foil": false}, {"image_id": "COCO_val2014_000000578703", "CLIPScore": 0.77099609375, "foil": true}, {"image_id": "COCO_val2014_000000101280", "CLIPScore": 0.80126953125, "foil": true}, {"image_id": "COCO_val2014_000000577310", "CLIPScore": 0.826171875, "foil": false}, {"image_id": "COCO_val2014_000000167656", "CLIPScore": 0.6572265625, "foil": false}, {"image_id": "COCO_val2014_000000209835", "CLIPScore": 0.740234375, "foil": false}, {"image_id": "COCO_val2014_000000261116", "CLIPScore": 0.86474609375, "foil": true}, {"image_id": "COCO_val2014_000000224037", "CLIPScore": 0.69384765625, "foil": false}, {"image_id": "COCO_val2014_000000183407", "CLIPScore": 0.779296875, "foil": false}, {"image_id": "COCO_val2014_000000347675", "CLIPScore": 0.7490234375, "foil": true}, {"image_id": "COCO_val2014_000000280918", "CLIPScore": 0.9140625, "foil": false}, {"image_id": "COCO_val2014_000000083113", "CLIPScore": 0.8642578125, "foil": false}, {"image_id": 
"COCO_val2014_000000010432", "CLIPScore": 0.8544921875, "foil": true}, {"image_id": "COCO_val2014_000000173574", "CLIPScore": 0.71044921875, "foil": true}, {"image_id": "COCO_val2014_000000561214", "CLIPScore": 0.75439453125, "foil": true}, {"image_id": "COCO_val2014_000000227901", "CLIPScore": 0.71337890625, "foil": true}, {"image_id": "COCO_val2014_000000227960", "CLIPScore": 0.8193359375, "foil": false}, {"image_id": "COCO_val2014_000000466960", "CLIPScore": 0.66015625, "foil": false}, {"image_id": "COCO_val2014_000000245852", "CLIPScore": 0.806640625, "foil": false}, {"image_id": "COCO_val2014_000000129592", "CLIPScore": 0.8994140625, "foil": false}, {"image_id": "COCO_val2014_000000555648", "CLIPScore": 0.7490234375, "foil": false}, {"image_id": "COCO_val2014_000000229599", "CLIPScore": 0.8623046875, "foil": false}, {"image_id": "COCO_val2014_000000082465", "CLIPScore": 0.76220703125, "foil": true}, {"image_id": "COCO_val2014_000000249672", "CLIPScore": 0.8046875, "foil": false}, {"image_id": "COCO_val2014_000000441211", "CLIPScore": 0.744140625, "foil": false}, {"image_id": "COCO_val2014_000000481670", "CLIPScore": 0.75927734375, "foil": false}, {"image_id": "COCO_val2014_000000304741", "CLIPScore": 0.9541015625, "foil": true}, {"image_id": "COCO_val2014_000000534045", "CLIPScore": 0.87841796875, "foil": true}, {"image_id": "COCO_val2014_000000514586", "CLIPScore": 0.83544921875, "foil": true}, {"image_id": "COCO_val2014_000000523252", "CLIPScore": 0.75732421875, "foil": true}, {"image_id": "COCO_val2014_000000201301", "CLIPScore": 0.8974609375, "foil": true}, {"image_id": "COCO_val2014_000000191981", "CLIPScore": 0.728515625, "foil": false}, {"image_id": "COCO_val2014_000000179317", "CLIPScore": 0.83740234375, "foil": true}, {"image_id": "COCO_val2014_000000492800", "CLIPScore": 0.6865234375, "foil": true}, {"image_id": "COCO_val2014_000000077595", "CLIPScore": 0.8271484375, "foil": true}, {"image_id": "COCO_val2014_000000196594", "CLIPScore": 0.67626953125, 
"foil": true}, {"image_id": "COCO_val2014_000000000139", "CLIPScore": 0.70751953125, "foil": false}, {"image_id": "COCO_val2014_000000377832", "CLIPScore": 0.8564453125, "foil": true}, {"image_id": "COCO_val2014_000000018737", "CLIPScore": 0.826171875, "foil": false}, {"image_id": "COCO_val2014_000000212470", "CLIPScore": 0.892578125, "foil": true}, {"image_id": "COCO_val2014_000000356261", "CLIPScore": 0.88818359375, "foil": false}, {"image_id": "COCO_val2014_000000128570", "CLIPScore": 0.93310546875, "foil": false}, {"image_id": "COCO_val2014_000000007320", "CLIPScore": 0.779296875, "foil": false}, {"image_id": "COCO_val2014_000000392928", "CLIPScore": 0.818359375, "foil": false}, {"image_id": "COCO_val2014_000000066046", "CLIPScore": 0.94482421875, "foil": false}, {"image_id": "COCO_val2014_000000253282", "CLIPScore": 0.806640625, "foil": false}, {"image_id": "COCO_val2014_000000296303", "CLIPScore": 0.7265625, "foil": false}, {"image_id": "COCO_val2014_000000574592", "CLIPScore": 0.890625, "foil": false}, {"image_id": "COCO_val2014_000000273825", "CLIPScore": 0.8212890625, "foil": false}, {"image_id": "COCO_val2014_000000027805", "CLIPScore": 0.8544921875, "foil": false}, {"image_id": "COCO_val2014_000000236272", "CLIPScore": 0.68310546875, "foil": true}, {"image_id": "COCO_val2014_000000433998", "CLIPScore": 0.8515625, "foil": false}, {"image_id": "COCO_val2014_000000497141", "CLIPScore": 0.9169921875, "foil": false}, {"image_id": "COCO_val2014_000000518188", "CLIPScore": 0.7060546875, "foil": true}, {"image_id": "COCO_val2014_000000514979", "CLIPScore": 0.794921875, "foil": false}, {"image_id": "COCO_val2014_000000319687", "CLIPScore": 0.7412109375, "foil": false}, {"image_id": "COCO_val2014_000000261758", "CLIPScore": 0.8134765625, "foil": false}, {"image_id": "COCO_val2014_000000336568", "CLIPScore": 0.83447265625, "foil": false}, {"image_id": "COCO_val2014_000000028864", "CLIPScore": 0.888671875, "foil": false}, {"image_id": "COCO_val2014_000000566049", 
"CLIPScore": 0.83056640625, "foil": false}, {"image_id": "COCO_val2014_000000117676", "CLIPScore": 0.74951171875, "foil": false}, {"image_id": "COCO_val2014_000000128813", "CLIPScore": 0.88623046875, "foil": false}, {"image_id": "COCO_val2014_000000190432", "CLIPScore": 0.86865234375, "foil": false}, {"image_id": "COCO_val2014_000000101660", "CLIPScore": 0.8193359375, "foil": true}, {"image_id": "COCO_val2014_000000463785", "CLIPScore": 0.79052734375, "foil": false}, {"image_id": "COCO_val2014_000000410141", "CLIPScore": 0.6796875, "foil": false}, {"image_id": "COCO_val2014_000000237041", "CLIPScore": 0.802734375, "foil": false}, {"image_id": "COCO_val2014_000000443347", "CLIPScore": 0.802734375, "foil": false}, {"image_id": "COCO_val2014_000000276720", "CLIPScore": 0.6845703125, "foil": false}, {"image_id": "COCO_val2014_000000028850", "CLIPScore": 0.7099609375, "foil": false}, {"image_id": "COCO_val2014_000000500940", "CLIPScore": 0.87890625, "foil": false}, {"image_id": "COCO_val2014_000000314412", "CLIPScore": 0.88916015625, "foil": false}, {"image_id": "COCO_val2014_000000172201", "CLIPScore": 0.79150390625, "foil": false}, {"image_id": "COCO_val2014_000000232598", "CLIPScore": 0.8427734375, "foil": false}, {"image_id": "COCO_val2014_000000113113", "CLIPScore": 0.763671875, "foil": false}, {"image_id": "COCO_val2014_000000483401", "CLIPScore": 0.84228515625, "foil": false}, {"image_id": "COCO_val2014_000000032258", "CLIPScore": 0.779296875, "foil": false}, {"image_id": "COCO_val2014_000000158887", "CLIPScore": 0.90087890625, "foil": false}, {"image_id": "COCO_val2014_000000258523", "CLIPScore": 0.802734375, "foil": false}, {"image_id": "COCO_val2014_000000439770", "CLIPScore": 0.796875, "foil": true}, {"image_id": "COCO_val2014_000000217301", "CLIPScore": 0.89794921875, "foil": true}, {"image_id": "COCO_val2014_000000192905", "CLIPScore": 0.845703125, "foil": true}, {"image_id": "COCO_val2014_000000363577", "CLIPScore": 0.880859375, "foil": true}, {"image_id": 
"COCO_val2014_000000149568", "CLIPScore": 0.7060546875, "foil": true}, {"image_id": "COCO_val2014_000000127660", "CLIPScore": 0.91357421875, "foil": false}, {"image_id": "COCO_val2014_000000299493", "CLIPScore": 0.8154296875, "foil": false}, {"image_id": "COCO_val2014_000000293757", "CLIPScore": 0.82275390625, "foil": true}, {"image_id": "COCO_val2014_000000386912", "CLIPScore": 0.7099609375, "foil": false}, {"image_id": "COCO_val2014_000000451084", "CLIPScore": 0.814453125, "foil": false}, {"image_id": "COCO_val2014_000000376545", "CLIPScore": 0.794921875, "foil": false}, {"image_id": "COCO_val2014_000000327401", "CLIPScore": 0.81787109375, "foil": true}, {"image_id": "COCO_val2014_000000562614", "CLIPScore": 0.81982421875, "foil": false}, {"image_id": "COCO_val2014_000000366264", "CLIPScore": 0.6669921875, "foil": false}, {"image_id": "COCO_val2014_000000036450", "CLIPScore": 0.80810546875, "foil": false}, {"image_id": "COCO_val2014_000000202825", "CLIPScore": 0.89599609375, "foil": true}, {"image_id": "COCO_val2014_000000308506", "CLIPScore": 0.7373046875, "foil": false}, {"image_id": "COCO_val2014_000000511469", "CLIPScore": 0.88818359375, "foil": false}, {"image_id": "COCO_val2014_000000264191", "CLIPScore": 0.77099609375, "foil": false}, {"image_id": "COCO_val2014_000000528276", "CLIPScore": 0.87939453125, "foil": true}, {"image_id": "COCO_val2014_000000375415", "CLIPScore": 0.900390625, "foil": true}, {"image_id": "COCO_val2014_000000095677", "CLIPScore": 0.9755859375, "foil": false}, {"image_id": "COCO_val2014_000000043997", "CLIPScore": 0.8798828125, "foil": false}, {"image_id": "COCO_val2014_000000102577", "CLIPScore": 0.78369140625, "foil": false}, {"image_id": "COCO_val2014_000000139917", "CLIPScore": 0.98193359375, "foil": false}, {"image_id": "COCO_val2014_000000207561", "CLIPScore": 0.830078125, "foil": false}, {"image_id": "COCO_val2014_000000169331", "CLIPScore": 0.76953125, "foil": true}, {"image_id": "COCO_val2014_000000375415", "CLIPScore": 
0.86181640625, "foil": false}, {"image_id": "COCO_val2014_000000538463", "CLIPScore": 0.7509765625, "foil": false}, {"image_id": "COCO_val2014_000000323925", "CLIPScore": 0.767578125, "foil": false}, {"image_id": "COCO_val2014_000000091615", "CLIPScore": 0.8037109375, "foil": false}, {"image_id": "COCO_val2014_000000543692", "CLIPScore": 0.783203125, "foil": false}, {"image_id": "COCO_val2014_000000362023", "CLIPScore": 0.69775390625, "foil": true}, {"image_id": "COCO_val2014_000000331250", "CLIPScore": 0.71728515625, "foil": true}, {"image_id": "COCO_val2014_000000528786", "CLIPScore": 0.78369140625, "foil": false}, {"image_id": "COCO_val2014_000000134596", "CLIPScore": 0.595703125, "foil": true}, {"image_id": "COCO_val2014_000000455741", "CLIPScore": 0.71826171875, "foil": true}, {"image_id": "COCO_val2014_000000431573", "CLIPScore": 0.94775390625, "foil": false}, {"image_id": "COCO_val2014_000000552901", "CLIPScore": 0.828125, "foil": false}, {"image_id": "COCO_val2014_000000050165", "CLIPScore": 0.791015625, "foil": true}, {"image_id": "COCO_val2014_000000473299", "CLIPScore": 0.80029296875, "foil": true}, {"image_id": "COCO_val2014_000000245145", "CLIPScore": 0.72705078125, "foil": true}, {"image_id": "COCO_val2014_000000004840", "CLIPScore": 0.7705078125, "foil": false}, {"image_id": "COCO_val2014_000000125208", "CLIPScore": 0.88623046875, "foil": false}, {"image_id": "COCO_val2014_000000515585", "CLIPScore": 0.814453125, "foil": true}, {"image_id": "COCO_val2014_000000322056", "CLIPScore": 0.7861328125, "foil": false}, {"image_id": "COCO_val2014_000000557172", "CLIPScore": 0.8291015625, "foil": false}, {"image_id": "COCO_val2014_000000169226", "CLIPScore": 0.80859375, "foil": false}, {"image_id": "COCO_val2014_000000290416", "CLIPScore": 0.7529296875, "foil": false}, {"image_id": "COCO_val2014_000000551633", "CLIPScore": 0.7822265625, "foil": true}, {"image_id": "COCO_val2014_000000311789", "CLIPScore": 0.783203125, "foil": true}, {"image_id": 
"COCO_val2014_000000208135", "CLIPScore": 0.7578125, "foil": false}, {"image_id": "COCO_val2014_000000137954", "CLIPScore": 0.77734375, "foil": false}, {"image_id": "COCO_val2014_000000091267", "CLIPScore": 0.81103515625, "foil": false}, {"image_id": "COCO_val2014_000000304741", "CLIPScore": 0.74658203125, "foil": false}, {"image_id": "COCO_val2014_000000460841", "CLIPScore": 0.837890625, "foil": true}, {"image_id": "COCO_val2014_000000291028", "CLIPScore": 0.8017578125, "foil": false}, {"image_id": "COCO_val2014_000000439290", "CLIPScore": 0.79150390625, "foil": false}, {"image_id": "COCO_val2014_000000441468", "CLIPScore": 0.80126953125, "foil": false}, {"image_id": "COCO_val2014_000000543570", "CLIPScore": 0.8115234375, "foil": false}, {"image_id": "COCO_val2014_000000472472", "CLIPScore": 0.80810546875, "foil": false}, {"image_id": "COCO_val2014_000000094379", "CLIPScore": 0.84033203125, "foil": true}, {"image_id": "COCO_val2014_000000381519", "CLIPScore": 0.83056640625, "foil": false}, {"image_id": "COCO_val2014_000000325958", "CLIPScore": 0.72802734375, "foil": true}, {"image_id": "COCO_val2014_000000109216", "CLIPScore": 0.8408203125, "foil": true}, {"image_id": "COCO_val2014_000000199510", "CLIPScore": 0.802734375, "foil": true}, {"image_id": "COCO_val2014_000000434829", "CLIPScore": 0.6845703125, "foil": false}, {"image_id": "COCO_val2014_000000066263", "CLIPScore": 0.8994140625, "foil": false}, {"image_id": "COCO_val2014_000000190760", "CLIPScore": 0.99072265625, "foil": false}, {"image_id": "COCO_val2014_000000229216", "CLIPScore": 0.833984375, "foil": true}, {"image_id": "COCO_val2014_000000429598", "CLIPScore": 0.8154296875, "foil": true}, {"image_id": "COCO_val2014_000000021232", "CLIPScore": 0.8115234375, "foil": false}, {"image_id": "COCO_val2014_000000130599", "CLIPScore": 0.7021484375, "foil": true}, {"image_id": "COCO_val2014_000000065306", "CLIPScore": 0.8896484375, "foil": false}, {"image_id": "COCO_val2014_000000547487", "CLIPScore": 
0.85986328125, "foil": false}, {"image_id": "COCO_val2014_000000358149", "CLIPScore": 0.61474609375, "foil": false}, {"image_id": "COCO_val2014_000000017959", "CLIPScore": 0.923828125, "foil": true}, {"image_id": "COCO_val2014_000000310902", "CLIPScore": 0.8017578125, "foil": false}, {"image_id": "COCO_val2014_000000160004", "CLIPScore": 0.83056640625, "foil": true}, {"image_id": "COCO_val2014_000000538064", "CLIPScore": 0.8193359375, "foil": false}, {"image_id": "COCO_val2014_000000125997", "CLIPScore": 1.04296875, "foil": false}, {"image_id": "COCO_val2014_000000002255", "CLIPScore": 0.853515625, "foil": true}, {"image_id": "COCO_val2014_000000153734", "CLIPScore": 0.837890625, "foil": false}, {"image_id": "COCO_val2014_000000371243", "CLIPScore": 0.88134765625, "foil": false}, {"image_id": "COCO_val2014_000000544237", "CLIPScore": 0.7001953125, "foil": false}, {"image_id": "COCO_val2014_000000002495", "CLIPScore": 0.853515625, "foil": false}, {"image_id": "COCO_val2014_000000498381", "CLIPScore": 0.92041015625, "foil": false}, {"image_id": "COCO_val2014_000000541550", "CLIPScore": 0.85205078125, "foil": true}, {"image_id": "COCO_val2014_000000303926", "CLIPScore": 0.83203125, "foil": true}, {"image_id": "COCO_val2014_000000115776", "CLIPScore": 0.71240234375, "foil": false}, {"image_id": "COCO_val2014_000000388927", "CLIPScore": 0.650390625, "foil": false}, {"image_id": "COCO_val2014_000000299987", "CLIPScore": 0.92578125, "foil": false}, {"image_id": "COCO_val2014_000000058225", "CLIPScore": 0.7841796875, "foil": false}, {"image_id": "COCO_val2014_000000501494", "CLIPScore": 0.8818359375, "foil": false}, {"image_id": "COCO_val2014_000000457453", "CLIPScore": 0.7705078125, "foil": false}, {"image_id": "COCO_val2014_000000114871", "CLIPScore": 0.87109375, "foil": false}, {"image_id": "COCO_val2014_000000005728", "CLIPScore": 0.97607421875, "foil": true}, {"image_id": "COCO_val2014_000000579602", "CLIPScore": 0.7822265625, "foil": true}, {"image_id": 
"COCO_val2014_000000322509", "CLIPScore": 0.87109375, "foil": false}, {"image_id": "COCO_val2014_000000461573", "CLIPScore": 0.77197265625, "foil": true}, {"image_id": "COCO_val2014_000000135155", "CLIPScore": 0.79345703125, "foil": false}, {"image_id": "COCO_val2014_000000249658", "CLIPScore": 0.76708984375, "foil": false}, {"image_id": "COCO_val2014_000000004678", "CLIPScore": 0.76708984375, "foil": false}, {"image_id": "COCO_val2014_000000079331", "CLIPScore": 0.93994140625, "foil": false}, {"image_id": "COCO_val2014_000000255769", "CLIPScore": 0.77392578125, "foil": false}, {"image_id": "COCO_val2014_000000002495", "CLIPScore": 0.814453125, "foil": true}, {"image_id": "COCO_val2014_000000342593", "CLIPScore": 0.955078125, "foil": false}, {"image_id": "COCO_val2014_000000257328", "CLIPScore": 0.8359375, "foil": true}, {"image_id": "COCO_val2014_000000451275", "CLIPScore": 0.8173828125, "foil": false}, {"image_id": "COCO_val2014_000000110265", "CLIPScore": 0.76708984375, "foil": false}, {"image_id": "COCO_val2014_000000121014", "CLIPScore": 0.70361328125, "foil": true}, {"image_id": "COCO_val2014_000000386032", "CLIPScore": 0.85009765625, "foil": false}, {"image_id": "COCO_val2014_000000138639", "CLIPScore": 0.7890625, "foil": true}, {"image_id": "COCO_val2014_000000380487", "CLIPScore": 0.75, "foil": false}, {"image_id": "COCO_val2014_000000221571", "CLIPScore": 0.9033203125, "foil": false}, {"image_id": "COCO_val2014_000000337984", "CLIPScore": 0.783203125, "foil": false}, {"image_id": "COCO_val2014_000000012959", "CLIPScore": 0.91259765625, "foil": false}, {"image_id": "COCO_val2014_000000514979", "CLIPScore": 0.7802734375, "foil": false}, {"image_id": "COCO_val2014_000000199688", "CLIPScore": 0.78662109375, "foil": false}, {"image_id": "COCO_val2014_000000575174", "CLIPScore": 0.78173828125, "foil": false}, {"image_id": "COCO_val2014_000000440528", "CLIPScore": 0.740234375, "foil": false}, {"image_id": "COCO_val2014_000000564355", "CLIPScore": 0.7822265625, 
"foil": true}, {"image_id": "COCO_val2014_000000351875", "CLIPScore": 0.705078125, "foil": false}, {"image_id": "COCO_val2014_000000437049", "CLIPScore": 0.78564453125, "foil": false}, {"image_id": "COCO_val2014_000000543409", "CLIPScore": 0.78076171875, "foil": true}, {"image_id": "COCO_val2014_000000198163", "CLIPScore": 0.67626953125, "foil": true}, {"image_id": "COCO_val2014_000000158583", "CLIPScore": 0.6337890625, "foil": true}, {"image_id": "COCO_val2014_000000124390", "CLIPScore": 0.80322265625, "foil": false}, {"image_id": "COCO_val2014_000000192192", "CLIPScore": 0.828125, "foil": false}, {"image_id": "COCO_val2014_000000155192", "CLIPScore": 0.79345703125, "foil": false}, {"image_id": "COCO_val2014_000000279386", "CLIPScore": 0.88916015625, "foil": false}, {"image_id": "COCO_val2014_000000407826", "CLIPScore": 0.79150390625, "foil": false}, {"image_id": "COCO_val2014_000000520273", "CLIPScore": 0.7392578125, "foil": false}, {"image_id": "COCO_val2014_000000538394", "CLIPScore": 0.85986328125, "foil": true}, {"image_id": "COCO_val2014_000000387833", "CLIPScore": 0.74462890625, "foil": false}, {"image_id": "COCO_val2014_000000278321", "CLIPScore": 0.73779296875, "foil": false}, {"image_id": "COCO_val2014_000000412621", "CLIPScore": 0.8359375, "foil": false}, {"image_id": "COCO_val2014_000000139623", "CLIPScore": 0.744140625, "foil": false}, {"image_id": "COCO_val2014_000000509577", "CLIPScore": 0.68310546875, "foil": true}, {"image_id": "COCO_val2014_000000422017", "CLIPScore": 0.8271484375, "foil": true}, {"image_id": "COCO_val2014_000000110231", "CLIPScore": 0.787109375, "foil": true}, {"image_id": "COCO_val2014_000000117759", "CLIPScore": 0.87939453125, "foil": false}, {"image_id": "COCO_val2014_000000083573", "CLIPScore": 0.82568359375, "foil": true}, {"image_id": "COCO_val2014_000000413043", "CLIPScore": 0.88623046875, "foil": false}, {"image_id": "COCO_val2014_000000437564", "CLIPScore": 0.74462890625, "foil": true}, {"image_id": 
"COCO_val2014_000000490366", "CLIPScore": 0.8154296875, "foil": false}, {"image_id": "COCO_val2014_000000007207", "CLIPScore": 0.8251953125, "foil": true}, {"image_id": "COCO_val2014_000000455044", "CLIPScore": 0.88134765625, "foil": false}, {"image_id": "COCO_val2014_000000475043", "CLIPScore": 0.72802734375, "foil": false}, {"image_id": "COCO_val2014_000000041369", "CLIPScore": 0.71484375, "foil": true}, {"image_id": "COCO_val2014_000000255149", "CLIPScore": 0.8427734375, "foil": true}, {"image_id": "COCO_val2014_000000066046", "CLIPScore": 0.98046875, "foil": true}, {"image_id": "COCO_val2014_000000184613", "CLIPScore": 0.869140625, "foil": false}, {"image_id": "COCO_val2014_000000489550", "CLIPScore": 0.77197265625, "foil": false}, {"image_id": "COCO_val2014_000000309571", "CLIPScore": 0.6962890625, "foil": true}, {"image_id": "COCO_val2014_000000516026", "CLIPScore": 0.90234375, "foil": true}, {"image_id": "COCO_val2014_000000029444", "CLIPScore": 0.80029296875, "foil": true}, {"image_id": "COCO_val2014_000000019306", "CLIPScore": 0.88623046875, "foil": true}, {"image_id": "COCO_val2014_000000511236", "CLIPScore": 0.9072265625, "foil": false}, {"image_id": "COCO_val2014_000000056302", "CLIPScore": 0.751953125, "foil": true}, {"image_id": "COCO_val2014_000000512416", "CLIPScore": 0.8681640625, "foil": true}, {"image_id": "COCO_val2014_000000258905", "CLIPScore": 0.6796875, "foil": false}, {"image_id": "COCO_val2014_000000073622", "CLIPScore": 0.6064453125, "foil": false}, {"image_id": "COCO_val2014_000000469030", "CLIPScore": 0.80810546875, "foil": false}, {"image_id": "COCO_val2014_000000115069", "CLIPScore": 0.865234375, "foil": false}, {"image_id": "COCO_val2014_000000419560", "CLIPScore": 0.837890625, "foil": false}, {"image_id": "COCO_val2014_000000354744", "CLIPScore": 0.82080078125, "foil": false}, {"image_id": "COCO_val2014_000000378244", "CLIPScore": 0.81640625, "foil": false}, {"image_id": "COCO_val2014_000000527022", "CLIPScore": 0.69384765625, 
"foil": true}, {"image_id": "COCO_val2014_000000016161", "CLIPScore": 0.865234375, "foil": false}, {"image_id": "COCO_val2014_000000569030", "CLIPScore": 0.837890625, "foil": true}, {"image_id": "COCO_val2014_000000187852", "CLIPScore": 0.8349609375, "foil": false}, {"image_id": "COCO_val2014_000000100624", "CLIPScore": 0.9306640625, "foil": false}, {"image_id": "COCO_val2014_000000092771", "CLIPScore": 0.84765625, "foil": false}, {"image_id": "COCO_val2014_000000425870", "CLIPScore": 0.689453125, "foil": true}, {"image_id": "COCO_val2014_000000268229", "CLIPScore": 0.77392578125, "foil": true}, {"image_id": "COCO_val2014_000000233848", "CLIPScore": 0.779296875, "foil": false}, {"image_id": "COCO_val2014_000000011760", "CLIPScore": 0.89306640625, "foil": false}, {"image_id": "COCO_val2014_000000249227", "CLIPScore": 0.73828125, "foil": false}, {"image_id": "COCO_val2014_000000046345", "CLIPScore": 0.759765625, "foil": false}, {"image_id": "COCO_val2014_000000033697", "CLIPScore": 0.8203125, "foil": false}, {"image_id": "COCO_val2014_000000097659", "CLIPScore": 0.87939453125, "foil": false}, {"image_id": "COCO_val2014_000000257137", "CLIPScore": 0.9443359375, "foil": false}, {"image_id": "COCO_val2014_000000413287", "CLIPScore": 0.97119140625, "foil": false}, {"image_id": "COCO_val2014_000000477750", "CLIPScore": 0.72802734375, "foil": false}, {"image_id": "COCO_val2014_000000550432", "CLIPScore": 0.71826171875, "foil": false}, {"image_id": "COCO_val2014_000000486905", "CLIPScore": 0.96484375, "foil": false}, {"image_id": "COCO_val2014_000000352789", "CLIPScore": 0.849609375, "foil": false}, {"image_id": "COCO_val2014_000000172649", "CLIPScore": 0.7734375, "foil": false}, {"image_id": "COCO_val2014_000000101828", "CLIPScore": 0.85546875, "foil": false}, {"image_id": "COCO_val2014_000000172553", "CLIPScore": 0.72314453125, "foil": true}, {"image_id": "COCO_val2014_000000139917", "CLIPScore": 0.779296875, "foil": true}, {"image_id": "COCO_val2014_000000395364", 
"CLIPScore": 0.6875, "foil": true}, {"image_id": "COCO_val2014_000000351133", "CLIPScore": 0.90380859375, "foil": false}, {"image_id": "COCO_val2014_000000548500", "CLIPScore": 0.8349609375, "foil": true}, {"image_id": "COCO_val2014_000000372070", "CLIPScore": 0.68310546875, "foil": false}, {"image_id": "COCO_val2014_000000360772", "CLIPScore": 0.703125, "foil": false}, {"image_id": "COCO_val2014_000000024144", "CLIPScore": 0.87939453125, "foil": false}, {"image_id": "COCO_val2014_000000083573", "CLIPScore": 0.828125, "foil": false}, {"image_id": "COCO_val2014_000000318645", "CLIPScore": 0.7724609375, "foil": true}, {"image_id": "COCO_val2014_000000350668", "CLIPScore": 0.79150390625, "foil": true}, {"image_id": "COCO_val2014_000000340559", "CLIPScore": 1.013671875, "foil": true}, {"image_id": "COCO_val2014_000000081782", "CLIPScore": 0.822265625, "foil": true}, {"image_id": "COCO_val2014_000000296404", "CLIPScore": 0.8125, "foil": false}, {"image_id": "COCO_val2014_000000220732", "CLIPScore": 0.74169921875, "foil": true}, {"image_id": "COCO_val2014_000000569415", "CLIPScore": 0.8408203125, "foil": false}, {"image_id": "COCO_val2014_000000117563", "CLIPScore": 0.7109375, "foil": false}, {"image_id": "COCO_val2014_000000125208", "CLIPScore": 0.85009765625, "foil": true}, {"image_id": "COCO_val2014_000000030012", "CLIPScore": 0.6943359375, "foil": false}, {"image_id": "COCO_val2014_000000395463", "CLIPScore": 0.66455078125, "foil": false}, {"image_id": "COCO_val2014_000000389316", "CLIPScore": 0.7255859375, "foil": true}, {"image_id": "COCO_val2014_000000255769", "CLIPScore": 0.720703125, "foil": false}, {"image_id": "COCO_val2014_000000031748", "CLIPScore": 0.76611328125, "foil": false}, {"image_id": "COCO_val2014_000000297374", "CLIPScore": 0.82080078125, "foil": false}, {"image_id": "COCO_val2014_000000310902", "CLIPScore": 0.7861328125, "foil": false}, {"image_id": "COCO_val2014_000000014248", "CLIPScore": 0.8359375, "foil": false}, {"image_id": 
"COCO_val2014_000000444491", "CLIPScore": 0.7939453125, "foil": true}, {"image_id": "COCO_val2014_000000474465", "CLIPScore": 0.88134765625, "foil": false}, {"image_id": "COCO_val2014_000000049682", "CLIPScore": 0.80859375, "foil": false}, {"image_id": "COCO_val2014_000000139917", "CLIPScore": 0.8330078125, "foil": false}, {"image_id": "COCO_val2014_000000495376", "CLIPScore": 0.7451171875, "foil": false}, {"image_id": "COCO_val2014_000000559277", "CLIPScore": 0.85400390625, "foil": false}, {"image_id": "COCO_val2014_000000360182", "CLIPScore": 0.84521484375, "foil": true}, {"image_id": "COCO_val2014_000000120860", "CLIPScore": 0.87109375, "foil": true}, {"image_id": "COCO_val2014_000000226592", "CLIPScore": 0.89306640625, "foil": false}, {"image_id": "COCO_val2014_000000233005", "CLIPScore": 0.67578125, "foil": true}, {"image_id": "COCO_val2014_000000468736", "CLIPScore": 0.86376953125, "foil": false}, {"image_id": "COCO_val2014_000000034869", "CLIPScore": 0.76708984375, "foil": false}, {"image_id": "COCO_val2014_000000179045", "CLIPScore": 0.65185546875, "foil": false}, {"image_id": "COCO_val2014_000000136846", "CLIPScore": 0.7421875, "foil": false}, {"image_id": "COCO_val2014_000000189213", "CLIPScore": 0.78515625, "foil": false}, {"image_id": "COCO_val2014_000000435358", "CLIPScore": 0.76171875, "foil": false}, {"image_id": "COCO_val2014_000000207056", "CLIPScore": 0.921875, "foil": false}, {"image_id": "COCO_val2014_000000276146", "CLIPScore": 0.833984375, "foil": false}, {"image_id": "COCO_val2014_000000251627", "CLIPScore": 0.7109375, "foil": true}, {"image_id": "COCO_val2014_000000332113", "CLIPScore": 0.791015625, "foil": true}, {"image_id": "COCO_val2014_000000560993", "CLIPScore": 0.921875, "foil": false}, {"image_id": "COCO_val2014_000000217827", "CLIPScore": 0.76953125, "foil": true}, {"image_id": "COCO_val2014_000000186009", "CLIPScore": 0.892578125, "foil": false}, {"image_id": "COCO_val2014_000000327436", "CLIPScore": 0.7158203125, "foil": false}, 
{"image_id": "COCO_val2014_000000419309", "CLIPScore": 0.5986328125, "foil": false}, {"image_id": "COCO_val2014_000000518914", "CLIPScore": 0.78173828125, "foil": false}, {"image_id": "COCO_val2014_000000226097", "CLIPScore": 0.787109375, "foil": true}, {"image_id": "COCO_val2014_000000004108", "CLIPScore": 0.62548828125, "foil": true}, {"image_id": "COCO_val2014_000000282150", "CLIPScore": 0.8818359375, "foil": false}, {"image_id": "COCO_val2014_000000149197", "CLIPScore": 0.83056640625, "foil": false}, {"image_id": "COCO_val2014_000000232654", "CLIPScore": 0.80078125, "foil": true}, {"image_id": "COCO_val2014_000000147173", "CLIPScore": 0.8095703125, "foil": true}, {"image_id": "COCO_val2014_000000211743", "CLIPScore": 0.78662109375, "foil": true}, {"image_id": "COCO_val2014_000000455610", "CLIPScore": 0.6787109375, "foil": false}, {"image_id": "COCO_val2014_000000358642", "CLIPScore": 0.77587890625, "foil": true}, {"image_id": "COCO_val2014_000000218470", "CLIPScore": 0.7177734375, "foil": false}, {"image_id": "COCO_val2014_000000157767", "CLIPScore": 0.62451171875, "foil": true}, {"image_id": "COCO_val2014_000000234676", "CLIPScore": 0.75634765625, "foil": false}, {"image_id": "COCO_val2014_000000239355", "CLIPScore": 0.74267578125, "foil": true}, {"image_id": "COCO_val2014_000000327918", "CLIPScore": 0.6962890625, "foil": false}, {"image_id": "COCO_val2014_000000044621", "CLIPScore": 0.7509765625, "foil": true}, {"image_id": "COCO_val2014_000000017655", "CLIPScore": 0.8203125, "foil": false}, {"image_id": "COCO_val2014_000000005124", "CLIPScore": 0.8115234375, "foil": false}, {"image_id": "COCO_val2014_000000029573", "CLIPScore": 0.76904296875, "foil": false}, {"image_id": "COCO_val2014_000000258402", "CLIPScore": 0.8046875, "foil": true}, {"image_id": "COCO_val2014_000000056288", "CLIPScore": 0.93701171875, "foil": false}, {"image_id": "COCO_val2014_000000273825", "CLIPScore": 0.7919921875, "foil": false}, {"image_id": "COCO_val2014_000000076619", 
"CLIPScore": 0.73291015625, "foil": true}, {"image_id": "COCO_val2014_000000532481", "CLIPScore": 0.81298828125, "foil": false}, {"image_id": "COCO_val2014_000000509867", "CLIPScore": 0.71533203125, "foil": true}, {"image_id": "COCO_val2014_000000255338", "CLIPScore": 0.83984375, "foil": false}, {"image_id": "COCO_val2014_000000125850", "CLIPScore": 0.73779296875, "foil": true}, {"image_id": "COCO_val2014_000000131593", "CLIPScore": 0.7548828125, "foil": true}, {"image_id": "COCO_val2014_000000564629", "CLIPScore": 0.7822265625, "foil": false}, {"image_id": "COCO_val2014_000000268092", "CLIPScore": 0.91064453125, "foil": true}, {"image_id": "COCO_val2014_000000441468", "CLIPScore": 0.771484375, "foil": true}, {"image_id": "COCO_val2014_000000548957", "CLIPScore": 0.6513671875, "foil": false}, {"image_id": "COCO_val2014_000000203878", "CLIPScore": 0.75634765625, "foil": false}, {"image_id": "COCO_val2014_000000423256", "CLIPScore": 0.7900390625, "foil": true}, {"image_id": "COCO_val2014_000000519094", "CLIPScore": 0.8681640625, "foil": false}, {"image_id": "COCO_val2014_000000061773", "CLIPScore": 0.67578125, "foil": false}, {"image_id": "COCO_val2014_000000466787", "CLIPScore": 0.8056640625, "foil": false}, {"image_id": "COCO_val2014_000000337533", "CLIPScore": 0.87939453125, "foil": false}, {"image_id": "COCO_val2014_000000412586", "CLIPScore": 0.8515625, "foil": false}, {"image_id": "COCO_val2014_000000293071", "CLIPScore": 0.94140625, "foil": false}, {"image_id": "COCO_val2014_000000304305", "CLIPScore": 0.818359375, "foil": false}, {"image_id": "COCO_val2014_000000483893", "CLIPScore": 0.76953125, "foil": true}, {"image_id": "COCO_val2014_000000399164", "CLIPScore": 0.77099609375, "foil": true}, {"image_id": "COCO_val2014_000000435309", "CLIPScore": 0.69580078125, "foil": false}, {"image_id": "COCO_val2014_000000142000", "CLIPScore": 0.779296875, "foil": true}]```
DavidMChan commented 4 months ago

Ah, it does look like it may not be entirely deterministic. I can't remember whether I ran these experiments myself or whether one of the other team members (@spetryk or Anish) ran the CLIPScore experiments, since it was a benchmark method (and not our ALOHa method). I've attached my environment.yml file from Conda, but we didn't pin versions across team members, so there's no guarantee that this is the exact set of package versions used.

I've also attached the full set of archived results I have on HAT (which I think are the ones we used in the paper, but @spetryk compiled the final results so I'm not absolutely certain).

environment.yml clipscore.json

long8v commented 4 months ago

Thank you for the detailed response!

1) With your JSON, I can reproduce the score reported in the paper using sklearn, so the discrepancy comes from the CLIPScore values themselves, not from the metric computation.

import json

from sklearn.metrics import average_precision_score

# Load the archived per-sample results attached above.
with open("../data/clipscore.json", "r") as f:
    output = json.load(f)

clips = []
foils = []
for sample in output:
    # Negate CLIPScore so that a higher value means "more likely hallucinated".
    clips.append(-sample["CLIPScore"]["CLIPScore"])
    foils.append(int(sample["contains_hallucination"]))

print(average_precision_score(foils, clips))

0.400964714203247

2) I found that some samples differ significantly (by 0.16), while dependency differences seem to account for a difference of at most 0.03.

Is it possible that the JSON you provided was produced with RN50x64, or that the HAT dataset has changed?

3) I checked my environment against the CLIPScore repo, and it reproduces exactly the same value:

> python clipscore.py example/good_captions.json example/images/
...
CLIPScore: 0.8584

I also checked my environment against another repo that reports CLIPScore, and its value is exactly the same as in my environment. I checked the YAML file you attached, but could not find any significantly different packages (specifically torch, torchvision, Pillow).

4) Lastly, can you confirm that the prefix "A photo depicts" was used when producing the results? I found that without the prefix, the scores are closer to the reported ones.
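
For point 2 above, the per-sample comparison can be sketched roughly as follows. This is only an illustration: `largest_score_gaps` is a hypothetical helper, the numbers are toy values rather than entries from either results file, and it assumes both score lists cover the same samples in the same order. Index alignment is used instead of keying on `image_id` because image IDs repeat in the dump above (for example, `COCO_val2014_000000139917` appears more than once).

```python
from typing import List, Tuple


def largest_score_gaps(
    mine: List[float], theirs: List[float], top_k: int = 5
) -> List[Tuple[int, float]]:
    """Return (index, |difference|) for the top_k largest per-sample gaps.

    Assumes both lists hold CLIPScores for the same samples in the same
    order; that alignment is an assumption, not something the files guarantee.
    """
    if len(mine) != len(theirs):
        raise ValueError("score lists must have the same length")
    gaps = [(i, abs(a - b)) for i, (a, b) in enumerate(zip(mine, theirs))]
    # Sort by gap size, largest first.
    gaps.sort(key=lambda pair: pair[1], reverse=True)
    return gaps[:top_k]


# Toy values for illustration only (not taken from either results file).
mine = [0.84, 0.70, 0.90, 0.77]
theirs = [0.85, 0.86, 0.90, 0.74]
worst = largest_score_gaps(mine, theirs, top_k=2)
print(worst)
```

Sorting by absolute gap makes it easy to check whether a few outliers or a uniform shift explains the difference between two runs.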

DavidMChan commented 4 months ago

It's good that you're able to reproduce the summarized results in the paper with our outputs.

We used the code in our repo for computing CLIPScore, so if something isn't present in our repo, then we didn't use it to produce the numbers. This likely means that we used the RN50x64 model without the prefix. Indeed, I wasn't even aware that a forced prefix was part of the original codebase, and it is probably at least part of the cause of the discrepancy: many of the sentences in HAT already have a similar prefix, which could lead to disfluent sentences if an unwarranted prefix is added.
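
To make the prefix concern concrete, here is a minimal sketch of what prepending the prompt does to a caption before it is encoded. The helper `add_prefix` and the example captions are hypothetical and are not code from either repository; only the prefix text "A photo depicts" comes from this thread.

```python
DEFAULT_PREFIX = "A photo depicts"  # prefix text mentioned earlier in this thread


def add_prefix(caption: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Prepend the prompt prefix, separated by a single space.

    Hypothetical helper for illustration; an empty prefix leaves the
    caption unchanged.
    """
    if not prefix:
        return caption
    return prefix.rstrip() + " " + caption


# Hypothetical captions, for illustration only.
plain = "a dog running on the beach"
already_prefixed = "A photo of a dog running on the beach"

print(add_prefix(plain))
print(add_prefix(already_prefixed))  # the two prefixes stack awkwardly
print(add_prefix(plain, prefix=""))  # prefix disabled
```

The second call shows the failure mode described above: a caption that already starts with a similar phrase ends up with a doubled, unnatural opening.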

Edit: I looked a bit closer at the code in our repo, and it looks like the prefix is still present (we drew our base code from the original repo) in a default argument. You can run our code with the evaluator here: https://github.com/DavidMChan/aloha/blob/e38d69e0004a044254cef2641985c7ae4e01efd4/src/aloha/metrics/clipscore.py#L210

I wonder - can you reproduce the CLIP scores with our version of the code? You can do so by running:

aloha evaluate_dataset -m clipscore path/to/dataset.json

long8v commented 4 months ago

I ran into some issues when trying to run the aloha command:

1) This may be specific to my environment, but the SentenceTransformer trainer is incompatible with my other dependencies, so I had to comment out everything in the SentenceTransformer package that imports transformers.trainer.

2) The command should be aloha evaluate-dataset, not evaluate_dataset:

Try 'aloha --help' for help.

Error: No such command 'evaluate_dataset'.
Usage: aloha [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  evaluate-dataset

3) CLIPScoreMetrics does not implement the evaluate_dataset method, so I was not able to run it this way:

    return __callback(*args, **kwargs)
  File "/home/nsml/.local/lib/python3.8/site-packages/aloha/dataset.py", line 93, in evaluate_dataset
    _mf = _mf()
TypeError: Can't instantiate abstract class CLIPScoreMetrics with abstract methods evaluate_dataset

4) When I tried to use the CLIPScoreMetrics class directly in Python, instantiating it failed with the same error:

>>> from aloha.metrics import ALOHa, CLIPScoreMetrics
2024-07-17 01:53:02.233563: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/nsml/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
>>> evaluator = CLIPScoreMetrics()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class CLIPScoreMetrics with abstract methods evaluate_dataset
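
This kind of TypeError is what Python raises when a subclass of an abstract base class leaves an abstract method unimplemented. The sketch below reproduces the mechanism with hypothetical stand-in classes (`BaseMetric`, `IncompleteMetric`, `CompleteMetric`); it is not the aloha code, and the exact wording of the message varies between Python versions.

```python
from abc import ABC, abstractmethod


class BaseMetric(ABC):  # hypothetical stand-in for a metric base class
    @abstractmethod
    def evaluate_dataset(self, samples):
        """Score every sample in a dataset."""


class IncompleteMetric(BaseMetric):
    # No evaluate_dataset override, so the class stays abstract.
    pass


class CompleteMetric(BaseMetric):
    def evaluate_dataset(self, samples):
        # Placeholder scoring: one dummy score per sample.
        return [0.0 for _ in samples]


try:
    IncompleteMetric()
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(CompleteMetric().evaluate_dataset(["a", "b"]))
```

Implementing the missing method on the subclass is enough to make the class instantiable again.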

The original CLIPScore is based on ViT-B/32, so I think this should be corrected if the reported score is based on RN50x64. In my environment, it scores 38.97, which is lower than the reported value, so it should not be a big issue. It would be very helpful if you could fix the repo so that CLIPScore can be evaluated, and then share the result from your environment.

DavidMChan commented 4 months ago

Thanks for the heads up on this! I'll fix these things in the repo (hopefully by early next week) and circle back when the commits are made. I recall we ran several variants of CLIPScore in order to get the best possible CLIPScore results, so that's likely the reason that RN50x64 was used instead. We can update the repo to indicate this.

long8v commented 4 months ago

Thanks a lot for your support! It would be helpful to the community if you reported both RN50x64 and ViT-B/32. Looking forward to hearing back from you.

DavidMChan commented 3 months ago

I just pushed a bug fix commit here: https://github.com/DavidMChan/aloha/commit/7da6b90ebe392228ea532a84d78eab830c2b3cb2

Can you please try this, and see if you're still getting divergent numbers?

long8v commented 3 months ago

With your revised version, the result matches mine (38.97)!

So the divergence came from the backbone (ViT-B/32 vs. RN50x64). It would be helpful to the community to add a footnote to your paper stating that the result uses the RN50x64 backbone. Otherwise, people will assume it is the ViT-B/32 variant. Thank you so much!
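
For context on why the backbone matters: as defined in the CLIPScore paper, the score is w * max(cos(image, text), 0) with w = 2.5, so the backbone only enters through the embeddings. The sketch below uses toy vectors in place of real CLIP embeddings, and the function name `clipscore` is a stand-in, not the API of either repo.

```python
import math
from typing import Sequence


def clipscore(
    image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5
) -> float:
    """CLIPScore = w * max(cos(image_emb, text_emb), 0)."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(
        sum(b * b for b in text_emb)
    )
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    # Negative cosine similarities are clipped to zero before rescaling.
    return w * max(dot / norm, 0.0)


# Toy 3-d vectors standing in for real CLIP embeddings.
same = clipscore([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
orthogonal = clipscore([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = clipscore([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
print(same, orthogonal, opposite)
```

Because of the 2.5 rescaling, individual scores can exceed 1, which is consistent with the 1.013671875 entry in the dump above. Different backbones produce different embeddings and therefore different per-sample scores and a different ranking.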

DavidMChan commented 3 months ago

Hmm, interesting! Thanks for pointing out this discrepancy. It's notable that CLIPScore is even worse than expected. I'll ping @spetryk to update.