Pricing is pretty good: free for the first 1,000 calls per month, then $1.50 per thousand after that.
Python library docs: https://googleapis.dev/python/vision/latest/index.html
I'm creating a new project for this called simonwillison-photos: https://console.cloud.google.com/projectcreate
https://console.cloud.google.com/home/dashboard?project=simonwillison-photos
Then I enabled the Vision API. The direct link they provided in the docs - https://console.cloud.google.com/flows/enableapi?apiid=vision-json.googleapis.com - didn't work: it gave me a "You don't have sufficient permissions to use the requested API" error. Starting at the "Enable APIs" page and searching for it worked fine.
I created a new service account as an "owner" of that project: https://console.cloud.google.com/apis/credentials/serviceaccountkey (and complained about it on Twitter and through their feedback form)
pip install google-cloud-vision
from google.cloud import vision

client = vision.ImageAnnotatorClient.from_service_account_file(
    "simonwillison-photos-18c570b301fe.json"
)

# Photo of a lemur
response = client.annotate_image(
    {
        "image": {
            "source": {
                "image_uri": "https://photos.simonwillison.net/i/1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b.jpeg"
            }
        },
        "features": [
            {"type": vision.enums.Feature.Type.IMAGE_PROPERTIES},
            {"type": vision.enums.Feature.Type.OBJECT_LOCALIZATION},
            {"type": vision.enums.Feature.Type.LABEL_DETECTION},
        ],
    }
)
response
Output is:
label_annotations {
mid: "/m/09686"
description: "Vertebrate"
score: 0.9851104021072388
topicality: 0.9851104021072388
}
label_annotations {
mid: "/m/04rky"
description: "Mammal"
score: 0.975814163684845
topicality: 0.975814163684845
}
label_annotations {
mid: "/m/01280g"
description: "Wildlife"
score: 0.8973650336265564
topicality: 0.8973650336265564
}
label_annotations {
mid: "/m/02f9pk"
description: "Lemur"
score: 0.8270352482795715
topicality: 0.8270352482795715
}
label_annotations {
mid: "/m/0fbf1m"
description: "Terrestrial animal"
score: 0.7443860769271851
topicality: 0.7443860769271851
}
label_annotations {
mid: "/m/06z_nw"
description: "Tail"
score: 0.6934166550636292
topicality: 0.6934166550636292
}
label_annotations {
mid: "/m/0b5gs"
description: "Branch"
score: 0.6203985214233398
topicality: 0.6203985214233398
}
label_annotations {
mid: "/m/05s2s"
description: "Plant"
score: 0.585474967956543
topicality: 0.585474967956543
}
label_annotations {
mid: "/m/089v3"
description: "Zoo"
score: 0.5488107800483704
topicality: 0.5488107800483704
}
label_annotations {
mid: "/m/02tcwp"
description: "Trunk"
score: 0.5200017690658569
topicality: 0.5200017690658569
}
image_properties_annotation {
dominant_colors {
colors {
color {
red: 172.0
green: 146.0
blue: 116.0
}
score: 0.24523821473121643
pixel_fraction: 0.027533333748579025
}
colors {
color {
red: 54.0
green: 50.0
blue: 42.0
}
score: 0.10449723154306412
pixel_fraction: 0.12893334031105042
}
colors {
color {
red: 141.0
green: 121.0
blue: 97.0
}
score: 0.1391485631465912
pixel_fraction: 0.039133332669734955
}
colors {
color {
red: 28.0
green: 25.0
blue: 20.0
}
score: 0.08589499443769455
pixel_fraction: 0.11506666988134384
}
colors {
color {
red: 87.0
green: 82.0
blue: 74.0
}
score: 0.0845794677734375
pixel_fraction: 0.16113333404064178
}
colors {
color {
red: 121.0
green: 117.0
blue: 108.0
}
score: 0.05901569500565529
pixel_fraction: 0.13379999995231628
}
colors {
color {
red: 94.0
green: 83.0
blue: 66.0
}
score: 0.049011144787073135
pixel_fraction: 0.03946666792035103
}
colors {
color {
red: 155.0
green: 117.0
blue: 90.0
}
score: 0.04164913296699524
pixel_fraction: 0.0023333332501351833
}
colors {
color {
red: 178.0
green: 143.0
blue: 102.0
}
score: 0.02993861958384514
pixel_fraction: 0.0012666666880249977
}
colors {
color {
red: 61.0
green: 51.0
blue: 35.0
}
score: 0.027391711249947548
pixel_fraction: 0.01953333243727684
}
}
}
crop_hints_annotation {
crop_hints {
bounding_poly {
vertices {
x: 2073
}
vertices {
x: 4008
}
vertices {
x: 4008
y: 3455
}
vertices {
x: 2073
y: 3455
}
}
confidence: 0.65625
importance_fraction: 0.746666669845581
}
}
localized_object_annotations {
mid: "/m/0jbk"
name: "Animal"
score: 0.7008256912231445
bounding_poly {
normalized_vertices {
x: 0.0390297956764698
y: 0.26235100626945496
}
normalized_vertices {
x: 0.8466796875
y: 0.26235100626945496
}
normalized_vertices {
x: 0.8466796875
y: 0.9386426210403442
}
normalized_vertices {
x: 0.0390297956764698
y: 0.9386426210403442
}
}
}
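To pull just the labels and their confidence scores out of that response object, iterating over label_annotations is enough:

for label in response.label_annotations:
    print(label.description, label.score)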
For face detection:
{"type": vision.enums.Feature.Type.Type.FACE_DETECTION}
For OCR:
{"type": vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION}
Database schema for this will require some thought. Just dumping the output into a JSON column isn't going to be flexible enough - I want to be able to FTS against labels and OCR text, and potentially query against other characteristics too.
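One possible shape, sketched with sqlite-utils (table and column names here are placeholders, not a decided schema): keep the raw response around if needed, but pull labels out into their own table and enable FTS on the description column - OCR text could get the same treatment in a separate table.

import sqlite_utils

db = sqlite_utils.Database("photos.db")

photo_id = "1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b"

# One row per label returned for this photo (hypothetical table name)
db["labels"].insert_all(
    [
        {
            "photo": photo_id,
            "mid": label.mid,
            "description": label.description,
            "score": label.score,
        }
        for label in response.label_annotations
    ]
)
db["labels"].enable_fts(["description"], create_triggers=True)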
The default timeout is a bit aggressive, and calls sometimes failed for me when my resizing proxy took too long to fetch and resize the image. Passing an explicit timeout may be worth trying:

client.annotate_image(..., timeout=3.0)
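If timeouts still show up, one option (just a sketch, using google-api-core's exception class and a hypothetical request variable holding the same dict passed to annotate_image above) is to catch the deadline error and retry once with a longer timeout:

from google.api_core.exceptions import DeadlineExceeded

try:
    response = client.annotate_image(request, timeout=3.0)
except DeadlineExceeded:
    # Retry once, giving the resizing proxy more time to respond
    response = client.annotate_image(request, timeout=10.0)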
It can detect faces, run OCR, do image labeling (it knows what a lemur is!) and do object localization, where it identifies objects and returns bounding polygons for them.
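The object localization results come back as normalized (0-1) coordinates, so reading the polygons out of the response above looks roughly like this:

for obj in response.localized_object_annotations:
    print(obj.name, obj.score)
    for vertex in obj.bounding_poly.normalized_vertices:
        print("   ", vertex.x, vertex.y)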