Pricing is pretty good: free for the first 1,000 calls per month, then $1.50 per thousand after that.
Python library docs: https://googleapis.dev/python/vision/latest/index.html
I'm creating a new project for this called simonwillison-photos: https://console.cloud.google.com/projectcreate
https://console.cloud.google.com/home/dashboard?project=simonwillison-photos
Then I enabled the Vision API. The direct link they provided in the docs - https://console.cloud.google.com/flows/enableapi?apiid=vision-json.googleapis.com - didn't work: it gave me a "You don't have sufficient permissions to use the requested API" error. Starting at the "Enable APIs" page and searching for it worked fine.
I created a new service account as an "owner" of that project: https://console.cloud.google.com/apis/credentials/serviceaccountkey (and complained about it on Twitter and through their feedback form)
pip install google-cloud-vision
from google.cloud import vision

client = vision.ImageAnnotatorClient.from_service_account_file(
    "simonwillison-photos-18c570b301fe.json"
)

# Photo of a lemur
response = client.annotate_image(
    {
        "image": {
            "source": {
                "image_uri": "https://photos.simonwillison.net/i/1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b.jpeg"
            }
        },
        "features": [
            {"type": vision.enums.Feature.Type.IMAGE_PROPERTIES},
            {"type": vision.enums.Feature.Type.OBJECT_LOCALIZATION},
            {"type": vision.enums.Feature.Type.LABEL_DETECTION},
        ],
    }
)
response
Output is:
label_annotations {
mid: "/m/09686"
description: "Vertebrate"
score: 0.9851104021072388
topicality: 0.9851104021072388
}
label_annotations {
mid: "/m/04rky"
description: "Mammal"
score: 0.975814163684845
topicality: 0.975814163684845
}
label_annotations {
mid: "/m/01280g"
description: "Wildlife"
score: 0.8973650336265564
topicality: 0.8973650336265564
}
label_annotations {
mid: "/m/02f9pk"
description: "Lemur"
score: 0.8270352482795715
topicality: 0.8270352482795715
}
label_annotations {
mid: "/m/0fbf1m"
description: "Terrestrial animal"
score: 0.7443860769271851
topicality: 0.7443860769271851
}
label_annotations {
mid: "/m/06z_nw"
description: "Tail"
score: 0.6934166550636292
topicality: 0.6934166550636292
}
label_annotations {
mid: "/m/0b5gs"
description: "Branch"
score: 0.6203985214233398
topicality: 0.6203985214233398
}
label_annotations {
mid: "/m/05s2s"
description: "Plant"
score: 0.585474967956543
topicality: 0.585474967956543
}
label_annotations {
mid: "/m/089v3"
description: "Zoo"
score: 0.5488107800483704
topicality: 0.5488107800483704
}
label_annotations {
mid: "/m/02tcwp"
description: "Trunk"
score: 0.5200017690658569
topicality: 0.5200017690658569
}
image_properties_annotation {
dominant_colors {
colors {
color {
red: 172.0
green: 146.0
blue: 116.0
}
score: 0.24523821473121643
pixel_fraction: 0.027533333748579025
}
colors {
color {
red: 54.0
green: 50.0
blue: 42.0
}
score: 0.10449723154306412
pixel_fraction: 0.12893334031105042
}
colors {
color {
red: 141.0
green: 121.0
blue: 97.0
}
score: 0.1391485631465912
pixel_fraction: 0.039133332669734955
}
colors {
color {
red: 28.0
green: 25.0
blue: 20.0
}
score: 0.08589499443769455
pixel_fraction: 0.11506666988134384
}
colors {
color {
red: 87.0
green: 82.0
blue: 74.0
}
score: 0.0845794677734375
pixel_fraction: 0.16113333404064178
}
colors {
color {
red: 121.0
green: 117.0
blue: 108.0
}
score: 0.05901569500565529
pixel_fraction: 0.13379999995231628
}
colors {
color {
red: 94.0
green: 83.0
blue: 66.0
}
score: 0.049011144787073135
pixel_fraction: 0.03946666792035103
}
colors {
color {
red: 155.0
green: 117.0
blue: 90.0
}
score: 0.04164913296699524
pixel_fraction: 0.0023333332501351833
}
colors {
color {
red: 178.0
green: 143.0
blue: 102.0
}
score: 0.02993861958384514
pixel_fraction: 0.0012666666880249977
}
colors {
color {
red: 61.0
green: 51.0
blue: 35.0
}
score: 0.027391711249947548
pixel_fraction: 0.01953333243727684
}
}
}
crop_hints_annotation {
crop_hints {
bounding_poly {
vertices {
x: 2073
}
vertices {
x: 4008
}
vertices {
x: 4008
y: 3455
}
vertices {
x: 2073
y: 3455
}
}
confidence: 0.65625
importance_fraction: 0.746666669845581
}
}
localized_object_annotations {
mid: "/m/0jbk"
name: "Animal"
score: 0.7008256912231445
bounding_poly {
normalized_vertices {
x: 0.0390297956764698
y: 0.26235100626945496
}
normalized_vertices {
x: 0.8466796875
y: 0.26235100626945496
}
normalized_vertices {
x: 0.8466796875
y: 0.9386426210403442
}
normalized_vertices {
x: 0.0390297956764698
y: 0.9386426210403442
}
}
}
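To pull just the labels and their confidence scores out of that response object, iterating over label_annotations is enough:

for label in response.label_annotations:
    print(label.description, label.score)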
For face detection:
{"type": vision.enums.Feature.Type.Type.FACE_DETECTION}
For OCR:
{"type": vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION}
Database schema for this will require some thought. Just dumping the output into a JSON column isn't going to be flexible enough - I want to be able to FTS against labels and OCR text, and potentially query against other characteristics too.
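One possible shape, sketched with sqlite-utils (table and column names here are placeholders, not a decided schema): keep the raw response around if needed, but pull labels out into their own table and enable FTS on the description column - OCR text could get the same treatment in a separate table.

import sqlite_utils

db = sqlite_utils.Database("photos.db")

photo_id = "1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b"

# One row per label returned for this photo (hypothetical table name)
db["labels"].insert_all(
    [
        {
            "photo": photo_id,
            "mid": label.mid,
            "description": label.description,
            "score": label.score,
        }
        for label in response.label_annotations
    ]
)
db["labels"].enable_fts(["description"], create_triggers=True)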
The default timeout is a bit aggressive, and calls sometimes failed for me when my resizing proxy took too long to fetch and resize the image. Passing an explicit timeout may be worth trying:

client.annotate_image(..., timeout=3.0)
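If timeouts still show up, one option (just a sketch, using google-api-core's exception class and a hypothetical request variable holding the same dict passed to annotate_image above) is to catch the deadline error and retry once with a longer timeout:

from google.api_core.exceptions import DeadlineExceeded

try:
    response = client.annotate_image(request, timeout=3.0)
except DeadlineExceeded:
    # Retry once, giving the resizing proxy more time to respond
    response = client.annotate_image(request, timeout=10.0)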
It can detect faces, run OCR, do image labeling (it knows what a lemur is!) and do object localization, where it identifies objects and returns bounding polygons for them.
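The object localization results come back as normalized (0-1) coordinates, so reading the polygons out of the response above looks roughly like this:

for obj in response.localized_object_annotations:
    print(obj.name, obj.score)
    for vertex in obj.bounding_poly.normalized_vertices:
        print("   ", vertex.x, vertex.y)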