apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

Key only reads from Google Datastore #18423

Open kennknowles opened 2 years ago

kennknowles commented 2 years ago

Datastore IO currently has no way to read only the keys from Google Datastore. In some cases users don't need the whole entity, e.g. when filtering by certain values in the ancestry. This seems to be an important feature: entity reads are expensive, which is why the native Datastore client/API allows keys-only queries.

Imported from Jira BEAM-2819. Original Jira may contain additional context. Reported by: ilnar.

sosimon commented 8 months ago

This may or may not apply to Java, but in the Python SDK I believe we can specify projection = ["__key__"] to run a keys-only query. It isn't really documented, though.

Relevant bit of code: https://github.com/googleapis/python-datastore/blob/main/google/cloud/datastore/query.py#L463-L465
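
For anyone who wants to try this, here is a minimal, untested sketch of what a keys-only read might look like in the Python SDK. It assumes that the v1new Datastore connector (ReadFromDatastore and Query in apache_beam.io.gcp.datastore.v1new) passes the projection through to the underlying client unchanged, and that the returned entities expose a key attribute. The helper names keys_only_query_args and read_keys are made up for illustration and are not part of Beam.

```python
# Special property name that Datastore uses to refer to the entity key.
KEY_PROJECTION = ["__key__"]


def keys_only_query_args(kind, project, namespace=None):
    """Return keyword arguments describing a keys-only Datastore query.

    Hypothetical helper: it only builds a plain dict, so it can be
    inspected without Beam or GCP credentials.
    """
    return {
        "kind": kind,
        "project": project,
        "namespace": namespace,
        # Copy the list so callers cannot mutate the module-level constant.
        "projection": list(KEY_PROJECTION),
    }


def read_keys(pipeline, kind, project, namespace=None):
    """Attach a keys-only Datastore read to `pipeline`.

    Assumption: projecting on "__key__" makes Datastore run a keys-only
    query, as suggested in this thread. Not verified against every
    Beam release.
    """
    # Imported lazily so the helper above works without apache-beam[gcp].
    import apache_beam as beam
    from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
    from apache_beam.io.gcp.datastore.v1new.types import Query

    query = Query(**keys_only_query_args(kind, project, namespace))
    return (
        pipeline
        | "ReadKeysOnly" >> ReadFromDatastore(query)
        # Keep just the key of each returned entity.
        | "ExtractKey" >> beam.Map(lambda entity: entity.key)
    )
```

Inside a "with beam.Pipeline() as p:" block, calling read_keys(p, "Task", "my-project") would be the intended usage, where "Task" and "my-project" are placeholder values. Whether this is actually billed as a keys-only read should be checked against the Datastore documentation.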