GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

Support float32 type in datastream #1724

Closed arawind closed 1 month ago

arawind commented 2 months ago

This is a patch of https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1375.

Successfully ran the dataflow template:

MySQL source, stored to GCS via datastream: Screenshot from 2024-07-10 01-12-34

Spanner destination from datastream: Screenshot from 2024-07-10 01-12-52

export PROJECT=span-cloud-testing
export IMAGE_NAME=datastream-to-spanner
export BUCKET_NAME=gs://shubhamswe-test
export TARGET_GCR_IMAGE=gcr.io/${PROJECT}/templates/${IMAGE_NAME}
export BASE_CONTAINER_IMAGE=gcr.io/dataflow-templates-base/java11-template-launcher-base
export BASE_CONTAINER_IMAGE_VERSION=latest
export APP_ROOT=/template/${IMAGE_NAME}
export DATAFLOW_JAVA_COMMAND_SPEC=${APP_ROOT}/resources/${IMAGE_NAME}-command-spec.json
#export SPANNER_HOST="https://staging-wrenchworks.sandbox.googleapis.com/"
export INSTANCE_ID="arawind-test"
export DATABASE_ID="float32-test"
export GCS_LOCATION="gs://shubhamswe-test/float32-mysql-out"
export STREAM_NAME="projects/span-cloud-testing/locations/us-central1/streams/arawind-mysql-datastream"
export TEMPLATE_IMAGE_SPEC="gs://shubhamswe-test/images/2024_07_09_01/flex/Cloud_Datastream_to_Spanner"
export JOB_NAME="${IMAGE_NAME}-$(date +%Y%m%d-%H%M%S-%N)"

gcloud dataflow flex-template run ${JOB_NAME} \
        --project=${PROJECT} --region=us-central1 \
        --template-file-gcs-location=${TEMPLATE_IMAGE_SPEC} \
        --parameters instanceId=${INSTANCE_ID},databaseId=${DATABASE_ID},inputFilePattern=${GCS_LOCATION},streamName=${STREAM_NAME}

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 48.49%. Comparing base (4067e61) to head (d89ef23).

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               main    #1724      +/-   ##
============================================
+ Coverage     42.22%   48.49%   +6.26%
+ Complexity     3164      991    -2173
============================================
  Files           790      326     -464
  Lines         46021    17603   -28418
  Branches       4924     1760    -3164
============================================
- Hits          19434     8536   -10898
+ Misses        25006     8483   -16523
+ Partials       1581      584     -997
```

| [Components](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/components?src=pr&el=components&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | Coverage Δ | |
|---|---|---|
| [spanner-templates](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `63.12% <100.00%> (-0.30%)` | :arrow_down: |
| [spanner-import-export](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `∅ <ø> (∅)` | |
| [spanner-live-forward-migration](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `74.97% <100.00%> (+0.63%)` | :arrow_up: |
| [spanner-live-reverse-replication](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `51.44% <100.00%> (+0.60%)` | :arrow_up: |
| [spanner-bulk-migration](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `83.14% <100.00%> (+0.41%)` | :arrow_up: |

| [Files](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | Coverage Δ | |
|---|---|---|
| [...m/google/cloud/teleport/v2/spanner/ddl/Column.java](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724?src=pr&el=tree&filepath=v2%2Fspanner-common%2Fsrc%2Fmain%2Fjava%2Fcom%2Fgoogle%2Fcloud%2Fteleport%2Fv2%2Fspanner%2Fddl%2FColumn.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform#diff-djIvc3Bhbm5lci1jb21tb24vc3JjL21haW4vamF2YS9jb20vZ29vZ2xlL2Nsb3VkL3RlbGVwb3J0L3YyL3NwYW5uZXIvZGRsL0NvbHVtbi5qYXZh) | `59.14% <100.00%> (+9.46%)` | :arrow_up: |
| [...ations/convertors/ChangeEventSpannerConvertor.java](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724?src=pr&el=tree&filepath=v2%2Fspanner-common%2Fsrc%2Fmain%2Fjava%2Fcom%2Fgoogle%2Fcloud%2Fteleport%2Fv2%2Fspanner%2Fmigrations%2Fconvertors%2FChangeEventSpannerConvertor.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform#diff-djIvc3Bhbm5lci1jb21tb24vc3JjL21haW4vamF2YS9jb20vZ29vZ2xlL2Nsb3VkL3RlbGVwb3J0L3YyL3NwYW5uZXIvbWlncmF0aW9ucy9jb252ZXJ0b3JzL0NoYW5nZUV2ZW50U3Bhbm5lckNvbnZlcnRvci5qYXZh) | `84.00% <100.00%> (+1.02%)` | :arrow_up: |
| [...igrations/convertors/ChangeEventTypeConvertor.java](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724?src=pr&el=tree&filepath=v2%2Fspanner-common%2Fsrc%2Fmain%2Fjava%2Fcom%2Fgoogle%2Fcloud%2Fteleport%2Fv2%2Fspanner%2Fmigrations%2Fconvertors%2FChangeEventTypeConvertor.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform#diff-djIvc3Bhbm5lci1jb21tb24vc3JjL21haW4vamF2YS9jb20vZ29vZ2xlL2Nsb3VkL3RlbGVwb3J0L3YyL3NwYW5uZXIvbWlncmF0aW9ucy9jb252ZXJ0b3JzL0NoYW5nZUV2ZW50VHlwZUNvbnZlcnRvci5qYXZh) | `90.09% <100.00%> (+0.85%)` | :arrow_up: |
| [...om/google/cloud/teleport/v2/spanner/type/Type.java](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724?src=pr&el=tree&filepath=v2%2Fspanner-common%2Fsrc%2Fmain%2Fjava%2Fcom%2Fgoogle%2Fcloud%2Fteleport%2Fv2%2Fspanner%2Ftype%2FType.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform#diff-djIvc3Bhbm5lci1jb21tb24vc3JjL21haW4vamF2YS9jb20vZ29vZ2xlL2Nsb3VkL3RlbGVwb3J0L3YyL3NwYW5uZXIvdHlwZS9UeXBlLmphdmE=) | `60.30% <100.00%> (+3.24%)` | :arrow_up: |

... and [480 files with indirect coverage changes](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1724/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform)

manitgupta commented 2 months ago

Can you explain what I am looking at in the screenshots above? I think the change in value of amount1 is expected? Can you link the docs here for me to read?

Is this change "normal" and "expected", i.e., will customers be okay with this change?

1.23 in MySQL becomes 1.230000002 in Spanner.

arawind commented 2 months ago

Thanks @manitgupta!

> Can you explain what I am looking at in the screenshots above? I think the change in value of amount1 is expected? Can you link the docs here for me to read?
>
> Is this change "normal" and "expected", i.e., will customers be okay with this change?
>
> 1.23 in MySQL becomes 1.230000002 in Spanner.

TL;DR: The issue is only in the Cloud Console UI; the actual value of 1.23 is still preserved. The underlying reason is that we up-cast the float value to double on the Cloud Spanner API, as that's the best we can do with the limited types the API provides. Client libraries convert float to double when sending and double to float when receiving, and these conversions are lossless. Cloud Console uses a JS client, and JS has no float32 number type, so we do our best to convert the float64 value from the API to float32 in JS. That conversion only approximates the actual down-cast, which is why the UI ends up showing values like these.

The actual values are preserved across migrations, import/export, and so on.
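
For anyone who wants to see the effect outside the Console, here is a minimal sketch in plain Python (not Spanner client code, and not how the Console itself renders values). It assumes only the standard `struct` module; `to_float32` and `shortest_float32_repr` are hypothetical helper names made up for this illustration. It shows that widening a float32 to float64 exposes extra digits when printed at float64 precision, while narrowing back to float32 recovers the original value exactly.

```python
import struct


def to_float32(value: float) -> float:
    """Round to the nearest float32, then widen back to a Python float (float64)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def shortest_float32_repr(value: float) -> str:
    """Shortest decimal string that maps back to the same float32 bits."""
    target = struct.pack("<f", value)
    for digits in range(1, 10):  # 9 significant digits always suffice for float32
        text = f"{value:.{digits}g}"
        if struct.pack("<f", float(text)) == target:
            return text
    return repr(value)


# The float32 nearest to 1.23, carried as a float64 (the up-cast described above).
widened = to_float32(1.23)

print(repr(widened))                   # every float64 digit of the widened value
print(shortest_float32_repr(widened))  # what a float32-aware display would show
print(to_float32(widened) == widened)  # narrowing again changes nothing
```

Run as-is, this prints `1.2300000190734863`, then `1.23`, then `True`: the stored bits never change, and only the precision used for display differs.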

Design doc is available internally at go/cloud-spanner-float32.