Labs and demos for courses for GCP Training (http://cloud.google.com/training).
Apache License 2.0
"Serverless Data Processing with Dataflow - Writing an ETL Pipeline using Apache Beam and Cloud Dataflow (Python)" job fails because of short schema #2541
In the instructions at https://www.cloudskillsboost.google/course_sessions/11591045/labs/433174 (part of 09 Serverless Data Processing with Dataflow: Develop Pipelines, `Data Engineer Learning Path > Serverless Data Processing with Dataflow: Develop Pipelines`), Task 5, "Write to a sink", cites a schema that is too short.

However, anyone who digs deeper can see that https://github.com/GoogleCloudPlatform/training-data-analyst/blob/989aa2d423f17647b20e2e02382b5d0f7b467193/quests/dataflow_python/batch_event_generator.py#L47 defines:
log_fields = ["ip", "user_id", "lat", "lng", "timestamp", "http_request", "http_response", "num_bytes", "user_agent"]
and consequently the solution file has the matching longer schema. However, without peeking into the solution, the job fails. The instructions could be updated for better student success.
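
For anyone hitting the same failure, a quick way to spot the mismatch is to compare the field names in the table schema against `log_fields` before launching the job. The snippet below is only a rough sketch: `missing_from_schema` is a hypothetical helper, `short_schema` is an invented truncated schema for illustration (it is not the text from the lab), and the `STRING` types are placeholders. It assumes the schema is written in the dictionary form with a `fields` list of `name`/`type` entries.

```python
# Field names produced by batch_event_generator.py (copied from the line quoted above).
log_fields = ["ip", "user_id", "lat", "lng", "timestamp",
              "http_request", "http_response", "num_bytes", "user_agent"]


def missing_from_schema(schema, fields):
    """Return the record fields that the table schema does not declare.

    Hypothetical helper, not part of the lab or of Apache Beam.
    """
    declared = {column["name"] for column in schema["fields"]}
    return [name for name in fields if name not in declared]


# Invented truncated schema, for illustration only; NOT the schema shown in the lab.
# The STRING types are placeholders, not the lab's actual column types.
short_schema = {
    "fields": [
        {"name": "ip", "type": "STRING"},
        {"name": "user_id", "type": "STRING"},
        {"name": "timestamp", "type": "STRING"},
    ]
}

missing = missing_from_schema(short_schema, log_fields)
print(missing)
```

An empty list would mean every generated field has a column in the schema; any names that are printed are the ones the schema still needs.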