ddneves / awesome-gcp-certifications

A curated list of resources for learning about Google Cloud Platform certifications and how to prepare for them.

Exam preparation #21

Closed JoseRFJuniorLLMs closed 4 years ago

JoseRFJuniorLLMs commented 5 years ago

Exam preparation

While hands-on experience is invaluable, it is easy to miss the bigger picture of the available infrastructure when you work with only a subset of the available GCP network technologies. Here is what I used:

- David das Neves: Awesome GCP Certifications
- Coursera: Data Engineering on Google Cloud Platform Specialization. Not a quick course, but it covers the material well and gives you an opportunity to practice what you have learned in Qwiklabs.
- LinuxAcademy: Google Cloud Certified Professional Data Engineer
- Qwiklabs: Data Engineering
- YouTube: Introduction to Google Cloud Machine Learning (Google Cloud Next ‘17)
- YouTube: Auto-awesome: advanced data science on Google Cloud Platform (Google Cloud Next ‘17)
- YouTube: Lifecycle of a machine learning model (Google Cloud Next ‘17)

An actual exam

Two hours, 50 questions. The first pass took me 1 hour, and I had 21 out of 50 questions marked for review (which shows the amount of self-doubt this exam will inflict upon you). I finished the entire exam in 1 hour 15 minutes and was presented with a much-doubted “pass”.

Preparation suggestions

I do not intend to share any of the actual questions, as this is against the certification’s mission. The topics covered below are the ones I found harder, or felt less prepared for, after taking all of the training above.

Key topics: BigQuery, BigTable, Dataflow, PubSub

BigQuery
- Streaming data into BigQuery: know it well.
- High-rate streaming
- Serving large datasets to a BI dashboard (focus on data freshness and cost efficiency)
- Benefits of partitions
- From the point of view of a BigQuery administrator, know the best practices for giving various teams access to their team-specific datasets without cross-access.
- Methods to increase the number of concurrent slots
- How to verify that an ETL job migrated to BigQuery produces the same results
- Point-in-time snapshots
- BigQuery integration with BigQuery ML
- UDFs
- Understand the pros and cons of denormalised data in the context of BigQuery

BigTable
- Understand the architecture and the key reasons for its high performance well
- Know Key Visualizer
- Know when to scale BigTable
- Know performant key/schema design
- Scaling up BigTable: if you need to double your reads for a prolonged period, what can you do to guarantee the same read latency?
- Dev to Prod cluster promotion
- HDD to SSD data migration

Dataflow
- Understand Apache Beam building blocks: Pipeline, PCollection, PTransform, ParDo
- Know side inputs
- Exactly-once processing of PubSub messages
- Handling invalid inputs

PubSub
- Migrating from Kafka to PubSub
- Know potential reasons for PubSub ingesting applications being busier than initially planned
- What PubSub metrics are available in Stackdriver and how to debug producers/consumers
- Ordering messages
- Dealing with duplicate messages

Data migrations
- Know when to use Data Transfer Appliance. Hint: slow network, huge dataset, no in-between refreshes.
- When to use Transfer Service and what its limitations are.
- Know the cost of storage and availability for various products (BigQuery, BigTable, Cloud SQL, GCS) so that you can find the cheapest product for a given set of availability/durability criteria.
- How Dedicated Interconnect affects your data transfer decisions
- How to continuously sync data between on-prem and GCP

Dataproc
- Cloud Storage connector
- Preemptible workers
- Scaling clusters

ML
- Know best practices for training ML models (training, test, overfitting detection)
- Speeding up TensorFlow applications
- Know Cloud Data Loss Prevention API
- Know Cloud Natural Language API
- Know Cloud Vision API

IAM
- How to allow cross-team data access to BigQuery and GCS in a large organisation

Misc
- Know how to back up and migrate Datastore
- Know Cloud Composer well
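
The Data Transfer Appliance hint above (slow network, huge dataset) comes down to simple arithmetic, so here is a rough sketch of how one might estimate an online transfer time before choosing between the network and a physical appliance. The function name `transfer_days` and the 80% link utilisation default are my own assumptions for illustration, not figures from Google's documentation.

```python
def transfer_days(dataset_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move `dataset_tb` terabytes over a network link.

    `utilization` is an assumed fraction of the nominal bandwidth that the
    transfer can actually sustain (hypothetical default, adjust to your link).
    """
    if dataset_tb < 0 or bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("dataset_tb must be >= 0, bandwidth_mbps > 0, 0 < utilization <= 1")
    total_bits = dataset_tb * 1e12 * 8               # decimal terabytes -> bits
    effective_bps = bandwidth_mbps * 1e6 * utilization  # megabits/s -> usable bits/s
    seconds = total_bits / effective_bps
    return seconds / 86_400                          # seconds -> days


if __name__ == "__main__":
    # Same dataset over a slow office link and over a 10 Gbps interconnect.
    for label, mbps in [("100 Mbps", 100), ("10 Gbps", 10_000)]:
        print(f"500 TB over {label}: {transfer_days(500, mbps):.1f} days")
```

Under these assumptions, 500 TB over a 100 Mbps link works out to well over a year, while the same data over a 10 Gbps link takes under a week. That gap is the intuition behind the hint, and it is also why Dedicated Interconnect changes the decision.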

ddneves commented 5 years ago

Thanks for the suggestion - I will have a dedicated read. :)