Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Chapter 2 example error #66

Open jhancock1229 opened 1 year ago

jhancock1229 commented 1 year ago

When I run the code at the end of Chapter 2, I get the following error:

WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server.
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Connecting anonymously.

I know it's related to pulling kinglear.txt from Google Cloud Storage. Any tips on how to resolve this? By the way, here is the source code I copied out of the book:

import re
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions

input_file = "gs://dataflow-samples/shakespeare/kinglear.txt"
output_file = "~/coding/machine-learning/output.txt"

pipeline_options = PipelineOptions()

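with beam.Pipeline(options=pipeline_options) as p:
    # Read the input file line by line.
    lines = p | ReadFromText(input_file)

    # Split each line into words, then sum the occurrences of each word.
    counts = (
        lines
        | 'Split' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
        | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
        | 'GroupAndSum' >> beam.CombinePerKey(sum)
    )

    def format_result(word_count):
        # Turn a (word, count) tuple into a "word: count" line.
        (word, count) = word_count
        return "{}: {}".format(word, count)

    output = counts | 'Format' >> beam.Map(format_result)

    # Write the formatted counts; Beam appends a shard suffix to the file name.
    output | WriteToText(output_file)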

MicAnt64 commented 1 year ago

Hi jhancock1229,

Those are warnings, not errors. I get the same warnings. As the last log line says, Beam falls back to connecting anonymously, and the pipeline still runs. To check, I enter the following in a new cell:

!head output.txt*

I get:

KING: 243
LEAR: 236
DRAMATIS: 1
PERSONAE: 1
king: 65
of: 447
Britain: 2
OF: 15
FRANCE: 10
DUKE: 3
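
For anyone running the script outside a notebook, the check above can be sketched in plain Python, together with an optional way to hide the credential warnings. This is only a sketch under stated assumptions: the helper names `quiet_credential_warnings` and `preview_output` are made up for illustration, the logger names are copied from the prefixes of the warning lines quoted in this thread, and the `output.txt` prefix is the one used in the `!head` command. Raising the log level only hides the messages; it does not change how Beam authenticates.

```python
import glob
import itertools
import logging

# Logger names taken from the prefixes of the warning lines quoted in this thread.
NOISY_LOGGERS = (
    "google.auth.compute_engine._metadata",
    "google.auth._default",
    "apache_beam.internal.gcp.auth",
)


def quiet_credential_warnings(level=logging.ERROR):
    """Raise the threshold of the loggers above so their WARNING lines are hidden."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)


def preview_output(prefix, n=10):
    """Return the first n lines across all files whose names start with prefix.

    WriteToText appends a shard suffix to the name it is given, hence the glob.
    """
    lines = []
    for path in sorted(glob.glob(glob.escape(prefix) + "*")):
        with open(path, encoding="utf-8") as f:
            # Take only as many lines as are still needed to reach n.
            remaining = itertools.islice(f, n - len(lines))
            lines.extend(line.rstrip("\n") for line in remaining)
        if len(lines) >= n:
            break
    return lines
```

Calling `quiet_credential_warnings()` before building the pipeline should suppress the messages from those three loggers, and `print("\n".join(preview_output("output.txt")))` prints roughly what `!head output.txt*` shows when there is a single output shard. The prefix is passed to `glob` as given, so a path starting with `~` would need `os.path.expanduser` applied first.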