GoogleCloudPlatform / ml-design-patterns

Source code accompanying O'Reilly book: Machine Learning Design Patterns
Apache License 2.0

Chapter 3 : Cascade evaluate ValueError: The pyarrow library is not installed #26

Closed: jeromemassot closed this issue 3 years ago

jeromemassot commented 3 years ago

Dear authors, the evaluate component of the pipeline fails because the pyarrow module is not installed in its container.
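
One way to confirm this diagnosis before rerunning the whole pipeline is to check, inside the component's container, which of the modules it needs are actually importable. This is a minimal sketch. `missing_modules` is a hypothetical helper, not part of KFP or the book's notebook, and the module list in the `__main__` guard is an assumption about what the evaluate step imports.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment.

    Hypothetical helper: run it inside the component's container (or an
    image built the same way) to check that packages_to_install provides
    everything the component imports.
    """
    missing = []
    for name in names:
        try:
            # find_spec locates a module without executing it and returns
            # None when nothing installed provides it.
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Dotted names import their parent package first; a missing
            # parent (e.g. no "google" namespace package) ends up here.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed dependencies of the evaluate step discussed in this issue.
    print(missing_modules(["google.cloud.bigquery", "pandas", "pyarrow"]))
```

If `pyarrow` shows up in the printed list, the component's package requirements are the thing to change.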

I solved it by changing the packages requested in the pipeline definition:

```python
import kfp.dsl as dsl
import kfp.components as comp

# PROJECT_ID and the helpers used below (run_bigquery_ddl, train_classification_model,
# create_training_data, train_distance_model, evaluate) are assumed to be defined
# earlier in the Chapter 3 notebook.

@dsl.pipeline(
    name='Cascade pipeline on SF bikeshare',
    description='Cascade pipeline on SF bikeshare'
)
def cascade_pipeline(
    project_id = PROJECT_ID
):
    ddlop = comp.func_to_container_op(run_bigquery_ddl, packages_to_install=['google-cloud-bigquery'])

    c1 = train_classification_model(ddlop, PROJECT_ID)
    c1_model_name = c1.outputs['created_table']

    c2a_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Typical')
    c2b_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Long')

    c3a_model = train_distance_model(ddlop, PROJECT_ID, c2a_input.outputs['created_table'], 'Typical')
    c3b_model = train_distance_model(ddlop, PROJECT_ID, c2b_input.outputs['created_table'], 'Long')

    # The fix: the [bqstorage,pandas] extras of google-cloud-bigquery pull in pyarrow,
    # whose absence caused the ValueError in the evaluate component.
    evalop = comp.func_to_container_op(evaluate, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])
    error = evalop(PROJECT_ID, c1_model_name, c3a_model.outputs['created_table'], c3b_model.outputs['created_table'])
    print(error.output)
```
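
Because func_to_container_op turns a single Python function into a component, every module that function uses must be imported inside its body and installed through packages_to_install. The function below is a hypothetical evaluate-style component body. The name `evaluate_sketch`, its parameters, and the assumption that the query returns one numeric value are illustrative, not the notebook's `evaluate`. If pyarrow is missing, it fails immediately with a message that points at the fix.

```python
def evaluate_sketch(project_id: str, sql: str) -> float:
    """Hypothetical evaluate-style component body, not the notebook's code.

    func_to_container_op runs only this function's source inside the
    container, so every import has to live inside the body.
    """
    try:
        import pyarrow  # noqa: F401  (provided by the bigquery extras)
    except ImportError as exc:
        # Fail fast with an actionable message instead of the ValueError
        # raised later while BigQuery results are being converted.
        raise ImportError(
            "pyarrow is not installed; add "
            "'google-cloud-bigquery[bqstorage,pandas]' to packages_to_install"
        ) from exc

    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    df = client.query(sql).to_dataframe()
    # Assumes the query returns a single numeric value, such as an error metric.
    return float(df.iloc[0, 0])
```

It would be wrapped the same way as evaluate, e.g. `comp.func_to_container_op(evaluate_sketch, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])`.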

Best Regards

Jerome

jeromemassot commented 3 years ago

SOLVED by changing the packages installed for the evalop component:

```python
evalop = comp.func_to_container_op(evaluate, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])
```