Chapter 3: Cascade evaluate ValueError: The pyarrow library is not installed

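Dear authors, the evaluate component of the Cascade pipeline fails with the ValueError above because the pyarrow module is not installed in its container.

I solved it by changing the packages requested for the evaluate op (the packages_to_install argument of evalop) in the pipeline definition: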

import kfp.components as comp
import kfp.dsl as dsl

@dsl.pipeline(
    name='Cascade pipeline on SF bikeshare',
    description='Cascade pipeline on SF bikeshare'
)
def cascade_pipeline(
    project_id=PROJECT_ID
):
    ddlop = comp.func_to_container_op(run_bigquery_ddl, packages_to_install=['google-cloud-bigquery'])

    c1 = train_classification_model(ddlop, PROJECT_ID)
    c1_model_name = c1.outputs['created_table']

    c2a_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Typical')
    c2b_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Long')

    c3a_model = train_distance_model(ddlop, PROJECT_ID, c2a_input.outputs['created_table'], 'Typical')
    c3b_model = train_distance_model(ddlop, PROJECT_ID, c2b_input.outputs['created_table'], 'Long')

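    # Changed line: request the [bqstorage,pandas] extras so that pyarrow
    # gets installed in the evaluate container.
    evalop = comp.func_to_container_op(evaluate, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])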
    error = evalop(PROJECT_ID, c1_model_name, c3a_model.outputs['created_table'], c3b_model.outputs['created_table'])
    print(error.output)
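
In case it helps others debug similar failures, here is a minimal sketch of a check that could be run at the top of a component function to report missing modules before any BigQuery call is made. The helper name missing_modules and the list of required modules are my own assumptions for illustration; they are not part of the book's code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list of modules that evaluate() relies on (hypothetical, adjust as needed).
required = ["pandas", "pyarrow"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable")
```

If pyarrow shows up as missing, the container was built without it. Listing 'pyarrow' explicitly in packages_to_install would presumably also work, although only the extras variant above was tried here.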

Best Regards

Jerome

GoogleCloudPlatform / ml-design-patterns