saichaitanyamolabanti closed 2 years ago
@kevinwang @variablenix @ijoseph @prasad-kamat please help
The current algorithm is optimized to explain one row at a time (see https://github.com/Affirm/shparkley/blob/9e1c72cad8f1b4a46a0f375fcd6144020dedc4e4/affirm/model_interpretation/shparkley/spark_shapley.py#L46), so for now the best way to meet your API requirements is to call it serially, once per row:
shapley_values_shparkley = []
for query_row in query_rows:
    shapley_values_shparkley.append(
        compute_shapley_for_sample(
            df=train_spark_df,
            model=model_with_shparkley_interface,
            row_to_investigate=query_row,
        )
    )
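If it helps, the loop above can be wrapped in a small helper so that callers get a multi-row entry point. This is only a sketch: `explain_rows_serially` and `fake_explain_one` are hypothetical names that are not part of Shparkley, and the stand-in explainer exists only so the snippet runs without Spark.

```python
from typing import Any, Callable, Iterable, List


def explain_rows_serially(
    rows: Iterable[Any],
    explain_one: Callable[[Any], Any],
) -> List[Any]:
    """Call `explain_one` once per row and return the results in input order.

    Hypothetical helper, not part of Shparkley: it only packages the serial loop.
    """
    return [explain_one(row) for row in rows]


def fake_explain_one(row: dict) -> dict:
    """Stand-in explainer (an assumption for illustration), so no Spark is needed."""
    return {name: float(value) for name, value in row.items()}


results = explain_rows_serially(
    [{"fico": 600, "loan_amount": 300}, {"fico": 700, "loan_amount": 350}],
    fake_explain_one,
)

# With Shparkley, the single-row explainer would presumably look like this
# (names taken from the thread; shown as a comment because it needs Spark):
#
# explain_one = lambda row: compute_shapley_for_sample(
#     df=train_spark_df,
#     model=model_with_shparkley_interface,
#     row_to_investigate=row,
# )
# shapley_values_shparkley = explain_rows_serially(query_rows, explain_one)
```

Each call is still a separate single-row computation, so the total runtime grows linearly with the number of query rows.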
okay thanks @ijoseph
As shown in the simple.ipynb file, the Shparkley package generates Shapley values for a single data point. I wanted to check: if we pass several rows to be investigated, does Shparkley provide Shapley values for all of them?
Current:
query_row = Row(fico=600, loan_amount=300, number_of_delinquencies=1, repaid_all_previous_affirm_loans=0)
shapley_values_shparkley = compute_shapley_for_sample(
    df=train_spark_df,
    model=model_with_shparkley_interface,
    row_to_investigate=query_row,
)
Expected:
query_rows = [
    Row(fico=600, loan_amount=300, number_of_delinquencies=1, repaid_all_previous_affirm_loans=0),
    Row(fico=700, loan_amount=350, number_of_delinquencies=0, repaid_all_previous_affirm_loans=0),
    Row(fico=680, loan_amount=370, number_of_delinquencies=1, repaid_all_previous_affirm_loans=1),
]
shapley_values_shparkley = compute_shapley_for_sample(
    df=train_spark_df,
    model=model_with_shparkley_interface,
    row_to_investigate=query_rows,
)