datarobot-community / datarobotx-idp


Make calls robust against deserialised dr object function params #13

Open derluke opened 5 months ago

derluke commented 5 months ago

I saw @mpkrass7's comment on the LLM blueprint. I think we have the same issue here:

https://github.com/datarobot-community/datarobotx-idp/blob/569e03e2f0774b1020b3f32003000a4f07c265c1/src/datarobotx/idp/vector_databases.py#L57C35-L57C53

But we also have these issues with dr.AdvancedOptions in autopilot, for example. Is it worth investing in consistency?

mpkrass7 commented 5 months ago

You're right that the issue linked above is the same one. You might be right about Autopilot too, though I think that example is a little less clear. get_or_create_vector_database_from_dataset is designed to execute VectorDatabase.create if it doesn't find an existing vector database. Similarly, get_or_create_llm_blueprint is designed to run LLMBlueprint.create if it doesn't find an existing blueprint.

get_or_create_autopilot_run is designed to both create a project and then run analyze_and_model. The reason it matters in this case (I think) is that most of our idp helpers are expected to do little more than run the equivalent Python SDK function or POST request, so their inputs should match the inputs the SDK expects. get_or_create_autopilot_run has no equivalent Python SDK call, so there is no such expectation for its inputs.

I defer to @brau0300, though. Implementing this for all of advanced_options, datetime_partitioning, and feature_settings might be trickier, but doing it correctly could benefit power users of the existing Python SDK.
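One possible shape for this, as a rough sketch only: it assumes the problem is that a parameter can arrive either as an SDK object or as a plain dict after deserialisation. `FakeAdvancedOptions` and `coerce` are hypothetical names for illustration; the real dr.AdvancedOptions class is not used here and may need different handling.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class FakeAdvancedOptions:
    """Hypothetical stand-in for an SDK options object; not dr.AdvancedOptions."""

    seed: Optional[int] = None
    verbose: bool = False


def coerce(value: Any, cls: Type[T]) -> Optional[T]:
    """Accept an instance of `cls`, a plain mapping, or None, and normalise it."""
    if value is None or isinstance(value, cls):
        return value
    if isinstance(value, Mapping):
        # Reject unknown keys early so that typos fail loudly.
        allowed = {f.name for f in fields(cls)}
        unknown = set(value) - allowed
        if unknown:
            raise TypeError(f"Unexpected keys for {cls.__name__}: {sorted(unknown)}")
        return cls(**value)
    raise TypeError(f"Cannot build {cls.__name__} from {type(value).__name__}")


from_dict = coerce({"seed": 7}, FakeAdvancedOptions)
from_object = coerce(FakeAdvancedOptions(seed=7), FakeAdvancedOptions)
```

Both calls end up with an equal options object, so the helper body could work with one type regardless of how the caller supplied it. The same normalisation step would have to be repeated for each object-valued parameter, which is where the extra effort comes in.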