fal-ai / dbt-fal

do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
https://fal.ai/dbt-fal
Apache License 2.0

Successfully installed catboost wants to import as _catboost #40 #915

Open rm-minus-r-star opened 10 months ago

rm-minus-r-star commented 10 months ago

Describe the bug Importing catboost fails with an error that module _catboost cannot be found; a leading underscore appears to be picked up somewhere.

Your environment

How to reproduce I'm trying to use a model trained outside dbt to predict labels via a Python model under dbt-fal.

fal-project.yml:

environments:
  - name: ml
    type: venv
    requirements:
      - scipy
      - pandas
      - numpy
      - statsmodels
      - catboost

catboost was just added to this configuration; other models that use the other listed libraries work well. The first run of the model below produced a long installation log on stdout, ending with

[builder] [info] Successfully installed [...] catboost-1.2.2 [...]

Running the Python model below with dbt run --select ... gives me the error shown after it

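import pandas
from catboost import CatBoostRegressor

def model(dbt, fal):
    dbt.config(fal_environment="ml")

    df: pandas.DataFrame = dbt.ref("tr_rep_gentrification_prediction_inputs")

    # Drop the non-feature columns and fill missing values before predicting
    X = df.drop(['col0', 'col1', 'col2'], axis=1).fillna(0.0)

    # Load the model that was trained outside dbt
    catb = CatBoostRegressor()
    catb.load_model('cb_model.cbm')

    # predict() returns a NumPy array, so wrap it in a Series aligned with df
    # and attach it as a new column (the column name is illustrative)
    pred = pandas.Series(catb.predict(X), index=df.index, name="prediction")
    results = pandas.concat([df, pred], axis=1)

    return results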

stdout:

No module named '_catboost'
22:55:01  1 of 1 ERROR creating python table model trans.tr_rep_gentrification_prediction_outputs  [ERROR in 42.02s]
22:55:02  
22:55:02  Finished running 1 table model in 0 hours 0 minutes and 58.89 seconds (58.89s).
22:55:02  
22:55:02  Completed with 1 error and 0 warnings:
22:55:02  
22:55:02  No module named '_catboost'

If I remove catboost from the fal-project.yml file, I still get a missing-module error (as expected), but this time the module name no longer has the leading underscore.

I also tried as recommended by @mederka at https://github.com/fal-ai/fal/issues/40#issuecomment-1898466134 to import within the model function instead, but I get the same error.
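Since dbt only surfaces the final one-line message, it may help to capture the full traceback and the interpreter details from inside the model. The following is a minimal sketch and not part of dbt-fal: `import_report` is a hypothetical helper that tries an import and returns a text report, which could be printed or raised from within the model function to see which Python executable and search path the `ml` environment is actually using.

```python
import importlib
import platform
import sys
import traceback


def import_report(module_name: str) -> str:
    """Try to import `module_name` and describe the outcome as text."""
    lines = [
        f"python executable: {sys.executable}",
        f"python version:    {platform.python_version()}",
        f"platform:          {platform.platform()}",
    ]
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Keep the whole traceback: it shows which file raised the error.
        lines.append(f"import of {module_name!r} FAILED")
        lines.append(traceback.format_exc())
        lines.append("sys.path:")
        lines.extend(f"  {entry}" for entry in sys.path)
    else:
        origin = getattr(module, "__file__", "<built-in>")
        lines.append(f"import of {module_name!r} OK from {origin}")
    return "\n".join(lines)
```

Calling `import_report("catboost")` at the top of the model function and printing the result would show whether the failure comes from catboost's own package code or from an unexpected interpreter.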

Expected behavior I expect catboost to be imported the same as every other library

Actual behavior The model fails to run because _catboost cannot be found; a leading underscore appears to be added.

Screenshots None

Additional context Also posted here in case there's a more generally obvious solution

chamini2 commented 10 months ago

It seems that _catboost is a module inside the library itself, so I think this is more about how catboost installs than about dbt-fal.

https://github.com/catboost/catboost/blob/d6172a4e4b11f485c416368461feae3f3ce98745/catboost/python-package/catboost/_catboost.pyx

rm-minus-r-star commented 10 months ago

Hmm. It installs fine outside of dbt-fal though.

CatBoostRegressor appears to be exported from core.py by the package-level __init__.py. I don't understand why a Cython source file in the same directory would interfere.
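For context on the question above: a `.pyx` file is Cython source that gets compiled into a native extension module (a `.so` or `.pyd` file), here named `_catboost`, so the underscore is part of the real module name rather than something dbt-fal adds. The sketch below is one way to check whether a compiled `_catboost` file is actually present in the installed package without importing catboost itself. The helper name `find_package_files` is hypothetical, and the check is only meaningful when run with the Python interpreter of the fal-managed environment.

```python
import importlib.util
from pathlib import Path
from typing import List


def find_package_files(package: str, prefix: str) -> List[str]:
    """List files in an installed package's directory whose names start with `prefix`.

    find_spec on a top-level package locates it without running its
    __init__.py, so this works even when importing the package fails.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or it is a single-file module
    matches: List[str] = []
    for location in spec.submodule_search_locations:
        matches.extend(str(p) for p in sorted(Path(location).glob(prefix + "*")))
    return matches
```

`find_package_files("catboost", "_catboost")` would be expected to list a compiled binary; an empty list, or only non-binary files, would suggest the native part of the package did not make it into the environment.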

chamini2 commented 10 months ago

Can you share more of the log around this line?

[builder]  [info]    Successfully installed [...] catboost-1.2.2 [...]

Let's see if we can find a hint there.

rm-minus-r-star commented 10 months ago

tmperr.txt

I had a look over this too, nothing jumped out at me, but I'm not an expert.

This log ended with a silly error on my part when trying to run the python model -- after fixing the obvious, I get the errors as quoted in the bug report.

chamini2 commented 10 months ago

Can you try to build it with a conda environment instead?

environments:
  - name: ml
    type: conda
    packages:
      - scipy
      - pandas
      - numpy
      - statsmodels
      - catboost

rm-minus-r-star commented 10 months ago

(

@chamini2 noted, but seriously struggling to get conda functional. I've tried so many things. Should this be a no-brainer? Or does this actually give you info?

No matter what I try, I get

Could not find conda executable. If conda executable is not available by default, please point isolate to the path where conda binary is available 'ISOLATE_CONDA_HOME'.

)

chamini2 commented 10 months ago

You need to have conda installed to be able to use this, but I think it will make your use case work.

rm-minus-r-star commented 10 months ago

> You need to have conda installed to be able to use this, but I think it will make your use case work.

Yeah, I installed conda, tried setting the env var to every level of the install location, and activated it in the same shell, all with no joy. Great to hear that it sounds positive for the venv type.
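One way to narrow down the "Could not find conda executable" message is to check, from the same shell that runs dbt, where a `conda` executable is visible. The sketch below is only an approximation: it assumes `ISOLATE_CONDA_HOME` should name the directory that directly contains the executable, which is a reading of the error message and not a confirmed description of isolate's lookup. `locate_conda` is a hypothetical helper.

```python
import os
import shutil
from typing import Mapping, Optional


def locate_conda(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the path of a `conda` executable, or None if none is found.

    ASSUMPTION: ISOLATE_CONDA_HOME is treated as the directory that directly
    contains the executable; isolate's real lookup may differ.
    """
    env = os.environ if env is None else env
    # Check the override directory first, then fall back to PATH.
    candidates = [env.get("ISOLATE_CONDA_HOME", ""), env.get("PATH", "")]
    for search_path in candidates:
        if search_path:
            found = shutil.which("conda", path=search_path)
            if found:
                return found
    return None
```

If `locate_conda()` returns `None`, the variable is probably pointing at the wrong level of the install; on Linux and macOS the executable typically lives in a `bin` or `condabin` subdirectory of the conda install location.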

rm-minus-r-star commented 8 months ago

> [...], but I think it will make your use case work.

Any luck here?