dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Feature] Filter standard library packages out of Python models' `packages` config #9875

Open gwenwindflower opened 6 months ago

gwenwindflower commented 6 months ago

Is this your first time submitting a feature request?

Describe the feature

Right now, if a user wants to use `re`, `os`, etc. in a Python model, they might reasonably think they need to add it to the `packages` list config argument of the model. In fact, dbt will throw a 'package not found' error for packages that aren't third-party. The right way at present is to just import and use them, but we don't flag that anywhere in the docs. It would be good to filter out the standard library packages and perhaps raise a warning instead of an error here, letting people know the entry isn't necessary while still proceeding.

At present you need to do this, which is not super obvious:

import pandas as pd
import numpy as np
import re

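def model(dbt, session):
    # dbt configuration: list only third-party packages here;
    # standard library modules such as re are imported but not listed
    dbt.config(packages=["pandas", "numpy"])

    # placeholder result so the model returns a DataFrame
    df = pd.DataFrame({"hello": ["world"]})
    return df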

Describe alternatives you've considered

Who will this benefit?

Users of Python models.

Are you interested in contributing this feature?

No

Anything else?

dbeatty10 commented 6 months ago

Thanks for opening this, @gwenwindflower!

Which adapter did you use? Could you provide a simple dbt python model that exhibits this issue?

Was it dbt-snowflake with a model like this, by any chance?

import pandas as pd
import numpy as np
import re

def model(dbt, session):
    dbt.config(packages=["pandas", "numpy", "re"])

    df = pd.DataFrame({"hello": ["world"]})
    return df

And an error like this?

00:23:57    Database Error in model my_python_model (models/my_python_model.py)
  100357 (P0000): Cannot create a Python function with the specified packages. Please check your packages specification and try again.
  compiled Code at target/run/my_project/models/my_python_model.py

gwenwindflower commented 6 months ago

Hey @dbeatty10, sorry for the lack of a firsthand repro. I reported this on behalf of a user in the Community, so I didn't hit the error myself! @aranke suggested it could be worthwhile to just fix this rather than update the docs, and I tend to agree, particularly with the suggested idea of a clear warning over a mysterious error. Based on my conversation with the Community member, your example looks like exactly the simplified version of the model he was creating, and the error is the one that confused him. Here's a link to the thread.

dbeatty10 commented 5 months ago

@aranke could you share the details of your proposed approach for this scenario?

If you can provide links to the relevant area(s) of the source code, that would be even better.

aranke commented 5 months ago

Code: TK

Python built-in modules: https://docs.python.org/3/library/sys.html#sys.builtin_module_names
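
For illustration only, here is a minimal sketch of the filter-and-warn idea from the feature description. It is not dbt's implementation: `filter_stdlib_packages` is a hypothetical helper, and where it would hook into dbt-core or an adapter is left open. One thing to note is that on CPython `sys.builtin_module_names` only lists modules compiled into the interpreter, so pure-Python modules such as `re` and `os` are typically absent from it. The sketch therefore assumes Python 3.10+ and uses `sys.stdlib_module_names`, falling back to the shorter built-in list on older versions.

```python
import re
import sys
import warnings

# Python 3.10+ exposes the full standard library list; older versions fall
# back to the (much shorter) list of modules compiled into the interpreter.
STDLIB_MODULES = frozenset(
    getattr(sys, "stdlib_module_names", sys.builtin_module_names)
)


def filter_stdlib_packages(packages):
    """Hypothetical helper: drop standard library names from a packages list."""
    kept = []
    for spec in packages:
        # Strip any version specifier, e.g. "pandas==1.5.3" -> "pandas".
        name = re.split(r"[=<>!~\[; ]", spec.strip(), maxsplit=1)[0]
        if name in STDLIB_MODULES:
            # Warn and continue instead of failing the whole model.
            warnings.warn(
                f"'{name}' is part of the Python standard library and does "
                "not need to be listed in `packages`; ignoring it."
            )
        else:
            kept.append(spec)
    return kept
```

With this helper, a list such as `["pandas", "numpy", "sys"]` would be reduced to `["pandas", "numpy"]` with a single warning, so the adapter would only ever see third-party packages.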