Open cobrienbeam opened 3 weeks ago
Additionally, maybe there could be a link out to a page that discusses the use of projects vs deployments. I like how the Next.js and React documentation link out to different sections to discuss the potential tradeoffs of one choice vs another.
In this discussion of when to use additional projects vs additional deployments:
Company Infrastructure
├── Production Deployment (PCI Compliant)
│   └── Financial Projects
│       ├── payment_processing
│       └── customer_billing
│
└── Standard Deployment
    ├── Marketing Projects
    └── Analytics Projects
Infrastructure
├── Heavy Computing Deployment (32 CPU, 128GB RAM)
│   └── ML Training Projects
│       ├── model_training
│       └── batch_inference
│
└── Light Computing Deployment (4 CPU, 16GB RAM)
    └── ETL Projects
        ├── daily_reports
        └── data_ingestion
Company
├── Team A Deployment
│   └── Projects with specific permissions/access
│
└── Team B Deployment
    └── Different security groups/access patterns
When teams need complete isolation or different access patterns.
Business Critical Deployment
├── Revenue impacting jobs
└── Customer-facing data pipelines
Non-Critical Deployment
├── Internal analytics
└── Experimental projects
When you have so many projects that the UI becomes slow
When job runs start queueing too much
When the deployment's database gets too large
The key question is: "Do these projects NEED to be separate?" rather than "CAN they be separate?".
Using a single deployment has the following benefits:
And then provide more information on workspaces using the definitions.py files instead of __init__.py:
You need to explicitly tell Dagster where to find your definitions through the workspace.yaml file:
load_from:
  - python_file:
      relative_path: marketing/definitions.py
      location_name: marketing_tools
  - python_file:
      relative_path: finance/definitions.py
      location_name: finance_tools
I didn't quite understand the use of the unpacking operator notation in the definition example:
The asterisk * in Python is the "unpacking operator".
# Let's say trip_assets contains these assets:
trip_assets = [taxi_trips, taxi_zones, taxi_trips_file]
# And metric_assets contains:
metric_assets = [revenue_by_day, trips_by_day]
# When you use * it "unpacks" the lists:
defs = Definitions(
assets=[*trip_assets, *metric_assets]
)
# This is equivalent to writing:
defs = Definitions(
assets=[
taxi_trips,
taxi_zones,
taxi_trips_file,
revenue_by_day,
trips_by_day
]
)
Without the *, you'd get nested lists:
# Without unpacking (WRONG):
defs = Definitions(
assets=[trip_assets, metric_assets]
)
# This would be like:
assets=[[taxi_trips, taxi_zones, taxi_trips_file], [revenue_by_day, trips_by_day]] # Nested lists!
# With unpacking (CORRECT):
defs = Definitions(
assets=[*trip_assets, *metric_assets]
)
# This correctly flattens to:
assets=[taxi_trips, taxi_zones, taxi_trips_file, revenue_by_day, trips_by_day] # Flat list!
You'll often see this pattern when you want to combine multiple lists into a single flat list.
It's like saying "take everything out of these lists and put them all together in one new list."
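For anyone who wants to try this outside of a Dagster project, here is a minimal sketch that can be pasted into a Python shell. The strings are only stand-ins for the real asset objects (an assumption made so that nothing needs to be installed); the list behavior is the same for any kind of element.

```python
# Stand-in strings replace the real Dagster asset objects (illustration only).
trip_assets = ["taxi_trips", "taxi_zones", "taxi_trips_file"]
metric_assets = ["revenue_by_day", "trips_by_day"]

# With unpacking: every element is pulled out into one flat list.
flat = [*trip_assets, *metric_assets]

# Without unpacking: the outer list just holds the two original lists.
nested = [trip_assets, metric_assets]

print(len(flat))    # 5 items, one per asset
print(len(nested))  # 2 items, each of them a list

# Plain list concatenation produces the same flat result.
print(flat == trip_assets + metric_assets)  # True
```

So `[*a, *b]` and `a + b` give the same list here; the unpacking form is handy when you are combining more than two lists or mixing in single items.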
I wish the explanation on os.getenv and EnvVar was a little bit clearer:
With os.getenv:
With EnvVar:
It's especially useful for:
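To illustrate the timing difference as I understand it: os.getenv reads the variable once, at the moment the line runs (for Dagster, when the code is loaded), while EnvVar holds on to the variable name and looks the value up later, when it is actually needed. The sketch below uses only the standard library. `DeferredEnvVar` and `DEMO_DB_PASSWORD` are hypothetical names made up for this example; the class is a stand-in for the idea, not Dagster's actual EnvVar implementation.

```python
import os
from typing import Optional


class DeferredEnvVar:
    """Hypothetical stand-in: stores the variable NAME and reads the value on demand."""

    def __init__(self, name: str) -> None:
        self.name = name

    def get_value(self) -> Optional[str]:
        # The lookup happens here, not when the object was created.
        return os.environ.get(self.name)


os.environ["DEMO_DB_PASSWORD"] = "first"

eager = os.getenv("DEMO_DB_PASSWORD")           # value is captured right now
deferred = DeferredEnvVar("DEMO_DB_PASSWORD")   # only the name is captured

# The environment changes after the "code load" step.
os.environ["DEMO_DB_PASSWORD"] = "second"

print(eager)                 # first
print(deferred.get_value())  # second
```

The eager value is frozen at whatever the environment held when the line ran, while the deferred object picks up the current value at lookup time.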
You seem to have understood Python's unpacking operator quite well, and you've explained how it works correctly. (I'm just a rando, not from the Dagster team.)
That was my proposal for the documentation in a callout or side link, etc. regarding the asterisk notation in the example.
What's the issue or suggestion?
A Definitions object is a set of Dagster definitions available and loadable by Dagster tools.
This is a circular sentence. If a definitions object is a set of Dagster definitions available then what are the Dagster definitions and what makes them available vs not available? It's totally unclear.
Additionally, the added explanation does not really help explain:
The Definitions object is used to assign definitions to a code location, and each code location can only have a single Definitions object. This object maps to one code location. With code locations, users isolate multiple Dagster projects from each other without requiring multiple deployments. You'll learn more about code locations a bit later in this lesson.
What are code locations, and why can each one have only a single Definitions object? Okay, so the cardinality between Definitions objects and code locations is 1:1, but that doesn't really explain the rest of it.
Additional information
A Definitions object is like a project manifest for Dagster - it bundles together all the assets, jobs, schedules, and other components that make up a single Dagster project. It's like a menu that tells Dagster exactly what's available to run in this specific project. Each separate project (called a code location) needs its own Definitions object, and you can't have multiple Definitions objects in the same location. This setup lets you keep different Dagster projects completely separate from each other, without needing to set up multiple Dagster deployments.
Why do we need this?
Two main reasons:
Each project has its own Definitions, so they don't interfere with each other.
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.