Netflix / genie

Distributed Big Data Orchestration Service
https://netflix.github.io/genie
Apache License 2.0

Make eagerly loaded job dependencies and env variables use the default fetch type #1174

Closed. enicloom closed this 1 year ago.

enicloom commented 1 year ago

The entity graph loads the named attributes eagerly, in a single query. When that query joins more than one OneToMany relationship, the result set contains one row for every combination of child rows, so the number of returned rows can become very large.

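For example, if a single job has 5 job dependencies and 5 job environment variables, this entity graph returns 1 × 5 × 5 = 25 rows, because every dependency row is paired with every environment variable row. The count is multiplicative, so it grows very quickly as collections get larger or more collections are added.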
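A minimal plain-Java sketch of why the row count multiplies. It does not use JPA; it only mimics what a single SQL query with LEFT JOINs on two sibling collections returns for one job. The names `CartesianRows` and `joinedRows` are hypothetical, and an empty collection is modeled as one null-padded row, the way a LEFT JOIN behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CartesianRows {

    // One flattened result row: the job plus one dependency and one env variable.
    static final class Row {
        final String jobId;
        final String dependency; // null when the job has no dependencies
        final String envVar;     // null when the job has no env variables

        Row(String jobId, String dependency, String envVar) {
            this.jobId = jobId;
            this.dependency = dependency;
            this.envVar = envVar;
        }
    }

    // Mimics "job LEFT JOIN dependencies LEFT JOIN env_variables" for a single job:
    // every dependency is paired with every env variable.
    static List<Row> joinedRows(String jobId, List<String> dependencies, List<String> envVars) {
        // A LEFT JOIN against an empty collection still yields one row padded with NULLs.
        List<String> deps = dependencies.isEmpty() ? Collections.singletonList(null) : dependencies;
        List<String> envs = envVars.isEmpty() ? Collections.singletonList(null) : envVars;

        List<Row> rows = new ArrayList<>();
        for (String dep : deps) {
            for (String env : envs) {
                rows.add(new Row(jobId, dep, env));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> deps = List.of("dep1", "dep2", "dep3", "dep4", "dep5");
        List<String> envs = List.of("ENV1", "ENV2", "ENV3", "ENV4", "ENV5");
        System.out.println(joinedRows("job-1", deps, envs).size()); // prints 25 (1 x 5 x 5)
    }
}
```

Adding a third collection of size 5 to the same join would multiply the result again, to 125 rows.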

This change makes job_dependencies and job_env_variables use the default FetchType, which is LAZY. In the example above, the number of returned rows is then at most 1 + 5 + 5 = 11, and that maximum is reached only if both getJobDependencies() and getJobEnvironmentVariables() are called.
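A rough sketch of the lazy alternative, again in plain Java rather than JPA. `FakeDb`, `Lazy`, `Job` and `loadJob` are hypothetical stand-ins, not Genie or Hibernate classes: the job row is fetched up front, and each collection is fetched by its own query only the first time it is read, roughly how a lazily loaded collection behaves. The fake database counts rows and queries so both can be compared with the eager case.

```java
import java.util.List;
import java.util.function.Supplier;

public class LazyLoadingSketch {

    // Hypothetical stand-in for the database: counts round trips and returned rows.
    static final class FakeDb {
        int queries = 0;
        int rows = 0;

        <T> List<T> query(Supplier<List<T>> resultSet) {
            List<T> result = resultSet.get();
            queries++;
            rows += result.size();
            return result;
        }
    }

    // Minimal lazy collection: the loader (and therefore the query) runs only on the first get().
    static final class Lazy<T> {
        private final Supplier<List<T>> loader;
        private List<T> value;

        Lazy(Supplier<List<T>> loader) {
            this.loader = loader;
        }

        List<T> get() {
            if (value == null) {
                value = loader.get();
            }
            return value;
        }
    }

    // Hypothetical job entity: scalar fields loaded up front, collections lazily.
    static final class Job {
        final String id;
        final Lazy<String> dependencies;
        final Lazy<String> envVariables;

        Job(String id, Lazy<String> dependencies, Lazy<String> envVariables) {
            this.id = id;
            this.dependencies = dependencies;
            this.envVariables = envVariables;
        }
    }

    static Job loadJob(FakeDb db, String id, List<String> deps, List<String> envs) {
        db.query(() -> List.of(id)); // first query: 1 row for the job itself
        return new Job(id,
                new Lazy<>(() -> db.query(() -> deps)),
                new Lazy<>(() -> db.query(() -> envs)));
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        Job job = loadJob(db, "job-1",
                List.of("dep1", "dep2", "dep3", "dep4", "dep5"),
                List.of("ENV1", "ENV2", "ENV3", "ENV4", "ENV5"));
        job.dependencies.get(); // second query: 5 rows
        job.envVariables.get(); // third query: 5 rows
        job.dependencies.get(); // already loaded: no new query
        System.out.println(db.queries + " queries, " + db.rows + " rows"); // prints "3 queries, 11 rows"
    }
}
```

Reading a collection twice does not trigger another query, and a collection that is never read costs nothing, which is why 1 + 5 + 5 is an upper bound rather than a fixed count.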

This diff does not change the cluster, command, and application resources, since eagerly loading those is mostly safe.

This change has a cost: getJobSpecification() will need 3 database queries instead of 1. Since this call is made for every single job launch, the cost is worth acknowledging.
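To make the trade-off concrete, here is a small hypothetical helper (`FetchCost`) that estimates rows and queries for both strategies. It assumes sibling collections with the given sizes, that the eager path is one query with a LEFT JOIN per collection, and that the lazy path ends up reading every collection.

```java
import java.util.Arrays;

public class FetchCost {

    // Eager entity graph: one query, rows multiply across collections.
    // Math.max(1, n) because a LEFT JOIN against an empty collection still returns one row.
    static long eagerRows(int... sizes) {
        long rows = 1;
        for (int n : sizes) {
            rows *= Math.max(1, n);
        }
        return rows;
    }

    static int eagerQueries() {
        return 1;
    }

    // Lazy loading with every collection read: one row for the job, then n rows per collection.
    static long lazyRows(int... sizes) {
        long rows = 1;
        for (int n : sizes) {
            rows += n;
        }
        return rows;
    }

    // One query for the job plus one per collection.
    static int lazyQueries(int... sizes) {
        return 1 + sizes.length;
    }

    public static void main(String[] args) {
        int[][] cases = { {5, 5}, {20, 20}, {5, 5, 5} };
        for (int[] sizes : cases) {
            System.out.printf("%s: eager %d rows / %d query, lazy %d rows / %d queries%n",
                    Arrays.toString(sizes), eagerRows(sizes), eagerQueries(),
                    lazyRows(sizes), lazyQueries(sizes));
        }
    }
}
```

For the example in this PR (5 dependencies and 5 env variables), that is 25 rows in 1 query versus 11 rows in 3 queries. The row savings grow multiplicatively with collection size, while the query count grows only by one per lazily loaded collection.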

coveralls commented 1 year ago


Coverage decreased (-0.007%) to 93.916% when pulling 7c2299be2556e94f115545800f985165209060ae on enicloom:lt_simplify_job_specification_graph into abe4f230fde07964a665ff1d5ad210836827499b on Netflix:master.