Description & motivation
This PR adds support for the new student ID earthmover package to the EarthbeamDAG. It adds two optional tasks to the file_to_edfi_taskgroup:
check_existing_match_rates (pre-earthmover): Queries the match rates table in Snowflake and returns the highest computed ID match rate for the tenant, year, and assessment. If this value is below the required match rate (or no rate is found), student ID matches are recomputed during the earthmover run.
match_rates_to_snowflake (post-earthmover): Loads the computed match rates to Snowflake.
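The decision made by check_existing_match_rates can be sketched as a threshold check. This is a minimal illustration, not the task's actual code: the column names (tenant_code, api_year, assessment_name, match_rate) and both function names are hypothetical, and a rate that is "too low" is assumed to mean one strictly below required_id_match_rate.

```python
from typing import Optional, Sequence

DEFAULT_REQUIRED_ID_MATCH_RATE = 0.5  # default noted under Usage


def build_match_rate_query(match_rates_table: str) -> str:
    """Return a %s-parameterized query for the best recorded match rate.

    Column names are hypothetical placeholders, not the real table schema.
    The table name is interpolated because identifiers cannot be bound.
    """
    return (
        f"SELECT MAX(match_rate) FROM {match_rates_table} "
        "WHERE tenant_code = %s AND api_year = %s AND assessment_name = %s"
    )


def should_recompute_matches(
    match_rates: Sequence[Optional[float]],
    required_id_match_rate: float = DEFAULT_REQUIRED_ID_MATCH_RATE,
) -> bool:
    """Decide whether earthmover should recompute student ID matches.

    Recompute when no prior rate exists or the best one is below the threshold.
    """
    known = [rate for rate in match_rates if rate is not None]
    if not known:
        return True
    return max(known) < required_id_match_rate
```

A rate exactly equal to the threshold is treated as good enough here; the real task may draw that line differently.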
It also modifies the run_earthmover task to pass the parameters the student ID bundle requires when that bundle is used.
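A rough sketch of how run_earthmover might merge the extra parameters, assuming a plain dict of earthmover parameters. Only SNOWFLAKE_CONNECTION comes from this PR; snowflake_connection stands for whatever connection value the DAG derives from snowflake_read_conn_id, and ASSESSMENT_BUNDLE, RECOMPUTE_STUDENT_ID_MATCHES, and the function itself are hypothetical stand-ins for whatever the student ID bundle actually expects.

```python
from typing import Dict, Optional


def build_earthmover_parameters(
    base_params: Dict[str, str],
    snowflake_connection: Optional[str] = None,
    assessment_bundle: Optional[str] = None,
    recompute_matches: bool = True,
) -> Dict[str, str]:
    """Return earthmover parameters, adding student ID entries only when needed."""
    params = dict(base_params)  # copy so the caller's dict is not mutated
    if snowflake_connection:
        # Parameter name from this PR (formerly DATABASE_CONNECTION).
        params["SNOWFLAKE_CONNECTION"] = snowflake_connection
    if assessment_bundle:
        # Hypothetical names; the student ID bundle defines the real ones.
        params["ASSESSMENT_BUNDLE"] = assessment_bundle
        params["RECOMPUTE_STUDENT_ID_MATCHES"] = str(recompute_matches).lower()
    return params
```

Keeping the student ID entries behind a single condition means DAGs that do not set assessment_bundle see no change in their parameters.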
This PR is currently in draft status while testing continues.
Usage
Three new and three existing task-group-level arguments are required to use the student ID features, and one is optional:
assessment_bundle: name of the assessment bundle being run, used for selecting the correct bundle and as metadata in the match rates table
student_id_match_rates_table: Snowflake table set up for storing student ID match rates ([db].[schema].[table])
snowflake_read_conn_id: Connection ID for Snowflake credentials with read access to analytics
s3_conn_id, s3_filepath, snowflake_conn_id: existing args used to send the match rates data to Snowflake
required_id_match_rate: optional; set this to override the default of 0.5
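For reference, a hypothetical pre-flight check over these arguments. The argument names match the list above; the values in EXAMPLE_KWARGS and both helper functions are illustrative only, and the table-name parser assumes simple unquoted identifiers with no dots inside them.

```python
from typing import Dict, List, Tuple

# Required arguments from the Usage list; required_id_match_rate is optional.
REQUIRED_STUDENT_ID_ARGS = (
    "assessment_bundle",
    "student_id_match_rates_table",
    "snowflake_read_conn_id",
    "s3_conn_id",
    "s3_filepath",
    "snowflake_conn_id",
)

# Hypothetical placeholder values, for illustration only.
EXAMPLE_KWARGS: Dict[str, object] = {
    "assessment_bundle": "example_assessment",
    "student_id_match_rates_table": "example_db.example_schema.match_rates",
    "snowflake_read_conn_id": "snowflake_read",
    "s3_conn_id": "s3_default",
    "s3_filepath": "example/path",
    "snowflake_conn_id": "snowflake_write",
    "required_id_match_rate": 0.7,  # optional override of the 0.5 default
}


def missing_student_id_args(task_group_kwargs: Dict[str, object]) -> List[str]:
    """Return required argument names that are absent or empty."""
    return [name for name in REQUIRED_STUDENT_ID_ARGS if not task_group_kwargs.get(name)]


def split_table_name(fully_qualified: str) -> Tuple[str, str, str]:
    """Split a db.schema.table string into its three parts."""
    parts = fully_qualified.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected db.schema.table, got {fully_qualified!r}")
    return parts[0], parts[1], parts[2]
```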
Because the student ID process uses project composition, earthmover deps will need to be run to install the packages. More details on this to come in a migration guide. For a preview, see the testing instructions linked below.
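A sketch of the install-then-run order when preparing a project by hand. It assumes the earthmover CLI is on PATH and that the project's config file sits at earthmover's default location inside project_dir; the helper name is hypothetical, and the runner argument exists only so it can be exercised without earthmover installed.

```python
import subprocess
from typing import Callable, List


def run_student_id_project(
    project_dir: str,
    runner: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Install composed packages with `earthmover deps`, then run the project."""
    commands = [["earthmover", "deps"], ["earthmover", "run"]]
    for command in commands:
        # check=True makes subprocess.run raise if a step fails,
        # so `run` never follows a failed `deps`.
        runner(command, cwd=project_dir, check=True)
    return commands
```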
Breaking Changes
The argument database_conn_id is renamed to snowflake_read_conn_id for clarity. The earthmover parameter it populates is likewise renamed from DATABASE_CONNECTION to SNOWFLAKE_CONNECTION to align with the student ID bundle.
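Existing callers need to update both the DAG argument and any earthmover configs that reference the old parameter. A hypothetical helper for that migration follows; it is not part of this PR, and the ${...} reference form assumes earthmover's usual parameter syntax.

```python
from typing import Any, Dict


def migrate_legacy_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename database_conn_id to snowflake_read_conn_id in a kwargs dict."""
    migrated = dict(kwargs)  # leave the caller's dict untouched
    if "database_conn_id" in migrated:
        if "snowflake_read_conn_id" in migrated:
            raise ValueError("pass snowflake_read_conn_id only, not database_conn_id")
        migrated["snowflake_read_conn_id"] = migrated.pop("database_conn_id")
    return migrated


def rename_connection_param(config_text: str) -> str:
    """Point earthmover config references at the renamed parameter."""
    return config_text.replace("${DATABASE_CONNECTION}", "${SNOWFLAKE_CONNECTION}")
```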
PR Merge Priority:
High, once testing is complete
Tests and QC done:
This has been tested successfully in GSN, and testing is in progress in TX. Instructions for testing can be found here.