awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Other
635 stars 299 forks source link

Disable argparse's allow_abbrev Functionality to Make getResolvedOptions Behavior Consistent #206

Open wjhbr opened 4 months ago

wjhbr commented 4 months ago

Issue # (N/A)

This change is to address a flaw within the getResolvedOptions() function. This function uses the parse_known_args function of the python repo package argparse to resolve arguments to Glue. If a Glue Job has the two arguments --metadata and --metadata_location, when a user runs this function with the option map containing just ["metadata_location"] the resulting behavior out of glue is unpredictable as the parse_known_args function here has a side effect listed in it's source. This side effect causes the wrong argument values to be passed from this function to a user.

Description of changes:

As an example: this is a sample of this behavior from my job.

args = [
    '/tmp/run.py',
    '--job-bookmark-option',
    'job-bookmark-disable',
    '--metadata_location',
    'example',
    '--JOB_ID',
    '',
    '--metadata',
    'True',
    '--JOB_RUN_ID',
    '',
    '--JOB_NAME',
    ''
] 

options = ['metadata_location']

Below is psuedo of the offending production code:

parser = GlueArgumentParser()

for option in options:
    parser.add_argument('--' + option, required=True)

parsed, extra = parser.parse_known_args(args[1:])
parsed
-> Namespace(metadata_location='True')

The solution provided by this PR disables allow_abbrev and requires a user to exactly match the string argument they're trying to extract, which seems to be the intended behavior based on documentation.

parser = GlueArgumentParser(allow_abbrev=False)

for option in options:
    parser.add_argument('--' + option, required=True)

parsed, extra = parser.parse_known_args(args[1:])
parsed
-> Namespace(metadata_location='example')

Thanks for reading.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

RusJr commented 3 months ago

I second that