apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Feature Request]: Allow users to modify _BeamArgumentParser behavior #28430

Open leotrs opened 1 year ago

leotrs commented 1 year ago

What would you like to happen?

Each instance of PipelineOptions contains a member called parser, an instance of _BeamArgumentParser (a subclass of the standard library's argparse.ArgumentParser). There is currently no way to affect how parser is initialized, because its class is hard-coded.

For example, I wanted to change how parser resolves conflicting CLI options. According to the official argparse docs, this is done by passing conflict_handler=... to the ArgumentParser constructor. Since the user cannot affect the initialization of parser, this is currently impossible. Changing conflict_handler after initialization, e.g. with options.parser.conflict_handler = ..., does not work either: PipelineOptions objects may create other parser objects down the road, and CLI options are generally processed multiple times, so there is no guarantee that conflict_handler is changed before any options are processed.
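To make the argparse behavior in question concrete, here is a minimal sketch using only the standard library, with no Beam involved. The helper name build_parser and the --runner flag are illustrative assumptions, not anything taken from Beam's code.

```python
import argparse


def build_parser(conflict_handler="error"):
    """Register the same option string twice on one parser.

    build_parser and --runner are hypothetical names used only for
    illustration; they are not part of Apache Beam.
    """
    parser = argparse.ArgumentParser(conflict_handler=conflict_handler)
    parser.add_argument("--runner", default="first")
    # With the default handler ("error") this second registration raises
    # argparse.ArgumentError. With "resolve" it replaces the first one.
    parser.add_argument("--runner", default="second")
    return parser


if __name__ == "__main__":
    try:
        build_parser("error")
    except argparse.ArgumentError as exc:
        print("default handler rejected the duplicate:", exc)

    resolved = build_parser("resolve")
    print(resolved.parse_args([]).runner)
```

The handler is chosen when the parser is constructed, which is why the request is about controlling construction rather than patching the parser afterwards.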

A simple solution would be to abstract out the class of parser:

class PipelineOptions(...):

    parserclass = _BeamArgumentParser

    def __init__(self, ...):
        parser = self.parserclass()
        ...

In this way, the user may subclass _BeamArgumentParser in order to affect any initialization behavior:

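from apache_beam.options.pipeline_options import PipelineOptions, _BeamArgumentParser

class MyAwesomeArgumentParser(_BeamArgumentParser):
    def __init__(self, *args, **kwargs):
        # Default to "resolve", but let an explicitly passed value win.
        kwargs.setdefault("conflict_handler", "resolve")
        super().__init__(*args, **kwargs)

# Relies on the proposed parserclass attribute, which does not exist yet.
PipelineOptions.parserclass = MyAwesomeArgumentParser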
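The proposed hook can also be tried out without Beam installed. The sketch below uses plain argparse stand-ins: BaseParser, OptionsSketch, ResolvingParser, ResolvingOptions and the --job_name flag are all hypothetical names chosen for illustration, and the real PipelineOptions does considerably more than this. It only shows the shape of the idea, namely that the parser class is looked up on the options class instead of being hard-coded.

```python
import argparse


class BaseParser(argparse.ArgumentParser):
    """Hypothetical stand-in for _BeamArgumentParser."""


class OptionsSketch:
    """Hypothetical stand-in for PipelineOptions with the proposed hook."""

    # The parser class is a class attribute, so subclasses can swap it.
    parserclass = BaseParser

    def __init__(self, flags=None):
        parser = self.parserclass()
        parser.add_argument("--job_name", default=None)
        self.parser = parser
        # Unknown flags are kept aside rather than treated as errors.
        self.known, self.unknown = parser.parse_known_args(flags or [])


class ResolvingParser(BaseParser):
    def __init__(self, *args, **kwargs):
        # Default to "resolve" unless the caller asked for something else.
        kwargs.setdefault("conflict_handler", "resolve")
        super().__init__(*args, **kwargs)


class ResolvingOptions(OptionsSketch):
    # Overriding on a subclass avoids mutating the base class globally.
    parserclass = ResolvingParser


if __name__ == "__main__":
    opts = ResolvingOptions(["--job_name", "demo", "--other", "1"])
    print(opts.parser.conflict_handler, opts.known.job_name, opts.unknown)
```

Overriding parserclass on a subclass, as ResolvingOptions does here, keeps the change local. Assigning to the attribute on the base class, as in the snippet above, would affect every options object created afterwards.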

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

leotrs commented 1 year ago

.take-issue