datopian / assembler

The DataHub data assembly line
MIT License

Getting too many connections when running the pipelines #62

Closed: zelima closed this issue 6 years ago

zelima commented 6 years ago

RDS is getting too many connections because each new FlowRegistry object created in Generator opens a new connection to RDS. A single, shared FlowRegistry object should suffice (see Analysis below).

Acceptance criteria

Tasks

Analysis

Current state

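```python
import json
import logging

from datapackage_pipelines.generators import GeneratorBase

# FlowRegistry, DB_ENGINE and SCHEMA_FILE are defined elsewhere in the assembler codebase.


class Generator(GeneratorBase):

    @classmethod
    def get_schema(cls):
        # Close the schema file instead of leaking the handle.
        with open(SCHEMA_FILE) as f:
            return json.load(f)

    @classmethod
    def generate_pipeline(cls, source):
        # Problem: a new FlowRegistry, and with it a new RDS connection, on every call.
        registry = FlowRegistry(DB_ENGINE)
        count = 0
        for pipeline in registry.list_pipelines():  # type: Pipelines
            yield pipeline.pipeline_id, pipeline.pipeline_details
            count += 1
        logging.error('assembler sent %d pipelines', count)
```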
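With this pattern the number of registries, and therefore connections, grows with the number of times the generator runs. The self-contained illustration below is not the assembler's code: `CountingRegistry` is a hypothetical stand-in for FlowRegistry that only counts how often it is constructed, and the database URL is a placeholder.

```python
class CountingRegistry:
    """Hypothetical stand-in for FlowRegistry: each instance 'opens' one connection."""

    opened = 0  # class-level tally of registries (connections) created so far

    def __init__(self, db_url):
        CountingRegistry.opened += 1
        self.db_url = db_url

    def list_pipelines(self):
        # Pretend the database holds two pipelines.
        return [("pipeline-a", {}), ("pipeline-b", {})]


def generate_pipeline(source):
    # Mirrors the current state: a fresh registry per call. Because this is a
    # generator, the registry is created when iteration starts.
    registry = CountingRegistry("postgresql://placeholder")
    for pipeline_id, details in registry.list_pipelines():
        yield pipeline_id, details


# Run the generator five times, consuming its output each time.
for _ in range(5):
    list(generate_pipeline(source={}))

print(CountingRegistry.opened)  # 5
```

Five runs produce five registries; against a real database each of them would hold its own connection.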

Solution

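```python
import json
import logging

from datapackage_pipelines.generators import GeneratorBase

# FlowRegistry, DB_ENGINE and SCHEMA_FILE are defined elsewhere in the assembler codebase.

# Created once, when the module is imported, and shared by every call below.
REGISTRY = FlowRegistry(DB_ENGINE)


class Generator(GeneratorBase):

    @classmethod
    def get_schema(cls):
        with open(SCHEMA_FILE) as f:
            return json.load(f)

    @classmethod
    def generate_pipeline(cls, source):
        count = 0
        # Reuse the shared registry instead of creating a new one per call.
        for pipeline in REGISTRY.list_pipelines():  # type: Pipelines
            yield pipeline.pipeline_id, pipeline.pipeline_details
            count += 1
        logging.error('assembler sent %d pipelines', count)
```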
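A module-level REGISTRY is constructed as soon as the module is imported, so whatever setup FlowRegistry does in its constructor happens at import time. If that is undesirable, one alternative (not the fix adopted in this issue) is to build the registry lazily on first use and cache it. In the sketch below, `CountingRegistry` is again a hypothetical stand-in for FlowRegistry and `get_registry` is a hypothetical helper.

```python
import functools


class CountingRegistry:
    """Hypothetical stand-in for FlowRegistry that counts how many instances exist."""

    opened = 0

    def __init__(self, db_url):
        CountingRegistry.opened += 1
        self.db_url = db_url

    def list_pipelines(self):
        return [("pipeline-a", {}), ("pipeline-b", {})]


@functools.lru_cache(maxsize=None)
def get_registry():
    # Built on the first call; every later call returns the same cached instance.
    return CountingRegistry("postgresql://placeholder")


def generate_pipeline(source):
    for pipeline_id, details in get_registry().list_pipelines():
        yield pipeline_id, details


opened_before = CountingRegistry.opened  # nothing has been created yet
for _ in range(5):
    list(generate_pipeline(source={}))

print(opened_before, CountingRegistry.opened)  # 0 1
```

Both approaches end up with one shared registry per process; the lazy version only changes when it is created.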

zelima commented 6 years ago

FIXED