dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.7k stars 1.48k forks source link

SQLite Error when creating Dynamic Partitions #16260

Open amstu2 opened 1 year ago

amstu2 commented 1 year ago

Dagster version

1.4.11

What's the issue?

I was attempting to create a sensor to detect unprocessed files in a S3 repository and generate the assets for them using dynamic partitions. An internal SQL error occurs when attempting to create partitions for the default boto3 MaxKeys parameter of 1000.

Traceback (most recent call last):
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_daemon/sensor.py", line 534, in _process_tick_generator
    yield from _evaluate_sensor(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_daemon/sensor.py", line 698, in _evaluate_sensor
    instance.add_dynamic_partitions(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 651, in inner
    return func(*args, **kwargs)
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 2031, in add_dynamic_partitions
    return self._event_storage.add_dynamic_partitions(partitions_def_name, partition_keys)
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_core/storage/event_log/sql_event_log.py", line 1954, in add_dynamic_partitions
    existing_rows = conn.execute(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
    ret = self._execute_context(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_single_context(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
    self._handle_dbapi_exception(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) too many SQL variables
[SQL: SELECT dynamic_partitions.partition 
FROM dynamic_partitions 
WHERE dynamic_partitions.partition IN (?, ?, ?, ?, <OMITTED FOR BREVITY> ?, ?, ?) AND dynamic_partitions.partitions_def_name = ?]
[parameters: ('dummy_dir/0/0.txt', 'dummy_dir/0/1.txt', 'dummy_dir/0/2.txt', <OMITTED FOR BREVITY> 'dummy_dir/9/99.txt','s3_files')]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

What did you expect to happen?

I would expect the dynamic partitioning to be able to handle 1000+ partitions. This appears to be linked to a RHEL-specific limitation of the SQLITE_MAX_VARIABLE_NUMBER variable that limits variables to <1000 (see this forum post). The error is also not encountered on Ubuntu, which is to be expected given that the SQLITE_MAX_VARIABLE_NUMBER variable is 250000.

If the user is responsible for setting this variable, it should be documented.

How to reproduce?

Using a RHEL 7 instance:

  1. Create a directory with 1000+ dummy files. My use-case was S3, but a local directory should reproduce the same issue
  2. Declare a sensor that creates a dynamic partition for each file in the directory and generates an asset
  3. Start Dagster and enable the sensor

The error should occur when the sensor detects the files.

Deployment type

Local

Deployment details

Deployment details:

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

sryza commented 1 year ago

@clairelin135 - any idea what's going on here?