I was attempting to create a sensor that detects unprocessed files in an S3 bucket and generates assets for them using dynamic partitions. An internal SQL error occurs when the sensor tries to add partitions for a full page of listing results at the default boto3 MaxKeys value of 1000.
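For context, a minimal sketch of the kind of sensor involved is below. The bucket, prefix, and asset names are placeholders rather than the exact code that hit the error; it simply shows a single sensor tick handing every listed key to Dagster as a new dynamic partition.

```python
import boto3
from dagster import (
    Definitions,
    DynamicPartitionsDefinition,
    RunRequest,
    SensorResult,
    asset,
    define_asset_job,
    sensor,
)

# Dynamic partitions keyed by S3 object key.
s3_files_partitions = DynamicPartitionsDefinition(name="s3_files")


@asset(partitions_def=s3_files_partitions)
def s3_file_asset(context) -> None:
    # Process the file identified by context.partition_key.
    ...


s3_files_job = define_asset_job("s3_files_job", selection="s3_file_asset")


@sensor(job=s3_files_job)
def s3_file_sensor(context):
    s3 = boto3.client("s3")
    # list_objects_v2 returns up to MaxKeys (default 1000) keys per call, so a
    # single tick can hand ~1000 new partition keys to Dagster at once.
    # (Filtering out keys that already have partitions is omitted for brevity.)
    response = s3.list_objects_v2(Bucket="my-bucket", Prefix="dummy_dir/")
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    return SensorResult(
        run_requests=[RunRequest(partition_key=key) for key in keys],
        dynamic_partitions_requests=[s3_files_partitions.build_add_request(keys)],
    )


defs = Definitions(assets=[s3_file_asset], jobs=[s3_files_job], sensors=[s3_file_sensor])
```

The sensor tick then fails in the daemon with: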
Traceback (most recent call last):
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_daemon/sensor.py", line 534, in _process_tick_generator
yield from _evaluate_sensor(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_daemon/sensor.py", line 698, in _evaluate_sensor
instance.add_dynamic_partitions(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 651, in inner
return func(*args, **kwargs)
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 2031, in add_dynamic_partitions
return self._event_storage.add_dynamic_partitions(partitions_def_name, partition_keys)
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/dagster/_core/storage/event_log/sql_event_log.py", line 1954, in add_dynamic_partitions
existing_rows = conn.execute(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
return meth(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
return connection._execute_clauseelement(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
ret = self._execute_context(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
return self._exec_single_context(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
self._handle_dbapi_exception(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "~/.cache/pypoetry/virtualenvs/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) too many SQL variables
[SQL: SELECT dynamic_partitions.partition
FROM dynamic_partitions
WHERE dynamic_partitions.partition IN (?, ?, ?, ?, <OMITTED FOR BREVITY> ?, ?, ?) AND dynamic_partitions.partitions_def_name = ?]
[parameters: ('dummy_dir/0/0.txt', 'dummy_dir/0/1.txt', 'dummy_dir/0/2.txt', <OMITTED FOR BREVITY> 'dummy_dir/9/99.txt', 's3_files')]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
What did you expect to happen?
I would expect dynamic partitioning to handle 1000+ partitions. The failure appears to be linked to the SQLITE_MAX_VARIABLE_NUMBER compile-time limit, which on the SQLite build shipped with RHEL allows fewer than 1000 variables per statement (see this forum post). The error is not encountered on Ubuntu, which is to be expected given that SQLITE_MAX_VARIABLE_NUMBER there is 250000.
If users are responsible for raising this limit themselves, that should be documented.
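As a quick sanity check (not part of the reported setup), the effective limit of the SQLite library that Python links against can be probed empirically with a binary search over the number of `?` placeholders accepted in one statement:

```python
import sqlite3


def max_sql_variables() -> int:
    """Largest number of '?' placeholders the linked SQLite accepts per statement."""
    conn = sqlite3.connect(":memory:")
    low, high = 1, 400_000
    while low < high:
        mid = (low + high + 1) // 2
        try:
            conn.execute(
                "SELECT 1 WHERE 1 IN (%s)" % ",".join("?" * mid),
                tuple(range(mid)),
            )
            low = mid  # mid placeholders worked; try more
        except sqlite3.Error:  # e.g. "too many SQL variables"
            high = mid - 1
    return low


print(sqlite3.sqlite_version, max_sql_variables())
```

On Python 3.11+ the same number is available directly via conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER).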
How to reproduce?
Using a RHEL 7 instance:
1. Create a directory with 1000+ dummy files. My use case was S3, but a local directory should reproduce the same issue (see the sketch after this list).
2. Declare a sensor that creates a dynamic partition for each file in the directory and requests a run of the corresponding asset.
3. Start Dagster and enable the sensor.
The error should occur when the sensor detects the files.
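For step 1, a layout matching the partition keys in the traceback above (dummy_dir/0/0.txt through dummy_dir/9/99.txt, i.e. 10 subdirectories of 100 files) can be generated as follows; the sensor from step 2 is then the same as the S3 sketch above, with the boto3 listing swapped for a local directory walk:

```python
from pathlib import Path

# Step 1: 1000 dummy files, 10 subdirectories x 100 files each.
for i in range(10):
    subdir = Path("dummy_dir") / str(i)
    subdir.mkdir(parents=True, exist_ok=True)
    for j in range(100):
        (subdir / f"{j}.txt").touch()

# Step 2: inside the sensor, list local files instead of calling boto3.
keys = [str(p) for p in sorted(Path("dummy_dir").rglob("*.txt"))]  # ~1000 keys in one tick
```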
Deployment type
Local
Deployment details
RHEL 7 / Amazon Linux 2
Python 3.10
Dagster 1.4.11
Dagster-aws 0.20.11
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.