apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
61.92k stars 13.57k forks source link

Sql lab + shillelagh does not return column values ​​greater than 64. #30245

Open mibarannikov opened 1 week ago

mibarannikov commented 1 week ago

Bug description

All values ​​are null in json for fields greater than 64. I updated superset 3.1 -> 4.0.2. I updated shillagh to 1.2.27 (apsw 3.42.0.1). I created another json array source. But nothing changed. All json fields greater than 64 are null.

Generation was performed by the activity process. var test = []; for (var l = 0; l < 2; l++) { var o = {};

for (var i = 0; i < 100; i++) {
    o["col_" + i] = i + (100 * l);
}
test.push(o);

} execution.setVariable("test", JSON.stringify(test));

How to reproduce the bug

362841266-a732a7dc-4ca4-4803-a227-f42225142ad0 (1)

Screenshots/recordings

362841266-a732a7dc-4ca4-4803-a227-f42225142ad0 362574713-ba6db765-efea-45c0-a0aa-da2f27b391a6

Superset version

4.0.2

Python version

3.9

Node version

16

Browser

Chrome

Additional context

logs from superset 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,485:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 0, 'col_1': 1, 'col_2': 2, 'col_3': 3, 'col_4': 4, 'col_5': 5, 'col_6': 6, 'col_7': 7, 'col_8': 8, 'col_9': 9, 'col_10': 10, 'col_11': 11, 'col_12': 12, 'col_13': 13, 'col_14': 14, 'col_15': 15, 'col_16': 16, 'col_17': 17, 'col_18': 18, 'col_19': 19, 'col_20': 20, 'col_21': 21, 'col_22': 22, 'col_23': 23, 'col_24': 24, 'col_25': 25, 'col_26': 26, 'col_27': 27, 'col_28': 28, 'col_29': 29, 'col_30': 30, 'col_31': 31, 'col_32': 32, 'col_33': 33, 'col_34': 34, 'col_35': 35, 'col_36': 36, 'col_37': 37, 'col_38': 38, 'col_39': 39, 'col_40': 40, 'col_41': 41, 'col_42': 42, 'col_43': 43, 'col_44': 44, 'col_45': 45, 'col_46': 46, 'col_47': 47, 'col_48': 48, 'col_49': 49, 'col_50': 50, 'col_51': 51, 'col_52': 52, 'col_53': 53, 'col_54': 54, 'col_55': 55, 'col_56': 56, 'col_57': 57, 'col_58': 58, 'col_59': 59, 'col_60': 60, 'col_61': 61, 'col_62': 62, 'col_63': 63, 'col_64': 64, 'col_65': 65, 'col_66': 66, 'col_67': 67, 'col_68': 68, 'col_69': 69, 'col_70': 70, 'col_71': 71, 'col_72': 72, 'col_73': 73, 'col_74': 74, 'col_75': 75, 'col_76': 76, 'col_77': 77, 'col_78': 78, 'col_79': 79, 'col_80': 80, 'col_81': 81, 'col_82': 82, 'col_83': 83, 'col_84': 84, 'col_85': 85, 'col_86': 86, 'col_87': 87, 'col_88': 88, 'col_89': 89, 'col_90': 90, 'col_91': 91, 'col_92': 92, 'col_93': 93, 'col_94': 94, 'col_95': 95, 'col_96': 96, 'col_97': 97, 'col_98': 98, 'col_99': 99, 'rowid': 0} 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,485:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 100, 'col_1': 101, 'col_2': 102, 'col_3': 103, 'col_4': 104, 'col_5': 105, 'col_6': 106, 'col_7': 107, 'col_8': 108, 'col_9': 109, 'col_10': 110, 'col_11': 111, 'col_12': 112, 'col_13': 113, 'col_14': 114, 'col_15': 115, 'col_16': 116, 'col_17': 117, 'col_18': 118, 'col_19': 119, 'col_20': 120, 'col_21': 121, 'col_22': 122, 'col_23': 123, 'col_24': 124, 'col_25': 125, 'col_26': 126, 'col_27': 127, 'col_28': 128, 'col_29': 129, 'col_30': 130, 'col_31': 131, 'col_32': 132, 'col_33': 133, 'col_34': 134, 'col_35': 135, 'col_36': 136, 'col_37': 137, 'col_38': 138, 'col_39': 139, 'col_40': 140, 'col_41': 141, 'col_42': 142, 'col_43': 143, 'col_44': 144, 'col_45': 145, 'col_46': 146, 'col_47': 147, 'col_48': 148, 'col_49': 149, 'col_50': 150, 'col_51': 151, 'col_52': 152, 'col_53': 153, 'col_54': 154, 'col_55': 155, 'col_56': 156, 'col_57': 157, 'col_58': 158, 'col_59': 159, 'col_60': 160, 'col_61': 161, 'col_62': 162, 'col_63': 163, 'col_64': 164, 'col_65': 165, 'col_66': 166, 'col_67': 167, 'col_68': 168, 'col_69': 169, 'col_70': 170, 'col_71': 171, 'col_72': 172, 'col_73': 173, 'col_74': 174, 'col_75': 175, 'col_76': 176, 'col_77': 177, 'col_78': 178, 'col_79': 179, 'col_80': 180, 'col_81': 181, 'col_82': 182, 'col_83': 183, 'col_84': 184, 'col_85': 185, 'col_86': 186, 'col_87': 187, 'col_88': 188, 'col_89': 189, 'col_90': 190, 'col_91': 191, 'col_92': 192, 'col_93': 193, 'col_94': 194, 'col_95': 195, 'col_96': 196, 'col_97': 197, 'col_98': 198, 'col_99': 199, 'rowid': 1} 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:requests_cache.policy.actions:Cache directives from request headers: CacheDirectives() 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:requests_cache.policy.actions:Pre-read cache checks: Passed 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:requests_cache.policy.actions:Post-read cache actions: CacheActions(expire_after=180.0) 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 0, 'col_1': 1, 'col_2': 2, 'col_3': 3, 'col_4': 4, 'col_5': 5, 'col_6': 6, 'col_7': 7, 'col_8': 8, 'col_9': 9, 'col_10': 10, 'col_11': 11, 'col_12': 12, 'col_13': 13, 'col_14': 14, 'col_15': 15, 'col_16': 16, 'col_17': 17, 'col_18': 18, 'col_19': 19, 'col_20': 20, 'col_21': 21, 'col_22': 22, 'col_23': 23, 'col_24': 24, 'col_25': 25, 'col_26': 26, 'col_27': 27, 'col_28': 28, 'col_29': 29, 'col_30': 30, 'col_31': 31, 'col_32': 32, 'col_33': 33, 'col_34': 34, 'col_35': 35, 'col_36': 36, 'col_37': 37, 'col_38': 38, 'col_39': 39, 'col_40': 40, 'col_41': 41, 'col_42': 42, 'col_43': 43, 'col_44': 44, 'col_45': 45, 'col_46': 46, 'col_47': 47, 'col_48': 48, 'col_49': 49, 'col_50': 50, 'col_51': 51, 'col_52': 52, 'col_53': 53, 'col_54': 54, 'col_55': 55, 'col_56': 56, 'col_57': 57, 'col_58': 58, 'col_59': 59, 'col_60': 60, 'col_61': 61, 'col_62': 62, 'col_63': 63, 'rowid': 0} 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,491:DEBUG:root:Query 260: Handling cursor 2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,495:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 100, 'col_1': 101, 'col_2': 102, 'col_3': 103, 'col_4': 104, 'col_5': 105, 'col_6': 106, 'col_7': 107, 'col_8': 108, 'col_9': 109, 'col_10': 110, 'col_11': 111, 'col_12': 112, 'col_13': 113, 'col_14': 114, 'col_15': 115, 'col_16': 116, 'col_17': 117, 'col_18': 118, 'col_19': 119, 'col_20': 120, 'col_21': 121, 'col_22': 122, 'col_23': 123, 'col_24': 124, 'col_25': 125, 'col_26': 126, 'col_27': 127, 'col_28': 128, 'col_29': 129, 'col_30': 130, 'col_31': 131, 'col_32': 132, 'col_33': 133, 'col_34': 134, 'col_35': 135, 'col_36': 136, 'col_37': 137, 'col_38': 138, 'col_39': 139, 'col_40': 140, 'col_41': 141, 'col_42': 142, 'col_43': 143, 'col_44': 144, 'col_45': 145, 'col_46': 146, 'col_47': 147, 'col_48': 148, 'col_49': 149, 'col_50': 150, 'col_51': 151, 'col_52': 152, 'col_53': 153, 'col_54': 154, 'col_55': 155, 'col_56': 156, 'col_57': 157, 'col_58': 158, 'col_59': 159, 'col_60': 160, 'col_61': 161, 'col_62': 162, 'col_63': 163, 'rowid': 1} 2024-08-29T20:51:33+03:00 Query 260: Storing results in results backend, key: 8c5eba8a-427b-4cf3-b9a4-b735de18f0aa

Checklist

dosubot[bot] commented 1 week ago

@mibarannikov Hello! I'm here to help you with any bugs, questions, or contributions.

The issue you're encountering with the Shillelagh adapter not returning column values greater than 64 in JSON data sources might be related to specific configurations or limitations in the adapter's handling of JSON data.

Potential Causes and Solutions:

  1. Column Limit Configuration: The Shillelagh adapter has a limit configuration that might be affecting the number of columns processed. The get_data method in the SupersetShillelaghAdapter class includes a limit configuration that fetches the app_limit from the application configuration SUPERSET_META_DB_LIMIT:

    def get_data(
       self,
       bounds: dict[str, Filter],
       order: list[tuple[str, RequestedOrder]],
       limit: int | None = None,
       offset: int | None = None,
       **kwargs: Any,
    ) -> Iterator[Row]:
       app_limit: int | None = current_app.config["SUPERSET_META_DB_LIMIT"]
       if limit is None:
           limit = app_limit
       elif app_limit is not None:
           limit = min(limit, app_limit)
    
       query = self._build_sql(bounds, order, limit, offset)
    
       with self.engine_context() as engine:
           connection = engine.connect()
           rows = connection.execute(query)
           for i, row in enumerate(rows):
               data = dict(zip(self.columns, row))
               data["rowid"] = data[self._rowid] if self._rowid else i
               yield data

    Ensure that the SUPERSET_META_DB_LIMIT is set to a value that accommodates the number of columns you need.

  2. Column Inspection: The _set_columns method in the SupersetShillelaghAdapter class inspects the table to get its columns during initialization. This method might be missing columns if there is a limit or issue during the inspection process:

    def _set_columns(self) -> None:
       from superset.models.core import Database
    
       database = (
           db.session.query(Database).filter_by(database_name=self.database).first()
       )
       if database is None:
           raise ProgrammingError(f"Database not found: {self.database}")
       self._allow_dml = database.allow_dml
    
       table = sql_parse.Table(self.table, self.schema, self.catalog)
       security_manager.raise_for_access(database=database, table=table)
    
       self.engine_context = partial(
           database.get_sqla_engine,
           catalog=self.catalog,
           schema=self.schema,
       )
    
       metadata = MetaData()
       with self.engine_context() as engine:
           try:
               self._table = Table(
                   self.table,
                   metadata,
                   schema=self.schema,
                   autoload=True,
                   autoload_with=engine,
               )
           except NoSuchTableError as ex:
               raise ProgrammingError(f"Table does not exist: {self.table}") from ex
    
       primary_keys = [
           column for column in list(self._table.primary_key) if column.primary_key
       ]
       if len(primary_keys) == 1 and primary_keys[0].type.python_type == int:
           self._rowid = primary_keys[0].name
    
       self.columns = {
           column.name: self.get_field(column.type.python_type)
           for column in self._table.c
       }
  3. Logging and Debugging: Enable detailed logging to identify any issues during the data retrieval process. Adjust the logging level to DEBUG in the superset/config.py file:

    LOG_LEVEL = logging.DEBUG
    TIME_ROTATE_LOG_LEVEL = logging.DEBUG
  4. Check for Known Issues: There might be known issues or bug reports related to handling large JSON objects or arrays in the Shillelagh adapter. Ensure you have the latest version of the adapter and check the repository for any related issues.

By addressing these potential causes, you should be able to resolve the issue with column values greater than 64 being null in JSON data sources. If the problem persists, further investigation into the specific configurations and limitations of the Shillelagh adapter may be necessary.

To continue talking to Dosu, mention @dosu.

rusackas commented 1 week ago

@betodealmeida is definitely the one to talk to here!

mibarannikov commented 6 days ago

Thanks, updating the library apsw helped.