apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.79k stars 13.15k forks source link

Tables with long numbers are truncated when exported to excel #28551

Open jose-vines-toteat opened 2 months ago

jose-vines-toteat commented 2 months ago

Bug description

When downloading a table with long numbers with the excel options, these are truncated by replacing the last number with a 0. For example, 1100108628127863 would appear as 1100108628127860. This does not happen when exporting to CSV.

How to reproduce the bug

Create a table where a dimension is some large number (15-16 digits) and then download as excel.

Screenshots/recordings

This is an extract of the table in my dashboard

image

This is what it looks like in excel

image

Superset version

3.1.3

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

dosubot[bot] commented 2 months ago

The issue with long numbers being truncated when exported to Excel from Superset is due to Excel's limitation of handling a maximum precision of 15 digits for numbers. This results in numbers longer than 15 digits being truncated or rounded. To prevent this, you can modify the df_to_excel function in Superset to convert long numbers to strings before exporting. This approach ensures that Excel treats these numbers as text, avoiding truncation or scientific notation conversion.

Here's a modification you can apply:

import io
from typing import Any
import pandas as pd

def df_to_excel(df: pd.DataFrame, **kwargs: Any) -> Any:
    output = io.BytesIO()
    # Convert long numbers to strings to avoid truncation
    for column in df.columns:
        if df[column].dtype == 'int64':  # Check if the column is of int64 type
            df[column] = df[column].astype(str)
    # Handle timezones
    for column in df.select_dtypes(include=["datetimetz"]).columns:
        df[column] = df[column].astype(str)
    with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
        df.to_excel(writer, **kwargs)
    return output.getvalue()

This code checks each column's data type and converts int64 columns to strings, ensuring that long numbers are accurately represented in the exported Excel file. Keep in mind, converting numbers to strings might not be ideal if you plan to perform numeric operations on these values in Excel, as they will be treated as text.

#### Details **Sources** - [superset-frontend/src/explore/components/controls/ColumnConfigControl/constants.tsx](https://github.com/apache/superset/blob/master/superset-frontend/src/explore/components/controls/ColumnConfigControl/constants.tsx) - [superset/utils/excel.py](https://github.com/apache/superset/blob/master/superset/utils/excel.py) - [superset/commands/sql_lab/export.py](https://github.com/apache/superset/blob/master/superset/commands/sql_lab/export.py) - [superset-frontend/plugins/plugin-chart-echarts/src/BigNumber/BigNumberTotal/controlPanel.ts](https://github.com/apache/superset/blob/master/superset-frontend/plugins/plugin-chart-echarts/src/BigNumber/BigNumberTotal/controlPanel.ts) - [superset-frontend/src/setup/setupFormatters.ts](https://github.com/apache/superset/blob/master/superset-frontend/src/setup/setupFormatters.ts)
**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

rusackas commented 2 months ago

@jose-vines-toteat let us know if you want to open a PR to solve the problem. We'd love to get feedback on Dosu's suggestions here, since if it leads to a fix, that's pretty amazing.