Open Deanozk opened 5 months ago
Thanks for opening this issue! Based on our automated check, it seems that your post contains some code but it does not use code blocks to format it.
Please double-check your post and revise it if necessary. To employ syntax highlighting, it's recommended to use code blocks with triple backticks, e.g.:
```sql
SELECT ...
If this is a false positive, feel free to disregard this comment.
I will load up my original spreetsheet sludge.xlsx that caused the original error and check it and report back whats in it or not in it.
I can't share the whole sheet but 1) it loaded ok into google sheets so its a valid spreedsheet. 2) its first three columns do not have values for first three rows ,but columns do have values after that. Here is a picture of the columns
ABC have do not have values for the first three rows but they do after that. D-AE have values in all rows. There is only one sheet in the document.
Only other thing besides not having values for the first three columns in the first three rows is that there is some vertically aligned text in some columns as opposed to horizontally aligned.
I added a1, b1, b3 a2, b2, b3 a3, b3, c3 and the sheet loaded after that. It does appear to report up to 1000k plus rows? Not sure why it loads that many. here is the sheet as downloaded. Please see that sd.xlsx also loads 1k plus rows.
In summary it seems that first colunn rows must be filled in or it will not even import as distinct from create a table. The error message related to there not being at least one column is not descriptive as the table has many columns and all have values at some point. It was not decided that the first column for example is the primary etc. So I think this error message is inaccurate. Also there needs to be more reported for a full crash like this.
What happens?
I'm a new user. I installed duckdb version 0.10.0 and followed instructions to import excel spreadsheet. My files contain multiple sheets inside. Simple files load ok however for my private file that has no values in first three rows of first three columns (but does have values after that) I get fatal error and need to restart in some cases. I am also noticing its not clear how to specify which sheet or sheets of a multisheet document to load or how it determines the number of rows. See attached sheets.
To Reproduce
download a complex excel spreadsheet with multiple sheets. I downloaded
SaleData.xlsx # Note I did this after first trying and failing to load my own private xlsx sheet. Steps I took 1. install spatial and load spatial. Its possible on my first attempt I did not load spatial but I am not sure. I received an error trying to load: SELECT FROM st_read('sludge.xlsx'); This produced an error: SELECT FROM st_read('sludge.xlsx'); Error: INTERNAL Error: Failed to bind "ST_Read": Table function must return at least one column
This worked. So I tried the singled the single sheet file from SalesData. That also worked ok. but it loaded over a thousand rows. Perhaps when i downloaded it from google it created extra rows? not sure. Here is the actual file namelist.xlsx Maybe it has over 1k rows but I dont think so. So I then tried loading SaleData file and it loaded ok in a way but it loaded 45 rows. In fact there are 8 rows in the first sheet and 44 rows in the second sheet. Why does it appear to have selected the second sheet? After this sort of worked (except for the number of rows) I tried the original sheet that it had trouble loading and got the following errors: Internal Error failed to bind st_reader. Table must return at least one column. (Which table?) can it please say which table it wants to try to get this from? Because spreadsheets can have many right? When I try to load another spreadsheet the SaleData again it gives a fatal error saying it was invalidated by previous error.
I will take a look at my private xlsx that seemed to cause the original issue but in any event I think the behavior is odd and not expected. Plus the documentation is unclear on how to load a document with multiple sheets inside. Its simply not clear enough. most spreadsheets DO have multiple sheets inside. So the documentation is off. The logic of which sheets and how many rows get loaded is at best inadequately documented and more likely simply does not work for certain types of spreedsheets. As for error messages I think it should what table its trying to load(name please?) and also if needs values for the column data just say that. Keep in mind thats not a requirement for a valid spreedsheet for data frame as afaik.
Update: the file causing the original error does not have values for the first three columns in the first three rows. It does have values after that. It did load ok into sheets.google.com
Update: Adding values for a1,b1,c1 a2,b2,c2 a2,b3,c3 which were not present in the original sludge.xlsx does allow the document to load. It seems that having values in the initial rows of the columns are a requirement? Just to load it? Even before creating a table? It loads fine in Spreadsheet. Also with runtime crash more information should be given. And the error message provided did not indicate missing column or row values but something
OS:
M1 Mac OS Silicon Ventura 13.3.1
DuckDB Version:
v0.10.0
DuckDB Client:
Python
Full Name:
Dean Oz
Affiliation:
None
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue? Yes I have. However, I do need to check the original file as well that caused the original error.