Open ex0ns opened 2 weeks ago
@ex0ns let me know if you need any assistance doing the contribution.
I have not started looking at this yet. I wanted to understand whether there were technical challenges and/or why it was made that way; there is even a note in the connector:
Connector Name
source-file
Connector Version
0.5.13
What step the error happened?
None
Relevant information
I was trying to load an Excel (XLSX) file containing multiple sheets, and I noticed that in the output all my headers were mixed up and no information about the sheets themselves was kept.
I was expecting an outcome similar to what we get when loading data from a Google Sheet, where a source is created and, within that source, there is one table (i.e. stream) for each sheet of the document.
This seems related to this part of the code: https://github.com/airbytehq/airbyte/blob/b1b2f9c744408665d29f115826eab8d36e3b503e/airbyte-integrations/connectors/source-file/source_file/client.py#L507-L528
Is there a reason it was done that way? Would it be possible to keep information about each of the existing sheets of the document? I don't have any experience with the Airbyte source code, so I wanted to make sure I was looking at the right place and maybe get a few pointers on where to start in order to contribute and improve the Excel reader. First, though, I wanted to understand why it was done this way.
Thanks!
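To make the expected outcome concrete, here is a minimal sketch of "one stream per sheet", assuming the workbook is read with pandas. It is not the connector's code: `load_sheets`, `sheets_to_streams`, and the `{"headers", "records"}` shape are hypothetical names chosen for illustration. The only library call relied on is `pandas.read_excel(..., sheet_name=None)`, which returns a dict of DataFrames keyed by sheet name.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def load_sheets(path: str) -> Dict[str, List[Record]]:
    """Read every sheet of a workbook into {sheet_name: [row dicts]}."""
    # Imported lazily so the helper below can be used without pandas installed.
    import pandas as pd

    # sheet_name=None asks pandas for all sheets at once.
    frames = pd.read_excel(path, sheet_name=None)
    return {name: df.to_dict(orient="records") for name, df in frames.items()}


def sheets_to_streams(sheets: Dict[str, List[Record]]) -> Dict[str, Dict[str, Any]]:
    """Build one hypothetical stream per sheet, each keeping its own headers."""
    streams: Dict[str, Dict[str, Any]] = {}
    for sheet_name, records in sheets.items():
        headers: List[str] = []
        for record in records:
            for key in record:
                # Preserve first-seen column order within this sheet only.
                if key not in headers:
                    headers.append(key)
        streams[sheet_name] = {"headers": headers, "records": records}
    return streams
```

With a workbook containing a `customers` sheet and an `orders` sheet, `sheets_to_streams(load_sheets("book.xlsx"))` would yield two entries, each with only its own columns, instead of a single table where the headers of both sheets are merged.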
Relevant log output
No response
Contribute