Open GregoryTravis opened 1 month ago
my_workbook = Data.read "blah.xlsx"
-> read => single sheet name or address, returns a table.
-> read_many => Vector of names, returns either a merged table or a table of tables.
read_many : Vector Text -> Headers -> Return_As -> Problem_Behavior -> Table
read_many self sheet_names:Vector=self.sheet_names (headers:Headers=..Detect_Headers) (return:Return_As=..Merged_Table) (on_problems:Problem_Behavior=..Report_Warning) =
Sheet1 A1:E5
Sheet2 A1:B3
my_workbook.expand_to_rows
SheetName Value
Sheet1 Table
my_workbook.expand_to_rows
SheetName Value
Sheet1 Row
expand_to_columns
SheetName  A  B  C  D  E  ...
FileName   SheetName  A  B  C  D  E
June.xlsx  Week1      1  3  5  3  1
June.xlsx  Week2      2  4  6  4  2
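The FileName/SheetName layout above can be modelled in a few lines. This is a minimal Python sketch, not the Enso implementation: a sheet is assumed to be a list of row dicts, and the name `merge_sheets` is hypothetical.

```python
def merge_sheets(file_name, sheets):
    """Flatten {sheet_name: [row_dict, ...]} into one list of rows,
    prefixing every row with FileName and SheetName columns."""
    merged = []
    for sheet_name, rows in sheets.items():
        for row in rows:
            merged.append({"FileName": file_name, "SheetName": sheet_name, **row})
    return merged


# Sample data taken from the June.xlsx example in the issue.
june = {
    "Week1": [{"A": 1, "B": 3, "C": 5, "D": 3, "E": 1}],
    "Week2": [{"A": 2, "B": 4, "C": 6, "D": 4, "E": 2}],
}
merged = merge_sheets("June.xlsx", june)
for row in merged:
    print(row)
```

Keeping the sheet contents nested (one row per sheet, with the table as a value) would correspond to the "table of tables" return mode instead.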
my_files = ["blah.xlsx", "blah2.xlsx", ...]
my_files.map .read => Vector
Data.read_many my_files ..Table_of_Tables => Table
Data.read
read : Text | URI | File -> File_Format -> Problem_Behavior -> Any ! File_Error
read path=(Missing_Argument.throw "path") format=Auto_Detect (on_problems : Problem_Behavior = ..Report_Warning) = case path of
Data.read_many paths:Vector Text format=Auto_Detect (return:Return_As=..Merged_Table) (on_problems:Problem_Behavior=..Report_Warning)
Step 0: Decode the paths argument. Do this through a type class, FileManyList.
The argument could be a Table: if it has a single text column, that column is used as the paths (as above); otherwise it must have a path column (name not case sensitive).
All columns are kept in the output.
Step 1: Read the data in.
Data.read.
If a file fails to read, the behaviour depends on on_problems:
Vector.map.
Step 2: Merge the data. Could have a Return parser SPI which takes the FileManyList and object and merges them into a result.
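The three steps can be sketched end to end. The following Python is only an illustration of the flow, not the Enso API: `decode_paths`, `read_many`, the injected `reader` callable, and the way each `on_problems` value handles a failed read are all assumptions made for the example, since the issue leaves the failure handling unspecified.

```python
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, object]
Table = List[Row]


def decode_paths(paths: Union[List[str], Table]) -> Table:
    """Step 0: normalise the input to rows that each carry a 'path' key."""
    if all(isinstance(p, str) for p in paths):
        return [{"path": p} for p in paths]
    decoded = []
    for row in paths:
        if len(row) == 1:
            # Single-column table: that column holds the paths.
            decoded.append({"path": next(iter(row.values()))})
        else:
            # Otherwise look for a 'path' column, ignoring case.
            key = next((k for k in row if k.lower() == "path"), None)
            if key is None:
                raise ValueError("input table needs a 'path' column")
            rest = {k: v for k, v in row.items() if k != key}
            decoded.append({"path": row[key], **rest})
    return decoded


def read_many(
    paths: Union[List[str], Table],
    reader: Callable[[str], Table],
    on_problems: str = "report_warning",
) -> Tuple[Table, List[str]]:
    """Steps 1 and 2: read every path, then merge into one table."""
    warnings: List[str] = []
    merged: Table = []
    for entry in decode_paths(paths):
        try:
            table = reader(entry["path"])
        except OSError as err:
            # Assumed policy: 'report_error' aborts, 'report_warning'
            # records the failure, anything else skips it silently.
            if on_problems == "report_error":
                raise
            if on_problems == "report_warning":
                warnings.append(f"{entry['path']}: {err}")
            continue
        # Merge: input columns are carried into every output row.
        for row in table:
            merged.append({**entry, **row})
    return merged, warnings


# Usage with an in-memory stand-in for the file system.
fake_files = {"blah.xlsx": [{"A": 1}, {"A": 2}], "blah2.xlsx": [{"A": 3}]}


def fake_reader(path: str) -> Table:
    if path not in fake_files:
        raise FileNotFoundError(path)
    return fake_files[path]


rows, warnings = read_many(["blah.xlsx", "missing.xlsx", "blah2.xlsx"], fake_reader)
print(rows)
print(warnings)
```

Passing the reader in as a parameter keeps the merge step independent of the file format, which is roughly the role the "Return parser SPI" idea plays in the notes above.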
Challenges:
Tasks:
Radosław Waśko reports a new STANDUP for today (2024-10-31):
Progress: Implemented basic read many from vector, column or table to a vector, not tested yet but general API shape and implementation is drafted. It should be finished by 2024-11-07.
Next Day: Next day I will be working on the same task. Add tests and make sure they pass. Continue work - add new return types - simple table and merged table. Merge with excel read many. Tests.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-04):
Progress: Added tests and fixed edge cases. Created 1st PR. It should be finished by 2024-11-07.
Next Day: Next day I will be working on the same task. Continue work - add new return types - simple table and merged table. Merge with excel read many. More tests.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-05):
Progress: Adding return as table, defaults dependent on input type, basic logic for merging tables. It should be finished by 2024-11-07.
Next Day: Next day I will be working on the same task. Add tests, merge with excel read many.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-06):
Progress: Adding more edge case tests, discussing edge case behaviour. Fixes to make the logic actually work. It should be finished by 2024-11-07.
Next Day: Next day I will be working on the same task. Continue fixes, think how to merge with Excel. Align expand_ methods.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-07):
Progress: Fixing tests, edge cases for JS_Object. It should be finished by 2024-11-08.
Next Day: Next day I will be working on the same task. Test for XLS. Align expand_ methods.
Radosław Waśko reports a new STANDUP for today (2024-11-08):
Progress: Reviewing PRs (looking at Number Parser and trying to understand it, so took some time). Trying out Excel read many in practice. Fixing widgets for Data.read_many. Starting work on merging the logic with Excel read many. It should be finished by 2024-11-13.
Next Day: Next day I will be working on the same task. Merge with excel read many. For now without XLS test, just keep logic unified. Add a test for Data.read_many for Match_Columns and Columns_To_Keep. Prepare PR. Create tickets for expanding sheets in Data.read_many, add ticket for aligning types in expand_* methods.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-12):
Progress: Added test cases for reading many excel files with many sheets, cases for merging types and some other edge cases (0 element arrays). Working on union logic. It should be finished by 2024-11-13.
Next Day: Next day I will be working on the same task. Finish the new logic to get the tests to pass.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-13):
Progress: Adding more edge cases. Found a problem with Union logic in case of all-null columns, discussed to prioritize Value_Type.Null ticket. Fixed the empty table edge case. It should be finished by 2024-11-14.
Next Day: Next day I will be working on the same task. Fix empty array edge case and prepare a PR.
Radosław Waśko reports a new STANDUP for yesterday (2024-11-14):
Progress: Discussed empty table/array edge case, updated tests. Added edge case tests for weird shaped files. Put up the PR. It should be finished by 2024-11-15.
Next Day: Next day I will be working on the #6281 task. Fix last failing test, start new task.
Reads a set of files into a table. The result is either one row per table, or the combined contents of all the tables.
Implement File_Format.read_many: like File_Format.read but reads a set of files.