Closed v0dro closed 6 years ago
@athityakumar you should incorporate this into your project if nobody submits a PR before the coding period.
It will be good to look at write_excel as well for xlsx .
@jeydurai Ideas on how far it should go? Like styling of output and something something?
1) While reading as daru dataframe and writing back as excel, it should preserve the original formatting on both ways. 2) Should be able to read the default sheet (selected one) if we call for read_excel function in daru object, in addition to which, there should be a functionality to create an Excel object, by which, any specific sheet can be read by giving the sheet name with an option to specify which column in the source file to be considered as Index of daru dataframe. 3) We should be able to save multiple daru dataframes as multiple sheet of a single excel file. 2) It should have an engine to give pivot_table functionalities. (The most important) 3) Feature should be there to write the pivot table object as excel file
@jeydurai Thanks, that is pretty informative list, we'll work our way on it.
@athityakumar I believe you should be really interested in this discussion :) @Shekharrajak for you it is also may be useful (as "export to Excel with pivot tables" COULD be part of "views" rather than exporters, but it is a point to discuss again).
@v0dro - Sure, I'll try to accommodate these features in my project timeline. @zverok - Yes, I'm looking forward to this. :) @Shekharrajak - Regarding (4) and (5), we can consider using the pivot-table gem.
@jeydurai - Thanks for this list. Feel free to add more, if you come across any more potential use cases - my coding period starts from May 30th. I'll try incorporating as much features as possible within the timeline.
Regarding (3), would something like this suffice as use case?
df1.to_excel "filename.xlsx", sheet: 1
df2.to_excel "filename.xlsx", sheet: 2
@athityakumar I will surely come back in a day or two in order to add more use cases. @athityakumar For the use cases of (3), the following syntax may be helpful. Reading: xlsx = Daru::DataFrame.ExcelFile(PATH\FileName.xlsx) df_sheet1 = xlsx.parse('Sheet1') df_sheet2 = xlsx.parse('Sheet2') OR df_sheet1 = xlsx.parse(0) df_sheet2 = xlsx.parse(1)
Writing: (suggested use case looks like saving the file as many times as the number of df/sheets we have. However, the following syntax may be good). writer = Daru::DataFrame.ExcelWriter('path/to/file_name.xlsx', engine='xlsxwriter') # to differentiate if we need XL or XLSX. df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') df3.to_excel(writer, sheet_name='Sheet3') writer.save (This could act as lazy so that only when we call save method from the writer object, the functionality should start writing to disc.)
We can create pivot table using the gem pivot_table as suggested by @athityakumar and using Daru::Formatters::Table.format(data, headers: headers, row_headers: row_headers)
, we can get the table.
I tried one example: gist link
pivot_table gem looks too inflexible and heavy to me. Moreover there will be the overhead of copying data from daru to the other gem. Maybe you can port some of the code from there to daru?
I also don't think we need any external gem for building pivot tables. We have everything you can need for this in Daru already.
I believe that writing several dataframes to several sheets IS a usecase, yet @athityakumar's proposal (with atomic writes, each "write this DF to that file, N sheet" is separate op) is good enough for starters. As far as I know, XLSX files opening/closing doesn't have a lot of overhead, so from "just dataframe"s point of view atomic writes is more logical.
Though, as a far future idea, support for some sort of "transactions" for writing could be useful, like
But it is "next level", anyways.
Thanks for suggestions. I think we must extend the DataFrame#pivot_table features accordingly.
:agg
- Function to aggregate the grouped values. Default to :mean. This must be changed if there is only single value to return then that will be added in column value (String
, Symbol
). Means it must not be restricted to numeric vectors.Submitted PR https://github.com/athityakumar/daru-io/pull/28 for XLSX Importer. Please review. 😄
Implemented in daru-io
The
DataFrame.from_excel
method does not support loading XLSX files since thespreadsheet
gem does not support it.We should write new functionality so that XLSX can be supported using the
rubyXL
gem.