ODM2 / YODA-File

The YAML Observation Data Archive & exchange (YODA) File Format
BSD 3-Clause "New" or "Revised" License
4 stars 0 forks source link

Excel Performance Limits #3

Closed SRGDamia1 closed 9 years ago

SRGDamia1 commented 9 years ago

I'm hitting some performance limits with excel with the time series template. Thus a few questions:

How many columns of data are we really likely to have for a time series? 50? 100? 200? How many rows of data? I'd like to have nearly 100,000 possible (~1 year of 5-minute data), but is that more than we need?

horsburgh commented 9 years ago

I'm not sure about the column question, but in terms of rows - I don't think it would be uncommon for time series to hit 100,000. What are the specific performance limits you are experiencing?

SRGDamia1 commented 9 years ago

Excel crashes because too many formulas are trying to calculate. Excel has no (non-VBA) function to concatenate all of the data columns with brackets and comma separators. This means that right now I'm using a giant worksheet of formulas to get the data table into the right format and there have to be enough cells filled with formulas as the maximum size of the data table. Even 5,000 rows x 200 columns of that is completely crashing excel for me. I'm trying to read up on ways to optimize it, but we might end up being really limited as to the sizes of tables we can use in excel without using macros.

horsburgh commented 9 years ago

Hmmmm... Might be time to consider a simple macro or two. Some people might not like Macros, but I think the size limitations will be constraining.

klehnert55 commented 9 years ago

As much as I am always advocating Excel because of its broad use in the community, I think we need to acknowledge its limitations when the data gets to be very complex. I think that the data entry to ODM2 requires (web?) applications to help generate the YODA files.

Kerstin

On 3/4/15 13:46 PM, Jeff Horsburgh wrote:

Hmmmm... Might be time to consider a simple macro or two. Some people might not like Macros, but I think the size limitations will be constraining.

— Reply to this email directly or view it on GitHub https://github.com/CZOData/YODA-File/issues/3#issuecomment-77219073.

Dr. Kerstin Lehnert Director, Integrated Earth Data Applications Director, EarthChem President, IGSN e.V.

Lamont-Doherty Earth Observatory Columbia University Palisades, NY, 10964 (845) 365-8506 http://www.iedadata.org http://www.earthchem.org http://www.igsn.org

emiliom commented 9 years ago

I was thinking along the same lines as Kerstin (but excuse my ignorance, as I haven't been a core part of the YODA efforts). Is there a strong requirement that an Excel file by itself generate the YODA payload? Or is it possible to decouple those roles, and have the Excel file be the data entry/capturing tool (with cool internal validation and drop-downs, as is already the plan), but have a separate process (eg, a Python script that reads the data from the Excel file) actually generate the YODA files, running on a CZOData server, after the Excel file is submitted?

Just my 0.05 cent.

SRGDamia1 commented 9 years ago

I was under the impression we really wanted Excel to do everything. Unfortunately... I'm not sure it's going to happen. I have the excel file working, but just having enough formulas present for 30,000 rows by 100 columns of data, the worksheet is 52 MB and takes 2.5 minutes to load up on my computer. Of course, that time depends on the computer, but I suspect it's big enough to crash excel on an older computer and it's long enough to be really annoying even with a newer computer.

klehnert55 commented 9 years ago

Just making such spreadsheet available for download will be a challenge, and users will have ongoing problems with a spreadsheet of this size. I think it is time to seriously consider alternative ways to generate YODA files.

On 3/4/15 16:32 PM, Sara Damiano wrote:

I was under the impression we really wanted Excel to do everything. Unfortunately... I'm not sure it's going to happen. I have the excel file working, but just having enough formulas present for 30,000 rows by 100 columns of data, the worksheet is 52 MB and takes 2.5 /minutes/ to load up on my computer. Of course, that time depends on the computer, but I suspect it's big enough to crash excel on an older computer and it's long enough to be really annoying even with a newer computer.

— Reply to this email directly or view it on GitHub https://github.com/CZOData/YODA-File/issues/3#issuecomment-77253709.

Dr. Kerstin Lehnert Director, Integrated Earth Data Applications Director, EarthChem President, IGSN e.V.

Lamont-Doherty Earth Observatory Columbia University Palisades, NY, 10964 (845) 365-8506 http://www.iedadata.org http://www.earthchem.org http://www.igsn.org

SRGDamia1 commented 9 years ago

A huge amount of the current bulk in Excel is in formatting the data values table. (I.e., concatenating the data with the commas and brackets.) For Excel to generate the YAML of the header, the file is only ~1MB and opens with no annoying wait. So we could use excel to generate the header and then tell people to format their data table with the brackets and then use their own text editor to glue the two pieces together. I don't know if users will like that solution, though and there are all kinds of possible problems with breaking the process into two pieces.

I'll start investigating possibly using macros to reduce the size of the excel file, but I've never really worked with VBA before so I can't make any promises.

horsburgh commented 9 years ago

To answer the question from @emiliom about having Excel do everything: I vote for not writing any tools that try to parse Excel files directly. If we did so, we don't need the YAML file at all as the Excel file would be the exchange file. However, my group's experience with trying to parse Excel files for the ODM 1.1.1 data loader was not without pain (and fraught with some gotchas that caused a lot of phone calls from data managers). Also, it precludes data managers who might want to create YODA files automatically from doing so (unless they want to figure out how to write directly to Excel). And, I don't think we want to write parsers for both Excel and YODA files.

To be clear: I am advocating for an Excel template that IS capable of producing a YODA file. But there may be other tools that do the same thing.

SRGDamia1 commented 9 years ago

I added macros, which improves everything dramatically.