Intensely good tabular data processing

langston-barrett commented 9 years ago

Distributive deals with a lot of tabular data. With methods like strIn, strContainedIn, reIn, commandColumnNoHeader, it's clear that this could be abstracted even further. My proposal is as follows:

We need one unified method for splitting a tabular string into a 2D slice. It should detect which regexp to use by judging the consistency of the row widths (it would assume even rows), perhaps by comparing the standard deviation of the widths. This would eliminate the need for separateString, stringToSlice, and stringToSliceMultispace. Possible regexps: for rows, "\n+"; for columns, "\\s{2,}", "\\s+", "\t+".

In conjunction, we need a method that fetches a column (sans header) by its header title. This should be super simple and will totally prettify the code.

For organization, this will all go into another Go package: tabular.go

langston-barrett commented 9 years ago

Ideas for an algorithm for splitting arbitrary data into a table: Try different regexps, counting the length of each row. If there is one with all rows the same length, use that. If not, toss one outlier and test again. If there still isn't, just pick the one with the lowest standard deviation.

Keep in mind that this algorithm must, at its heart, be probabilistic, which might be an issue for commands with widely varying outputs.

langston-barrett commented 9 years ago

Done with both. All that's left is to use them everywhere they apply.

CiscoCloud / distributive

Intensely good tabular data processing #34