Open GoogleCodeExporter opened 9 years ago
Original comment by tfmorris
on 15 Dec 2010 at 6:36
Could you not apply multiple facets to ease exploration? Even using a
scatterplot facet on those few numeric columns you have would probably be
useful. Try exploring more with facets and let us know if it misses the point
somewhere.
Original comment by thadguidry
on 15 Dec 2010 at 3:05
Maybe I'm not understanding, but don't multiple facets just change which rows
are visible? With my 1920x1086 resolution screen, I can see a maximum of 14
(uncollapsed) columns. Suppose I have a table with many more columns than 14,
with interesting facets containing many columns that have just blank or
otherwise uniform contents. In short, suppose I have a very badly designed
table from a traditional data modeling perspective. The proposed enhancement
is meant to allow the user to focus on interesting (varying) data in such bad
tables.
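
To make the request concrete, here is a minimal sketch of the column filtering I have in mind. It is not Refine code: `varying_columns`, the row-as-dictionary representation, and the sample values are all hypothetical and chosen only for illustration. Given the rows currently selected by the facets, it keeps only the columns whose contents differ across those rows.

```python
from typing import Dict, List, Optional

# Hypothetical representation: each row maps a column name to a cell value.
Row = Dict[str, Optional[str]]


def varying_columns(rows: List[Row], columns: List[str]) -> List[str]:
    """Return the columns whose contents differ across the given rows.

    A column that is entirely blank, or holds the same value in every row,
    is treated as uninteresting and dropped. None and whitespace-only cells
    both count as blank.
    """
    keep = []
    for col in columns:
        distinct = {(row.get(col) or "").strip() for row in rows}
        if len(distinct) > 1:
            keep.append(col)
    return keep


# Made-up rows standing in for whatever the current facets have selected.
columns = ["source", "period", "note", "metric"]
rows = [
    {"source": "report A", "period": "Q3", "note": "", "metric": "activations"},
    {"source": "report A", "period": "Q3", "note": None, "metric": "sales share"},
]

print(varying_columns(rows, columns))
```

In this sketch "source" and "period" are uniform and "note" is blank in both rows, so only "metric" would remain visible.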
Google Refine is great for cleaning up messy data-item tables. Badly
structured tables may be less common. But, as mentioned previously, my
approach to collecting and compiling human-generated data creates "bad" tables.
See related discussion at Issue 286:
http://code.google.com/p/google-refine/issues/detail?id=286&start=100
Original comment by galbith...@galbithink.org
on 16 Dec 2010 at 3:22
Some related thoughts:
(from
http://purplemotes.net/2010/12/19/badly-structured-tables-have-a-bright-future/
See there for post with embedded links)
badly structured tables have a bright future
Which is better: one big table, or two or more smaller tables? The
organization of the data sources, the number of smaller tables, the extent of
the relationships between the smaller tables, and economies in table processing
all affect the balance of advantage. But cheaper storage, cheaper computing
power, and fancier data tools probably favor the unified table. At the limit
of costless storage, costless processing, and tools that make huge masses of
data transparent, you can handle a component of the data as easily as you can
handle all the data. Hence in those circumstances, using one big table is the
dominant strategy.[*]
Unified tables are likely to be badly structured from a traditional data
modeling perspective. With n disjoint components, the unified table has the
form of a diagonal matrix of tables, where the diagonal elements are the
disjoint components and the off-diagonal elements are empty matrices. It's a
huge waste of space. But for the magnitudes of data that humans generate and
curate by hand, storage costs are so small as to be irrelevant. Organization,
in contrast, is always a burden to action. The simpler the organization, the
greater the possibilities for decentralized, easily initiated action.
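
The diagonal layout described above can be sketched in a few lines. This is only an illustration under assumed names: `unify`, `component_a`, `component_b`, and the cell values are hypothetical, and tables are represented as plain lists of dictionaries. Each component keeps its own columns, and every cell outside its own block is left empty.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical representation: a table is a list of rows, a row is a dict.
Row = Dict[str, Any]


def unify(tables: List[List[Row]]) -> Tuple[List[str], List[Row]]:
    """Stack tables into one table over the union of their columns.

    Cells that a component table does not define are filled with None,
    which produces the empty off-diagonal blocks when the components
    share no columns.
    """
    columns: List[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    unified = [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]
    return columns, unified


# Two made-up disjoint components: they have no column in common.
component_a = [{"a1": 1, "a2": 2}, {"a1": 3, "a2": 4}]
component_b = [{"b1": 5}]

columns, unified = unify([component_a, component_b])
empty_cells = sum(value is None for row in unified for value in row.values())
total_cells = len(unified) * len(columns)

print(columns)
print(empty_cells, "of", total_cells, "cells are empty")
```

Even in this tiny example 4 of the 9 cells are empty, and the empty share grows as more disjoint components are added.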
Consider collecting data from company reports to investors. Such data appear
within text of reports, in tables embedded within text, and (sometimes) in
spreadsheet files posted with presentations. Here are some textual data from
AT&T's 3Q 2010 report:
More than 8 million postpaid integrated devices were activated in the third quarter, the most quarterly activations ever. More than 80 percent of postpaid sales were integrated devices.
These data don't have a nice, regular, tabular form. If you combine those data
with data from the accompanying spreadsheets, the resulting table isn't pretty.
It gets even more badly structured when you add human-generated data from
additional companies.
Humans typically generate idiosyncratic data presentations. More powerful data
tools allow persons to create a greater number and variety of idiosyncratic
data presentations from well-structured, well-defined datasets. One might
hope that norms of credibility evolve to encourage data presenters to release
the underlying, machine-queryable dataset along with the idiosyncratic
human-generated presentation. But you can think of many reasons why that often
won't happen.
Broadly collecting and organizing human-generated data tends to produce badly
structured tables. No two persons generate exactly the same categories and
items of data. The data that people present change over time. The result is a wide
variety of small data items and tables. Combining that data into one badly
structured table makes for more efficient querying and analysis. As painful
as this situation might be for thoughtful data modelers, badly structured
tables have a bright future.
* * * * *
[*] Of course the real world is finite. A method with marginal cost that
increases linearly with job size pushes against a finite world much sooner than
a method with constant marginal cost. The above thought experiment is meant
to offer insight, not a proof of a real-world universal law.
Original comment by galbith...@galbithink.org
on 19 Dec 2010 at 7:09
Original issue reported on code.google.com by
galbith...@galbithink.org
on 15 Dec 2010 at 6:06