dr-jb / google-refine

Automatically exported from code.google.com/p/google-refine

simple, robust triple transform #286

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago

Being able to transform tables between their ordinary form and a row 
number/field/item triple form, simply and robustly, would be great for 
table remodeling. Here's an example application: 

1. Open this Google Docs spreadsheet in GR (Google Refine):
https://spreadsheets.google.com/pub?key=0AnZb5H7tDMvTdEx2eExqc0NRaURjT2djSDdLanZOWmc&hl=en&gid=0
2. The spreadsheet opens fine, but column numbers are appended to the field 
names. Eh, minor problem; I'm sure someone will fix that eventually.  
3. What about using GR operations to clean up this field-name (metadata) 
problem? Here's a try: transform the table into row number/field/item format. 
Then the fields can be edited as cell items, and a single GR operation splits 
the appended column number off all the field names. But the reverse transform 
doesn't automatically populate the field names from those edited items. Darn!
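The round trip in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not GR's implementation: `to_triples`, `from_triples` and `strip_column_number` are hypothetical names, and the suffix format (a space followed by digits, e.g. `Name 1`) is an assumption, since the exact form of the appended column number isn't given above. The key point is that `from_triples` reads the column names from the edited triples, which is what the reverse transform described above does not do.

```python
import re
from typing import Dict, List, Tuple

Row = Dict[str, str]
Triple = Tuple[int, str, str]


def to_triples(rows: List[Row]) -> List[Triple]:
    """Flatten a table into (row number, field, item) triples, skipping empty cells."""
    triples: List[Triple] = []
    for row_num, row in enumerate(rows):
        for field, item in row.items():
            if item not in (None, ""):
                triples.append((row_num, field, item))
    return triples


def from_triples(triples: List[Triple]) -> List[Row]:
    """Reverse transform: field names come from the triples, so edits to them carry over."""
    rows: Dict[int, Row] = {}
    for row_num, field, item in triples:
        rows.setdefault(row_num, {})[field] = item
    return [rows[n] for n in sorted(rows)]


def strip_column_number(field: str) -> str:
    """Hypothetical cleanup rule (assumed suffix format): drop a trailing ' <digits>'."""
    return re.sub(r"\s+\d+$", "", field)


if __name__ == "__main__":
    table = [{"Name 1": "Ada", "Year 2": "1843"}, {"Name 1": "Alan", "Year 2": "1936"}]
    triples = to_triples(table)
    # Edit the field names as ordinary cell values, then pivot back.
    cleaned = [(r, strip_column_number(f), v) for r, f, v in triples]
    print(from_triples(cleaned))
    # [{'Name': 'Ada', 'Year': '1843'}, {'Name': 'Alan', 'Year': '1936'}]
```

Two simplifications to keep in mind: a row whose cells are all empty disappears in the round trip, and if an edit maps two fields onto the same name, the later item silently wins.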

The reverse triple transform can easily get quite complicated when the field 
names undergo more ad hoc changes. But it doesn't seem like it would be 
difficult (for good programmers!) to build a robust reverse triple transform into GR.  
Some further thoughts and details here:
http://purplemotes.net/2010/12/13/exploring-and-remodeling-table-fields/
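One way to read "robust" here, sketched under stated assumptions: take the column headers directly from the field values in the triples (in order of first appearance), tolerate sparse rows, and refuse to silently merge two different items that land in the same cell. `pivot_triples` is a hypothetical name, and raising `ValueError` on a conflict is just one possible policy.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[int, str, str]


def pivot_triples(
    triples: Iterable[Triple],
) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Rebuild a rectangular (header, rows) table from (row, field, item) triples."""
    header: List[str] = []  # field names, in order of first appearance
    seen: Set[str] = set()
    cells: Dict[Tuple[int, str], str] = {}
    for row_num, field, item in triples:
        if field not in seen:
            seen.add(field)
            header.append(field)
        key = (row_num, field)
        # Two different items for the same cell is an error, not an overwrite.
        if key in cells and cells[key] != item:
            raise ValueError(f"conflicting items for row {row_num}, field {field!r}")
        cells[key] = item
    row_nums = sorted({row_num for row_num, _ in cells})
    # Sparse input: any (row, field) pair without a triple becomes None.
    rows = [[cells.get((r, f)) for f in header] for r in row_nums]
    return header, rows


if __name__ == "__main__":
    header, rows = pivot_triples(
        [(0, "Name", "Ada"), (0, "Year", "1843"), (1, "Name", "Alan"), (2, "Note", "sparse")]
    )
    print(header)  # ['Name', 'Year', 'Note']
    print(rows)    # [['Ada', '1843', None], ['Alan', None, None], [None, None, 'sparse']]
```

Because the header is derived from the triples rather than from the original table, any renaming, splitting or merging of field names done in triple form carries straight back into the wide table.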

For a more general source of meta-messy wide, sparse tables, consider this 
strategy for collecting human-generated data:
http://purplemotes.net/2010/12/13/describing-and-organizing-spreadsheet-data/

Original issue reported on code.google.com by galbith...@galbithink.org on 15 Dec 2010 at 5:54

GoogleCodeExporter commented 8 years ago
re: #2 The only reason it's like that is that I coded it to match the other 
spreadsheet importer I was using as a template. I didn't understand why it was 
that way, but it seemed better to keep them all consistent.

I'd be happy to have them all drop the column numbers unless someone comes up 
with a good reason for them to be the way they are.

Original comment by tfmorris on 15 Dec 2010 at 6:34

GoogleCodeExporter commented 8 years ago
re: #2 - I'd be for dropping the column numbers as well. Note that the Excel 
importer keeps each column name as-is unless it's a duplicate, in which case 
it gets an index number, starting from 2, appended to it.

Original comment by dfhu...@gmail.com on 20 Dec 2010 at 12:35
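The Excel importer naming rule described in the last comment (names kept as-is, duplicates get an index starting from 2) can be sketched as follows. This is a hypothetical reimplementation for illustration, not the importer's code: the single-space separator and the handling of a suffixed name that collides with an existing column (keep counting up) are assumptions.

```python
from typing import Dict, List, Set


def dedupe_column_names(names: List[str]) -> List[str]:
    """Keep unique names as-is; later duplicates get ' 2', ' 3', ... appended.

    Assumed details: a single space separates name and index, and if
    'Name 2' is already taken the index keeps increasing until it is free.
    """
    used: Set[str] = set()
    next_index: Dict[str, int] = {}  # next suffix to try for each base name
    result: List[str] = []
    for name in names:
        candidate = name
        if candidate in used:
            i = next_index.get(name, 2)  # numbering starts from 2
            candidate = f"{name} {i}"
            while candidate in used:
                i += 1
                candidate = f"{name} {i}"
            next_index[name] = i + 1
        used.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(dedupe_column_names(["Name", "Year", "Name", "Name"]))
    # ['Name', 'Year', 'Name 2', 'Name 3']
```

Unlike appending a column number to every name, this leaves ordinary headers untouched, so only genuinely ambiguous columns need cleaning up afterwards.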