ericmckean / google-refine

Automatically exported from code.google.com/p/google-refine

Transpose key/value only produces a single row #529

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
When I attempt to transpose the key/value pairs in the attached file, I end up 
with just a single row of data rather than the hundreds of rows I expected.

Original issue reported on code.google.com by tfmorris on 9 Feb 2012 at 1:45

Attachments:

GoogleCodeExporter commented 9 years ago
I have the same issue with this input:
name: name1
email: email@1.com
name: name2
email: email@2.com
name: name3
email: email@3.com

The transpose shows only the last pair of values, as a single row (with the 
name and email column headings, as expected):
name3  email@3.com

Original comment by iae...@gmail.com on 13 Feb 2012 at 9:06
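
The single surviving row is consistent with the older behavior tfmorris describes at the end of this thread: rows are grouped into records by the values of all columns other than the key and value columns. The Python below is a minimal sketch of that idea, assuming rows are (key, value, *other_columns) tuples and that a later value for a key overwrites an earlier one within a record. With only two columns, every row has the same empty set of other-column values, so all rows fall into one record and only the last pair survives. `columnize_by_other_columns` is a hypothetical name, not OpenRefine code.

```python
def columnize_by_other_columns(rows):
    """Group (key, value, *others) rows into records keyed by the other-column values.

    Hypothetical sketch of the grouping described in the thread, not OpenRefine's code.
    """
    records = {}  # maps a tuple of other-column values -> {key: value}
    for key, value, *others in rows:
        record = records.setdefault(tuple(others), {})
        record[key] = value  # a later value for the same key overwrites the earlier one
    return list(records.values())


rows = [
    ("name", "name1"), ("email", "email@1.com"),
    ("name", "name2"), ("email", "email@2.com"),
    ("name", "name3"), ("email", "email@3.com"),
]
print(columnize_by_other_columns(rows))
# [{'name': 'name3', 'email': 'email@3.com'}]
```

Because grouping is by value rather than by position, rows of one record need not be contiguous; the flip side is that two records sharing the same non-key/value cells get merged.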

GoogleCodeExporter commented 9 years ago
We could do a better job of this out of the box, but there is a way to work 
around the issue.

1. Facet on something that marks the first row of each "record" (e.g. %0 in my 
example or "name" in iaeonh's example).
2. With that facet selection active, add a column using the expression row.index 
(only the first row of each record gets a value).
3. Move the resulting column to the beginning.
4. Fill down on the column so that all rows in the "record" have the same index 
value.
5. Finally, do the key/value transpose and you'll get the desired result.

Original comment by tfmorris on 18 Sep 2012 at 6:45
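
The five workaround steps above can be modeled in a few lines of Python. This is a sketch, assuming the data are (key, value) pairs and that a record starts wherever the key is "name" (as in iaeonh's example); `number_records` and `transpose_key_value` are hypothetical helpers that mimic the facet, `row.index`, fill-down, and transpose steps, not OpenRefine internals.

```python
def number_records(rows, is_record_start):
    """Mimic steps 1-4: tag record-start rows with their row index, then fill down."""
    numbered = []
    current = None  # rows before the first record start get no record number
    for index, (key, value) in enumerate(rows):
        if is_record_start(key):
            current = index  # like row.index, evaluated only on the faceted rows
        numbered.append((current, key, value))  # fill down: reuse the last index seen
    return numbered


def transpose_key_value(numbered):
    """Mimic step 5: one output row per record number, one column per key."""
    records = {}
    for record_id, key, value in numbered:
        records.setdefault(record_id, {})[key] = value
    return list(records.values())


rows = [
    ("name", "name1"), ("email", "email@1.com"),
    ("name", "name2"), ("email", "email@2.com"),
    ("name", "name3"), ("email", "email@3.com"),
]
numbered = number_records(rows, lambda key: key == "name")
for record in transpose_key_value(numbered):
    print(record)
# {'name': 'name1', 'email': 'email@1.com'}
# {'name': 'name2', 'email': 'email@2.com'}
# {'name': 'name3', 'email': 'email@3.com'}
```

The filled-down index plays the role of the extra column that keeps records apart, which is why the transpose then yields one row per record instead of one row overall.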

GoogleCodeExporter commented 9 years ago
P.S. As a breadcrumb for anyone specifically searching for how to do this: my 
example file is an EndNote citation file.

Original comment by tfmorris on 18 Sep 2012 at 6:51

GoogleCodeExporter commented 9 years ago
Noting the feature name to make this issue easier to find: Columnize by 
key/value columns

Original comment by thadguidry on 3 Oct 2012 at 3:16

GoogleCodeExporter commented 9 years ago
Attached is another use case for this issue, to help resolve it.

The Note column can be set to the RECORDNUM column, but that causes only the 
last record row to be output. Leaving the Note column unset and selecting only 
the Key and Value columns extracts and columnizes only the first key (in this 
case the "Title" values), although other values in the outer row-processing 
FOR loop do seem to be populated.

Conversely, removing the RECORDNUM column and running Columnize by key/value 
columns on only two columns (Key and Value) returns only one row: the last 
record row, with all Key fields populated correctly.

Original comment by thadguidry on 3 Oct 2012 at 4:17

Attachments:

GoogleCodeExporter commented 9 years ago
My workaround (or the last piece of it anyway) should work for Thad's example.

Fill Down on the RECORDNUM column, then transpose on Key + Value. I get 86 
records with reasonable-looking columns.

Original comment by tfmorris on 3 Oct 2012 at 8:47

GoogleCodeExporter commented 9 years ago
Fixed in r2574. I also added support for proper handling of repeated fields, 
e.g. an EndNote record with multiple author keys. This only works when there 
are key, value, and optionally note columns. When there are more columns, the 
previous/existing algorithm uses a key built from the values of all the 
remaining cells and assumes that rows sharing those values all represent a 
single record. This allows the k/v records to be non-contiguous, but has the 
downside that it doesn't allow multiple records with the same non-k/v values.

Original comment by tfmorris on 5 Oct 2012 at 11:39
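
The fix comment says repeated fields are handled when only key, value, and optionally note columns are present, but it does not spell out how record boundaries are detected. One plausible rule, sketched below in Python, is to start a new record whenever the first key of the current record appears again and to collect repeated keys (such as several authors) into lists. This boundary heuristic is an assumption for illustration, not the r2574 implementation; `columnize_key_value` is a hypothetical name, the note column is ignored, and the sample data is made up in EndNote's tagged style (%0 reference type, %A author, %T title).

```python
def columnize_key_value(pairs):
    """Split (key, value) pairs into records, keeping repeated keys as lists.

    Assumption (hypothetical, not taken from r2574): a new record starts
    whenever the first key of the current record appears again.
    """
    records = []
    current = None
    first_key = None
    for key, value in pairs:
        if current is None or key == first_key:
            current = {}
            records.append(current)
            first_key = key
        # Repeated keys, e.g. several %A author lines, accumulate in order.
        current.setdefault(key, []).append(value)
    return records


# Made-up sample in EndNote tagged style: %0 type, %A author, %T title.
pairs = [
    ("%0", "Journal Article"), ("%A", "Smith, J."), ("%A", "Jones, K."),
    ("%T", "First title"),
    ("%0", "Book"), ("%A", "Lee, M."), ("%T", "Second title"),
]
for record in columnize_key_value(pairs):
    print(record)
# {'%0': ['Journal Article'], '%A': ['Smith, J.', 'Jones, K.'], '%T': ['First title']}
# {'%0': ['Book'], '%A': ['Lee, M.'], '%T': ['Second title']}
```

The heuristic breaks down if the first key of a record can itself repeat within the record, which is one reason a separate record-number column (as in the workaround) is the more robust option.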