dr-jb / google-refine

Automatically exported from code.google.com/p/google-refine

Reload Data or Load New Data #460

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

# Load some data.
# Do some manipulations
# Re-apply those manipulations to a new dataset or new version of a dataset.

Currently, it's not possible to re-apply a set of manipulations without 
exporting, messing with the export file, and re-importing. It would be 
extremely useful to provide repeatable transformation capabilities. Without 
these, the use of Refine for scientific (repeatable) applications is extremely 
limited.

Original issue reported on code.google.com by mccus...@gmail.com on 12 Oct 2011 at 3:59

GoogleCodeExporter commented 8 years ago
There is a feature in Undo/Redo that you can use to export operations to a JSON 
text file, then paste them in for another dataset and apply them.  This is 
shown in the tutorial videos.  Does that suit your needs, or is your request 
something more?

Original comment by thadguidry on 12 Oct 2011 at 1:27

GoogleCodeExporter commented 8 years ago
It would be useful to understand the "messing" that you're trying to avoid 
and/or the UI flow for what you're proposing.  Pretend we have no clue what 
context you're operating in or what your assumptions are and make it nice and 
simple.

Original comment by tfmorris on 12 Oct 2011 at 8:48

GoogleCodeExporter commented 8 years ago
Exporting and re-importing operations sounds promising, but a "re-import" 
command might be clearer to users.

Original comment by mccus...@gmail.com on 13 Oct 2011 at 7:49

GoogleCodeExporter commented 8 years ago
I have the same issue. Let me summarize my workflow:

# Manipulate some data:

- Load some data into project "foo"
- Do some work
- Load some data into project "bar"
- Do some work
- Load some data into project "foobar"
- Do some work

Now I want to use the 'cross()' function to do calculations on "foobar" 
relative to "foo" and "bar", as explained in
http://code.google.com/p/google-refine/wiki/GRELOtherFunctions

This is a very powerful tool for data manipulation, and it works great.

# Problem situation:

I now want to redo the process for a fresh dataset with the exact same data 
layout. If I export the JSON and apply it to new projects, I am left with new 
project names "foo1", "bar1" and "foobar1". The 'cross()' function, and 
presumably other functions too, depends on the names of the referenced 
projects, so the replayed operations do not work with the new names. It does 
not even play well with looking up cell contents from within the same project, 
as there is no parameter "project.name" available either.

# Currently available solution

The solution available to me at present is this:

- Load new datasets into new projects with new names
- Extract JSON history from old projects
- Replace all references to "foo", "bar" and "foobar" with "foo1", "bar1" and 
"foobar1" in the JSON histories
    - **This is error prone**
- Replay the JSON histories on the new projects

While I can cope with this, being a wizard with regexp and understanding 
programming syntax quite well, it is not very handy, and it is quite 
time-consuming.
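
To make the rename step less error prone, something like the following Python sketch could be used on an extracted history before replaying it. It is only an illustration under stated assumptions: the history is assumed to be a JSON array of operation objects, the helper names `build_pattern` and `rename_projects` are made up for this example, and the field names and expression in the sample are illustrative rather than copied from a real export. It rewrites a project name only where it appears as a complete quoted string, and it does all renames in a single pass so that "foo" is never turned into "foo1" twice.

```python
import json
import re


def build_pattern(mapping):
    """Compile a regex matching any old project name as a whole quoted literal."""
    # Longest names first, so "foobar" is tried before "foo".
    names = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # Group 1 is the opening quote; \1 requires the same closing quote.
    return re.compile(r"""(["'])(%s)\1""" % alternatives)


def rename_projects(node, mapping, pattern=None):
    """Return a copy of a parsed JSON history with project names replaced.

    Hypothetical helper: walks dicts and lists, and rewrites quoted project
    names inside every string value. Dictionary keys are left unchanged.
    """
    pattern = pattern or build_pattern(mapping)
    if isinstance(node, dict):
        return {k: rename_projects(v, mapping, pattern) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_projects(v, mapping, pattern) for v in node]
    if isinstance(node, str):
        return pattern.sub(
            lambda m: m.group(1) + mapping[m.group(2)] + m.group(1), node
        )
    return node  # numbers, booleans and null pass through unchanged


# Old project name -> new project name.
mapping = {"foo": "foo1", "bar": "bar1", "foobar": "foobar1"}

# Illustrative operation; a real extracted history may look different.
history = [
    {
        "op": "core/column-addition",
        "expression": 'grel:cell.cross("foo", "id")[0].cells["amount"].value',
    }
]

renamed = rename_projects(history, mapping)
print(json.dumps(renamed, indent=2))
```

Working on the parsed JSON rather than on the raw text avoids having to deal with escaped quotes by hand. One limitation to keep in mind: a column that happens to share a name with a project and is also written as a quoted string would be renamed too, so the output should still be reviewed before it is applied.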

# Proposed solution

A simple "Reload data and replay all operations" function would solve this in a 
snap.

;)Frode

Original comment by frodesev...@gmail.com on 26 Sep 2012 at 6:53