ChengjunWu / google-refine

Automatically exported from code.google.com/p/google-refine

With large datasets, moving any column (2, 3, 4, etc.) to position #1 causes an irrecoverable crash and burn (no undo, etc.) #562

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Load a large dataset.
2. Move a column from position 2 or greater to position 1 (immediately to the
right of the All column).
3. The data becomes seriously corrupted and cannot be recovered.

What is the expected output?
The column should simply move from one position to another, as it does for
every position except the first.

What do you see instead?
See attached images. My record count went from 100,000 down to 14,596, and only
one column is displayed. I cannot 'Undo' because every attempt produces the
error seen in the attached screenshot.

What version of Google Refine are you using?
2.5 

What operating system and browser are you using?
OS X, Chrome 17.0.x

Is this problem specific to the type of browser you're using or it happens
in all the browsers you tried?
Happens in any browser. 'Tis a nasty bug.

Please provide any additional information below.

Thanks,

Eric

Original issue reported on code.google.com by ericjarv...@gmail.com on 1 May 2012 at 5:48

Attachments:

GoogleCodeExporter commented 8 years ago
Hi Eric. The reduction in record count is intentional. Google Refine
considers "indented" rows (i.e., rows whose leading cell is empty) to be part of
the "record" they're indented under. This can be useful for certain types of
processing, but it can definitely cause unexpected effects if you don't know
it's happening.
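To make the grouping rule concrete, here is a rough sketch as a shell function over a tab-separated file. It is a simplified model, not Refine's actual code: it assumes a new record starts at every row whose first cell is non-empty, and that the very first row always starts a record. The function name `count_records` is hypothetical.

```shell
#!/usr/bin/env bash
# count_records: hypothetical helper. A simplified model of record grouping
# (an assumption, not Refine's real implementation): a row whose first cell
# is empty is "indented" and joins the record above it.
# Reads tab-separated rows on stdin.
count_records() {
  local line key rows=0 records=0
  # '|| [ -n "$line" ]' keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    rows=$((rows + 1))
    key=${line%%$'\t'*}   # text before the first tab = first column
    # The first row always opens a record; later rows only if key is non-empty.
    if [ "$rows" -eq 1 ] || [ -n "$key" ]; then
      records=$((records + 1))
    fi
  done
  echo "rows=$rows records=$records"
}

# Example: column 2 is filled on only 2 of 4 rows.
printf 'x\ta\ny\t\nz\t\nw\tb\n' | count_records   # prints: rows=4 records=4
printf 'a\tx\n\ty\n\tz\nb\tw\n' | count_records   # column 2 moved first; prints: rows=4 records=2
```

Under this model, moving a sparsely filled column into the first position folds its empty-cell rows into the record above, which is the kind of drop from 100,000 to a smaller record count described in the report.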

Side effects from this may be contributing to the other problems. Refine (or
browsers, depending on your point of view) isn't very good at dealing with
records that have large numbers of rows, because the default paging is by
number of records, so 10 or 25 records could contain hundreds or thousands of
rows, depending on how the data is organized.
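The paging arithmetic can be illustrated with a small sketch that counts how many rows fall inside the first N records of a tab-separated file. It assumes a simplified grouping rule (not Refine's code): a non-empty first cell starts a new record and an empty one continues the current record. The function name `rows_on_first_page` is hypothetical.

```shell
#!/usr/bin/env bash
# rows_on_first_page N: hypothetical helper. Prints how many rows of a TSV on
# stdin belong to its first N records, under an assumed simplified rule:
# a non-empty first cell starts a new record, an empty one continues it.
rows_on_first_page() {
  local page_size=$1 line key rows=0 records=0
  while IFS= read -r line || [ -n "$line" ]; do
    key=${line%%$'\t'*}   # text before the first tab = first column
    if [ "$rows" -eq 0 ] || [ -n "$key" ]; then
      records=$((records + 1))
      # This row would open record N+1, so the page is full.
      if [ "$records" -gt "$page_size" ]; then
        break
      fi
    fi
    rows=$((rows + 1))
  done
  echo "$rows"
}

# Example: 5,000 rows where only every 100th row has a key, so each record
# spans 100 rows and a "page" of 10 records is 1,000 rows.
seq 1 5000 |
  awk '{ key = ($1 % 100 == 1) ? ("k" $1) : ""; print key "\tv" $1 }' |
  rows_on_first_page 10   # prints: 1000
```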

Not sure about the single column display, but it's almost certainly related.  
You could try switching to the row display mode (instead of the default record 
display mode) to see if it shows the other columns.

The error dialog that you're getting is strange.  That happens when you attempt 
to undo?  Any chance you are running out of memory on your computer?  Does the 
same thing happen when you restart Refine and/or your computer?

Your project is almost certainly recoverable.  If you'd like one of us to take 
a look at it, attach it here and we'll see what we can do.

Original comment by tfmorris on 1 May 2012 at 8:43

GoogleCodeExporter commented 8 years ago
Tom,

I understand that moving a column to the first position can be used to group
rows into records, but what I am referring to is that Refine does not handle
things gracefully if, for example, one accidentally places a column in that
first slot. It causes corruption that cannot be recovered: 'Undo' becomes
non-functional, and because the columns start disappearing when you attempt to
undo, there is no way to get back to normal and no way to output/export.
Switching between Row/Record modes has no effect when this occurs, so there is
no way for me to output/save the project to send it to you.

I have a top-of-the-line Apple tower with 64GB of RAM, fast RAID drives, etc.,
so it is not a hardware performance issue.

Original comment by ericjarv...@gmail.com on 1 May 2012 at 8:56

GoogleCodeExporter commented 8 years ago
FYI, the "switching" from Records back to Rows might take quite a while,
depending on how much data Refine has to churn through. I have seen it take
over 10 minutes on a few of my datasets, but it did eventually return to Rows
mode.

Original comment by thadguidry on 1 May 2012 at 10:40

GoogleCodeExporter commented 8 years ago
OK, I thought you were describing three problems.  Sounds like it's just two 
(probably related) problems.

Here are a couple of other things to try:

- Start Refine from a terminal window. Retry the steps that produce the error
and report any errors logged in the terminal.

- Use the following steps to create a project export:

  - Click "Browse workspace directory" at bottom of main Refine screen
    (it's ~/Library/Application Support/Google/Refine/ on Mac)

  - Hover over the desired project, or right-click it and use "copy link", to get the project #

  - In a terminal window, do the following:

      cd <refine workspace directory>
      cd <project #>.project
      tar cvzf /tmp/<project name>.google-refine.tar.gz *

  For example, the steps for one of my projects were:
      cd ~/.local/share/google/refine
      cd 1978234000731.project
      tar cvzf /tmp/myproject.google-refine.tar.gz *

- Attach the <project name>.google-refine.tar.gz and we can see if it's recoverable.
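The manual export steps above could be wrapped in a pair of shell functions. This is a sketch, not part of Refine: `project_id_from_link` and `export_project` are hypothetical names, and the link parsing assumes the copied link ends in `project=<project #>` (for example `http://127.0.0.1:3333/project?project=1978234000731`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the manual export steps.

# project_id_from_link: return the text after the last "project=" in a copied
# link (assumes a link of the form .../project?project=<id>).
project_id_from_link() {
  local link=$1
  echo "${link##*project=}"
}

# export_project WORKSPACE PROJECT_ID NAME
# Archives the files in WORKSPACE/PROJECT_ID.project into
# /tmp/NAME.google-refine.tar.gz, like 'cd ...; tar cvzf ... *' but without -v
# so that only the archive path is printed.
export_project() {
  local workspace=$1 project_id=$2 name=$3
  local dir="$workspace/$project_id.project"
  local out="/tmp/$name.google-refine.tar.gz"
  if [ ! -d "$dir" ]; then
    echo "project directory not found: $dir" >&2
    return 1
  fi
  # Subshell so the caller's working directory is left unchanged.
  ( cd "$dir" && tar czf "$out" * ) || return 1
  echo "$out"
}

# Usage (Mac workspace path from the steps above):
#   id=$(project_id_from_link 'http://127.0.0.1:3333/project?project=1978234000731')
#   export_project "$HOME/Library/Application Support/Google/Refine" "$id" myproject
```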

Original comment by tfmorris on 2 May 2012 at 12:37