desperado1992 / google-refine

Automatically exported from code.google.com/p/google-refine

cross() failing #432

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
* What steps will reproduce the problem?

Calling cross() on a cell to a column in another project that has been recently 
worked on.

* What is the expected output? What do you see instead?

Expected output is joined data, instead I get empty arrays.

* What version of Google Refine are you using?

2.1

* What operating system and browser are you using?

OS X 10.5.8, Java 1.5.0_30

* Is this problem specific to the type of browser you're using or it happens in 
all the browsers you tried?

Same problem with current Chrome or Firefox.

* Please provide any additional information below.

I'm having the same problem described in comment #3 on issue #96 
(http://code.google.com/p/google-refine/issues/detail?id=96#c3). It seemed like 
it was going to get lost there so I've made it into a new one. In my case, I've 
been able to move on by dumping projects to CSV files and making fresh copies 
that have never had cross() called on them.

Original issue reported on code.google.com by sean.gil...@gmail.com on 11 Aug 2011 at 8:25

GoogleCodeExporter commented 8 years ago
I neglected to report that I'm trying to get data from a second cross against a 
second project. I'm attaching the project history where the success of one 
cross can be seen early on. The last operation is the failing one.

Does Refine only support one cross per project, or only one other project in 
crosses?

Original comment by sean.gil...@gmail.com on 11 Aug 2011 at 8:34

GoogleCodeExporter commented 8 years ago
cross() does indeed work in 2.1 (verified).

David will have to answer on whether it can support more than 1 cross 
expression.
I have personally run into a problem where the expression editor window's 
parsing kinda bugs out with cross() expressions.  I cannot simply copy and 
paste them into the editor; instead, I have to type them out manually, and 
IT WORKS!  Weird.  Give it a shot and let us know.

Original comment by thadguidry on 26 Aug 2011 at 3:46

GoogleCodeExporter commented 8 years ago
I stumbled onto an anecdote in the mailing list about a limit of ~6 crosses per 
project, but am unable to find that post right now.

Original comment by sean.gil...@gmail.com on 26 Aug 2011 at 4:11

GoogleCodeExporter commented 8 years ago
Yes, I was the one who did those 6 crosses one time, but I am not sure if that 
can still work now in release 2.1 or /trunk. At the time, David was even 
surprised I had strung that many all together!

Original comment by thadguidry on 26 Aug 2011 at 4:20

GoogleCodeExporter commented 8 years ago
You were definitely right to create a dedicated bug report for this.  I guess 
that other random commenter wasn't really interested in seeing the problem 
fixed.

Do you see the time-dependent behavior that was reported in the other comment?

One thing which could be a potential issue is that projects are cached in 
memory and only written out periodically (every 5 minutes, I think).  If you've 
got data in memory which isn't represented on disk and the cross() is working 
off disk, it could get confused -- but that's entirely speculation on my part 
without actually looking at the code.

Original comment by tfmorris on 26 Aug 2011 at 4:25

GoogleCodeExporter commented 8 years ago
Hello all,

I can also confirm that Version 2.1 [r2136] stops doing crosses after a few of 
them (3 in my case). I've tried rebooting, killing the process and waiting, and 
exporting to CSV and creating a new project, and it still refuses to work.

Any new suggestions or fixes?

Original comment by pol...@multipagos.com.mx on 7 Oct 2011 at 8:59

GoogleCodeExporter commented 8 years ago
re: Comment 6

Do you see any errors on the console/terminal that the Refine server was 
started from?  Have you tried exporting and recreating both projects?  Can you 
make the projects available for debugging?

Original comment by tfmorris on 7 Oct 2011 at 11:18

GoogleCodeExporter commented 8 years ago
Did you change the file you're crossing with? For example:

In File A you perform a cross operation against File B by applying this JSON. 
After that, you change COLUMN B in File B and run the join again, and it does 
not work.

[  {
    "op": "core/column-addition",
    "description": "Create column cross_result at index 1 based on column operacion using expression grel:cell.cross(\"FILE B\", \"COLUMN B\").cells[\"COLUMN A\"].value[0]",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "newColumnName": "cross_result",
    "columnInsertIndex": 1,
    "baseColumnName": "operacion",
    "expression": "grel:cell.cross(\"FILE B\", \"COLUMN B\").cells[\"COLUMN A\"].value[0]",
    "onError": "set-to-blank"
  }
]
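
For readers unfamiliar with the expression in this operation, here is a rough 
Python model of what it appears to compute: cross() collects the rows of the 
other project whose COLUMN B equals the current cell, and the rest of the 
expression takes COLUMN A from the first match. The function names, the sample 
rows, and the use of None for a blank cell are assumptions made for 
illustration, not Refine's implementation.

```python
def cross(cell_value, target_rows, match_column):
    """Return every row of the target project whose match_column equals cell_value."""
    return [row for row in target_rows if row[match_column] == cell_value]


def cross_result(cell_value, target_rows):
    """Model of cell.cross("FILE B", "COLUMN B").cells["COLUMN A"].value[0]."""
    matches = cross(cell_value, target_rows, "COLUMN B")
    values = [row["COLUMN A"] for row in matches]
    # Assumption: "onError": "set-to-blank" is modelled as None when there is
    # no first match to index.
    return values[0] if values else None


# Hypothetical contents of FILE B.
file_b = [
    {"COLUMN B": "op-1", "COLUMN A": "alpha"},
    {"COLUMN B": "op-2", "COLUMN A": "beta"},
]

joined = cross_result("op-2", file_b)   # a cell in File A whose value exists in FILE B
missing = cross_result("op-9", file_b)  # a cell with no counterpart in FILE B
```

In this model an empty match list is what shows up as a blank cell in the new 
column, which is the symptom being reported in this issue.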

Original comment by j...@tekii.com.ar on 4 Nov 2011 at 8:23

GoogleCodeExporter commented 8 years ago
I've also noticed inconsistencies in the cross function. I've found that it 
works when matching against some columns, but returns an empty array for 
others. The failing columns were numeric data, the successful column (in this 
instance) was text. I converted the numeric columns to text (value.toString()), 
but this did not get around the problem.

Original comment by craig55...@gmail.com on 21 Jun 2012 at 12:13

GoogleCodeExporter commented 8 years ago
I can confirm this problem for Version 2.5 [r2407]. I have two separate tables 
and wanted to join them. In the preview I can see the right results, but not 
after I run the command (which takes a relatively long time for about 16 rows). 

I tried to join two lists of Harry Potter professors and their subjects from 
http://en.wikipedia.org/wiki/List_of_fictional_professors but ended up joining 
them in Excel instead. Let me know if you need more details. 

Original comment by domoritz on 11 Aug 2012 at 2:21

GoogleCodeExporter commented 8 years ago
OK, I'm pretty sure I know what the problem is here.  It happened to me, so I 
actually had a test case that I could debug with. :-)

Joins are cached to save on computational expense.  Unfortunately, although we 
have cache flush methods implemented, they're never called.  This basically 
means that if you do a cross once, the join for that pair of columns is frozen 
for all time (well, until you restart Refine anyway).  The obvious case is that 
you notice it not working, but it can also fail in a much more subtle way: if 
you update the values in the source or target column and attempt to redo the 
cross(), you'll get results based on the original values.
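
The failure mode described above can be sketched in a few lines of Python. 
This is a hypothetical model, not Refine's actual code: the JoinCache class, 
its method names, and the sample project are all assumptions. It only shows 
how an index that is built once and never flushed keeps answering from the 
original values.

```python
class JoinCache:
    """Hypothetical join index, built once per (project, column) pair."""

    def __init__(self):
        # (project name, column name) -> {cell value: [row numbers]}
        self._indexes = {}

    def lookup(self, projects, project_name, column_name, value):
        key = (project_name, column_name)
        if key not in self._indexes:
            # Build the index the first time this column is crossed against.
            index = {}
            for row_number, row in enumerate(projects[project_name]):
                index.setdefault(row[column_name], []).append(row_number)
            self._indexes[key] = index
        return self._indexes[key].get(value, [])

    def flush(self, project_name, column_name):
        # The flush method exists, but in the buggy scenario nothing calls it.
        self._indexes.pop((project_name, column_name), None)


# Hypothetical target project with a single row.
projects = {"FILE B": [{"COLUMN B": "x1", "COLUMN A": "first"}]}
cache = JoinCache()

before_edit = cache.lookup(projects, "FILE B", "COLUMN B", "x2")  # no match yet
projects["FILE B"][0]["COLUMN B"] = "x2"                          # user edits the target column
stale = cache.lookup(projects, "FILE B", "COLUMN B", "x2")        # answered from the old index
cache.flush("FILE B", "COLUMN B")                                 # what an edit should trigger
fresh = cache.lookup(projects, "FILE B", "COLUMN B", "x2")        # index rebuilt from current data
```

In this model the second lookup still returns an empty list even though the 
data now matches, and only the explicit flush (or, by analogy, a restart that 
discards the cache) produces the expected row.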

@craig552uk - I'm pretty sure that cross needs the values to match exactly, 
including type, i.e. string('123') != number(123).  This is slightly 
counterintuitive since the vast majority of things in Refine work on a string 
equivalence basis.  Unfortunately, due to the caching bug, once it had failed 
once, it wasn't going to work until you restarted Refine.
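
As an analogy only (this is plain Python, not Refine), an exact-match index 
keyed on cell values behaves the same way: the text "123" and the number 123 
are different keys, so a numeric cell finds nothing in a column stored as 
text. The sample values below are made up for illustration.

```python
# Build an exact-match index over a target column whose cells are stored as text.
index = {}
for row_number, value in enumerate(["123", "456"]):
    index.setdefault(value, []).append(row_number)

number_lookup = index.get(123, [])       # numeric cell: different type, no match
string_lookup = index.get(str(123), [])  # same cell converted to text: matches row 0
```

Converting both columns to the same type before the first cross avoids the 
mismatch in this model.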

There have been conflicting reports about what's needed to clear the condition, 
but I'm pretty sure that a) no amount of time will clear the condition and b) 
restarting the Refine server should clear the condition immediately.

Original comment by tfmorris on 25 Aug 2012 at 8:11

GoogleCodeExporter commented 8 years ago
Fixed in r2539.  Hopefully I found all the places that the cache needs to be 
flushed.

Original comment by tfmorris on 30 Aug 2012 at 4:36

GoogleCodeExporter commented 8 years ago

Original comment by tfmorris on 18 Sep 2012 at 3:05