dr-jb / google-refine

Automatically exported from code.google.com/p/google-refine

HtmlText GREL function locks up Google Refine #623

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hello,

In the attached project, I am trying to parse the "HTML" column with the following GREL:

value.parseHtml().select("body")[0].select("table")[7].select("span")[1].htmlText()

This works OK.

However, when I try the following:

value.parseHtml().select("body")[0].select("table")[7].select("span")[2].htmlText()

Refine hangs without any apparent error.
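The attached HTML is not included in this thread, so the following is only a hypothetical sketch for isolating the problem outside Refine: a minimal element tree built on Python's standard `html.parser`. The helper names `parse_html`, `select` and `text` are invented here to loosely mirror `parseHtml()`, `select("tag")` and `htmlText()`. It is an approximation, not Refine's parser: it does not synthesize missing `<html>`/`<body>` elements, it matches plain tag names only (no CSS selectors), and it renders `<br>` as a space. Applied to the real column value, the failing chain would read `select(select(select(parse_html(value), "body")[0], "table")[7], "span")[2]`.

```python
from html.parser import HTMLParser

# HTML void elements: they never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal element node (hypothetical helper): tag, parent, children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and str text chunks


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTMLParser callbacks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend only into non-void elements

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: add the node without descending.
        self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse_html(html):
    """Parse an HTML string and return the document root (hypothetical)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def select(node, tag):
    """Return descendant elements named `tag`, in document order."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                found.append(child)
            found.extend(select(child, tag))
    return found


def text(node):
    """Concatenate descendant text, treat <br> as a space, collapse whitespace."""
    parts = []

    def walk(n):
        for child in n.children:
            if isinstance(child, Node):
                if child.tag == "br":
                    parts.append(" ")
                walk(child)
            else:
                parts.append(child)

    walk(node)
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    sample = ("<html><body><table><tr><td><span>a</span>"
              "<span>b<br/>c</span></td></tr></table></body></html>")
    body = select(parse_html(sample), "body")[0]
    spans = select(select(body, "table")[0], "span")
    print(text(spans[1]))  # b c
```

Comparing what this extracts for span index 1 and span index 2 of the real value may help show what is different about the span that triggers the hang.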

Original issue reported on code.google.com by patelm...@gmail.com on 26 Sep 2012 at 11:36

GoogleCodeExporter commented 8 years ago
It seems to work fine for me in the latest Google Chrome on Windows XP 32-bit,
using the latest trunk version (r2569).

I created a new NameBranchMerge column for you to continue with in the attached 
project.

Does your terminal or console report any Java errors of any kind when
running that GREL expression? What browser version are you using?

Original comment by thadguidry on 27 Sep 2012 at 1:23

GoogleCodeExporter commented 8 years ago
Thanks for providing the data to reproduce the problem.  That's a big help.  
Don't forget to provide versions for Refine and your browser too.

I can confirm that the problem exists in Refine 2.5 and is fixed in the current
development sources, so I'm going to mark this as fixed. Please retest when the
2.6 beta is released.

If I had to guess, I'd say the problem has something to do with escaping of
HTML entities or tags (it has embedded <br/> tags), but since it seems to work
now, I'm not going to spend time figuring it out.

Original comment by tfmorris on 27 Sep 2012 at 8:03