ericmckean / google-refine

Automatically exported from code.google.com/p/google-refine

Refine fails to import Excel 2010 XLSX file with null hyperlinks #535

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. load enclosed file

What is the expected output? What do you see instead?

Refine only shows the animated "Inspecting selected files" graphic, 
for hours. I would expect to see the Refine interface with options 
for working on the data.

What version of Google Refine are you using?
2.5-r2407

What operating system and browser are you using?
Win XP and IE8 using the Chrome browser plugin (all available updates installed)
Tried also with Win 7 and IE9 using the Chrome browser plugin

Is this problem specific to the type of browser you're using or it happens in 
all the browsers you tried?

Tried with IE8 and IE9 before. Neither even opened the file.

Please provide any additional information below.
The file is a database extract containing 5 columns, which was exported to 
.xlsx. It contains only a small amount of data.

Original issue reported on code.google.com by gerhard....@zww.uni-augsburg.de on 20 Feb 2012 at 11:00

GoogleCodeExporter commented 9 years ago
Thanks for the report.  That file can't be handled by Apache POI (which we use 
for reading Excel files) for some reason.  If you can save it in a different 
format you should be able to work with it.  What tool (and what version) was 
used to create the file?

The error reporting is broken in this case.  The error which gets reported on 
the console is:

java.lang.IllegalStateException: A sheet hyperlink must either have a location, 
or a relationship. Found:
<xml-fragment ref="A1" tooltip="Sort on ID" display="ID" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"/>
    at org.apache.poi.xssf.usermodel.XSSFHyperlink.<init>(XSSFHyperlink.java:72)
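
For anyone who wants to check whether a given file has the same problem, here is a minimal sketch (not part of Refine or POI) that lists worksheet hyperlinks carrying neither a `location` attribute nor an `r:id` relationship, which is the condition the exception above complains about. It assumes the usual XLSX layout with sheets stored under `xl/worksheets/`; the function name `find_null_hyperlinks` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by SpreadsheetML worksheets and by relationship ids.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def find_null_hyperlinks(xlsx_source):
    """Return (sheet_part, cell_ref) pairs for hyperlinks that have
    neither a location attribute nor an r:id relationship.

    xlsx_source may be a path or a binary file-like object.
    Assumption: worksheets live under xl/worksheets/ in the archive.
    """
    found = []
    with zipfile.ZipFile(xlsx_source) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            for link in root.iter("{%s}hyperlink" % MAIN_NS):
                has_location = link.get("location") is not None
                has_relationship = link.get("{%s}id" % REL_NS) is not None
                if not (has_location or has_relationship):
                    found.append((name, link.get("ref")))
    return found
```

Calling `find_null_hyperlinks("extract.xlsx")` (a hypothetical file name) returns an empty list when no such hyperlinks are present. A non-empty result suggests, but does not prove, that the file would trigger the same exception.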

Original comment by tfmorris on 20 Feb 2012 at 5:53

GoogleCodeExporter commented 9 years ago
This looks like an Apache POI bug to me and I've filed this bug: 
https://issues.apache.org/bugzilla/show_bug.cgi?id=52716

Original comment by tfmorris on 20 Feb 2012 at 6:42

GoogleCodeExporter commented 9 years ago
The file was created by dumping an mSQL database (by Hughes) to an HTML page 
using a simple table. The HTML table was dragged and dropped to Excel 2010.

Original comment by gerhard....@zww.uni-augsburg.de on 21 Feb 2012 at 6:43

GoogleCodeExporter commented 9 years ago
The Apache POI team has fixed the bug and I've verified the fix in their 
current development tree, but we'll need to wait for them to release 3.8 before 
we can pull in the fix.

As a workaround, you can open the file in OpenOffice and resave it or save it 
from Excel in a different format.

Original comment by tfmorris on 8 Mar 2012 at 12:42

GoogleCodeExporter commented 9 years ago
The POI team is forecasting that the next release will be in either March or 
April (i.e. soon).

Original comment by tfmorris on 8 Mar 2012 at 3:09

GoogleCodeExporter commented 9 years ago
POI 3.8 has been released and incorporated into Refine.

Original comment by tfmorris on 29 Mar 2012 at 7:03

GoogleCodeExporter commented 9 years ago

Original comment by tfmorris on 18 Sep 2012 at 3:05

GoogleCodeExporter commented 9 years ago
How can I work around this? If I save the file in a different format, it 
grows from 62209 KB to 929381 KB, and I have big trouble importing it into 
Refine even if I increase the allocated memory to 4 GB.

Original comment by edyy...@gmail.com on 2 Jun 2013 at 2:37

GoogleCodeExporter commented 9 years ago
The issue tracker has moved to GitHub.  This is kept for historical reference 
only.

https://github.com/OpenRefine/OpenRefine/issues/535

Original comment by tfmorris on 2 Jun 2013 at 5:35