Thanks for the bug report. I can reproduce this, and will take a further look
at what's going on.
Original comment by iainsproat
on 17 Nov 2010 at 5:45
The problem is:
[XmlImportUtilities] No candidate elements were found in data - at least 6
similar elements are required (0ms)
I can count more than 6 <item> tags, so I don't think that's the issue. It
might be due to them being in a mixed element, and not getting counted
correctly.
Original comment by iainsproat
on 17 Nov 2010 at 6:26
Do you have an example of an XML file that does work?
Original comment by chr...@gmail.com
on 17 Nov 2010 at 7:07
One problem with each of these files is that there are characters before the
initial <?xml> declaration. This causes the parser to choke immediately. We
should probably consider trimming leading whitespace, but you can get the files
imported with the existing code by deleting the initial blank line.
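As a rough illustration of the "trim leading whitespace" idea (this is not Refine's importer code, just a minimal Python sketch; the function name parse_xml_lenient and the sample data are made up for the example):

```python
import xml.etree.ElementTree as ET


def parse_xml_lenient(raw: bytes) -> ET.Element:
    """Parse XML after dropping whitespace that precedes the <?xml ...?> declaration.

    Hypothetical helper for illustration only; it is not part of Refine.
    """
    # bytes.lstrip() with no argument removes leading ASCII whitespace,
    # which covers the stray blank line seen in the attached files.
    cleaned = raw.lstrip()
    return ET.fromstring(cleaned)


# Made-up sample: a blank line before the XML declaration.
sample = (
    b"\n"
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<items><item>a</item><item>b</item></items>"
)

root = parse_xml_lenient(sample)
item_count = len(root.findall("item"))
```

Stripping only leading whitespace leaves the rest of the document untouched, so it should be safe for files that are otherwise well-formed.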
Original comment by tfmorris
on 27 Nov 2010 at 10:34
thanks, that worked for me
Original comment by chr...@gmail.com
on 28 Nov 2010 at 7:57
Original comment by tfmorris
on 7 Jun 2011 at 5:59
Fixed in r2246.
Original comment by tfmorris
on 14 Oct 2011 at 10:27
I'm having a similar problem, and downloaded RC2314, but that version will only
import the first record of a fairly flat XML file.
I've enclosed a sample. I selected the first <entry1> record of the file as
the first record.
Any suggestions?
Original comment by ron.ma...@gmail.com
on 17 Nov 2011 at 4:48
Attachments:
Hmm, the schema seems a bit odd. You have uniquely numbered elements such as
<entry1> <entry2> ... rather than all of them just being <entry>. I used
Refine's line-based import to pull it in, did some text filtering,
value.partition, and replacing, and then exported to give you a cleaner XML
file to work with. Attached. Does that contain all the records? (I get 1239
of them using the latest trunk version.)
Original comment by thadguidry
on 17 Nov 2011 at 5:09
Attachments:
Thanks very much!
Unfortunately, this is output from software I don't write, so I am unable to
make changes to the schema.
I'm just testing this out to see if Google Refine will let us prepare a report
from this file.
I guess I'll have to clean it up manually each time I want to import it.
Thanks again!
Original comment by ron.ma...@gmail.com
on 17 Nov 2011 at 5:20
Probably the best thing is to use Notepad++ or whatever text editor you have
and just do a Find/Replace using regexes such as <Entry\d+> and </Entry\d+>,
replacing them with the strings <Entry> and </Entry> respectively. You could
also create a Python script to do that as part of a batch process, or, if this
is a constant feed process, perhaps use an ETL tool like Talend to pick up the
files in a directory when they arrive and convert and clean them for later
analysis in Refine.
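A Python version of that find/replace could look roughly like the sketch below. It assumes the numbered tags are all named entry followed by digits (matched case-insensitively); the function name normalize_entry_tags and the sample string are made up for illustration.

```python
import re

# Matches the start of <entry12>, </entry12>, <Entry3 id="x"> or <entry4/>.
# Group 1 keeps the optional "/" of a closing tag, group 2 keeps the tag
# name as written, and the digits after it are dropped. The lookahead makes
# sure the digits are the end of the tag name.
NUMBERED_TAG = re.compile(r"<(/?)(entry)\d+(?=[\s>/])", re.IGNORECASE)


def normalize_entry_tags(xml_text: str) -> str:
    """Rename numbered tags such as <entry1> ... </entry1> to <entry> ... </entry>."""
    return NUMBERED_TAG.sub(r"<\1\2", xml_text)


# Made-up sample in the shape described above.
sample = "<root><entry1><name>a</name></entry1><entry2><name>b</name></entry2></root>"
cleaned = normalize_entry_tags(sample)
```

For a batch process you would read each incoming file, pass its text through normalize_entry_tags, and write the result out before importing it into Refine.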
Original comment by thadguidry
on 17 Nov 2011 at 5:38
Yes, thanks! That'll save some time rather than doing it in Refine.
I appreciate this forum, everyone's so helpful!
Original comment by ron.ma...@gmail.com
on 17 Nov 2011 at 5:41
Original issue reported on code.google.com by
chr...@gmail.com
on 17 Nov 2010 at 5:09
Attachments: