falcong / pugixml

Automatically exported from code.google.com/p/pugixml

SIGSEGV in pugixml #51

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run pugixml under valgrind so that it crashes at the first unaligned access

Got this from attached gdb:
Program received signal SIGSEGV, Segmentation fault.
0x00000000006d8e28 in parse (this=0x7fefe6e40, s=0xcdb54d8
"\270\376\vB\200P\303G\353\342\273A\310\377\vB\200P\303G\342|\273A",
xmldoc=0x7fefeefa0, optmsk=180) at ../Externals/pugixml/pugixml.cpp:1268
1268                                            SKIPWS(); // Eat whitespace if no genuine PCDATA here.

The back trace is:
#0  0x00000000006d8e28 in parse (this=0x7fefe6e40, s=0xcdb54d8
"\270\376\vB\200P\303G\353\342\273A\310\377\vB\200P\303G\342|\273A",
xmldoc=0x7fefeefa0, optmsk=180) at ../Externals/pugixml/pugixml.cpp:1268
#1  0x00000000006de68c in pugi::xml_document::parse (this=0x7fefeef80,
xmlstr=0xcdb5310 "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response",
options=180) at ../Externals/pugixml/pugixml.cpp:3036
#2  0x000000000048e831 in XML::Parser::loadWholeStream (this=0x7fefeef80)
at myfile.hpp:102

The XML parsed is:
(gdb) x/128s xmlstr
0xcdb5310:       "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response"
0xcdb5341:       "\n  <Ip"
0xcdb5348:       "88.161.68.173"
0xcdb5356:       "/Ip>\n  <Status"
0xcdb5365:       "OK"
0xcdb5368:       "/Status>\n  <CountryCode"
0xcdb5380:       "FR"
0xcdb5383:       "/CountryCode>\n  <CountryName"
0xcdb53a0:       "France"
0xcdb53a7:       "/CountryName>\n  <RegionCode"
0xcdb53c3:       "B8"
0xcdb53c6:       "/RegionCode>\n  <RegionName"
0xcdb53e1:       "Provence-Alpes-Cote d'Azur"
0xcdb53fc:       "/RegionName>\n  <City"
0xcdb5411:       "Marseille"
0xcdb541b:       "/City>\n  <ZipPostalCode"
0xcdb5433:       "</ZipPostalCode>\n  <Latitude"
0xcdb5450:       "43.3"
0xcdb5455:       "/Latitude>\n  <Longitude"
0xcdb546d:       "5.4"
0xcdb5471:       "/Longitude>\n  <Timezone"
0xcdb5489:       "0"
0xcdb548b:       "/Timezone>\n  <Gmtoffset"
0xcdb54a3:       "0"
0xcdb54a5:       "/Gmtoffset>\n  <Dstoffset"
0xcdb54be:       "0"
0xcdb54c0:       "/Dstoffset>\n</Response>\n\270\376\vB\200P\303G\353\342\273A\310\377\vB\200P\303G\342|\273A"

What is the expected output? What do you see instead?
No crash (or at least not triggering valgrind's memory barrier).

What version of the product are you using? On what operating system?
Atom (AMD64), gcc 4.4.3, Debian squeeze

Please provide any additional information below.

Original issue reported on code.google.com by xryl...@gmail.com on 15 Apr 2010 at 4:45

GoogleCodeExporter commented 9 years ago
Please provide the following additional information:

1. The original XML text, i.e. the string you pass to the parse() function, captured before you call it, since pugixml modifies the string in place by inserting terminators, etc. A file attachment is preferred.

2. Does it crash without valgrind?

Original comment by arseny.k...@gmail.com on 15 Apr 2010 at 6:30

GoogleCodeExporter commented 9 years ago
Is it possible that you're passing a string to parse() that is not zero-terminated? If so, that is incorrect usage: pugixml currently requires zero-terminated contents for parsing. It appends a zero to the file contents automatically if you use the load_file or load(istream&) functions, but for parse() or load(const char*) you have to zero-terminate the string yourself.
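
As a minimal sketch of the zero-termination step, assuming the XML arrives as a raw byte range with a known length: the helper name `make_zero_terminated` is hypothetical and not part of pugixml. It copies the bytes into a buffer the caller owns and appends the terminator that parse() expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a pugixml function): copy `size` bytes from
// `data` into a mutable buffer and append the '\0' terminator.
inline std::vector<char> make_zero_terminated(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<char> buf;
    buf.reserve(size + 1);
    if (size != 0) {
        buf.insert(buf.end(), data, data + size);  // copy the payload bytes
    }
    buf.push_back('\0');  // the terminator the parser scans for
    return buf;
}
```

The resulting `buf.data()` is a mutable, zero-terminated `char*` of the kind passed as `xmlstr` in the backtrace above. Since pugixml modifies the string while parsing, the buffer presumably has to stay alive for as long as the document is in use.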

Original comment by arseny.k...@gmail.com on 15 Apr 2010 at 6:32

GoogleCodeExporter commented 9 years ago
Sorry, I missed your answer.
The string was zero-terminated, as it comes from the classic string library (I'll check again, but I'm 99% sure about this).
The original text comes from the Yahoo weather XML API.

It doesn't crash without valgrind (but then again, that is valgrind's expected behaviour).
I think it reads off by one, and this crashes under valgrind because valgrind puts the allocation at the end of an allocated page (so any access at size + 1 crashes).

I'll try to save the original stream when this happens.
I'm using "valgrind --leak-check=full --track-origins=yes --db-attach=y ./software".

Original comment by xryl...@gmail.com on 22 Apr 2010 at 4:48

GoogleCodeExporter commented 9 years ago
I see. The tests for the version of pugixml under development use a special allocator that aligns the right end of each allocation to the end of a page and guards the next page; however, the parsing code has changed somewhat since 0.5, so I can't be 100% sure there is no bug in 0.5. All current tests also run fine under valgrind.
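
For illustration, a guard-page allocator of this kind could look roughly like the sketch below. It is not pugixml's actual test allocator: the names `GuardedBlock`, `guarded_alloc` and `guarded_free` are hypothetical, and the code assumes a POSIX system with `mmap` and `mprotect`. The buffer is placed so that its last byte is the last byte of a readable page, and the following page is made inaccessible, so reading even one byte past the end faults immediately.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types and functions, for illustration only.
struct GuardedBlock {
    void* base;         // start of the whole mapping
    std::size_t total;  // mapping length in bytes, guard page included
    char* data;         // user buffer; data + size is the guard page
    std::size_t size;   // usable bytes
};

inline GuardedBlock guarded_alloc(std::size_t size) {
    const GuardedBlock failed = {nullptr, 0, nullptr, 0};
    const long page_raw = sysconf(_SC_PAGESIZE);
    assert(page_raw > 0);
    const std::size_t page = static_cast<std::size_t>(page_raw);

    const std::size_t data_pages = (size + page - 1) / page;  // round up
    const std::size_t total = (data_pages + 1) * page;        // + guard page

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return failed;

    // Make the last page inaccessible: any read or write there faults.
    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return failed;
    }

    // Right-align the user buffer against the guard page.
    const GuardedBlock block = {base, total, guard - size, size};
    return block;
}

inline void guarded_free(GuardedBlock& block) {
    if (block.base != nullptr) munmap(block.base, block.total);
    const GuardedBlock empty = {nullptr, 0, nullptr, 0};
    block = empty;
}
```

With a buffer allocated this way, a parser that reads `data[size]` while looking for a missing terminator crashes at that exact byte instead of silently reading whatever happens to follow in memory.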

The location of the crash (the value of s, 0xcdb54d8) corresponds exactly to the byte (\270) after the final \n in the last string: 0xcdb54c0 plus the 24 bytes of "/Dstoffset>\n</Response>\n" gives 0xcdb54d8, right where a zero terminator is expected. Moreover, as far as I understand, pugixml could not have written there (since it can't even read from there!), so this byte was \270 originally, and I'm almost positive that the string is not zero-terminated for some reason.

Still, if you manage to save the original stream when the bug occurs again, I'll be more than happy to look into it.

Original comment by arseny.k...@gmail.com on 22 Apr 2010 at 6:20

GoogleCodeExporter commented 9 years ago
I'm closing the issue since there has been no confirmation of the bug. Feel free to reopen it if you have further problems.

Also note that in the new version (0.9, which can be downloaded from the "Downloads" section), loading from memory is performed via the new load_buffer functions, which do not require a null-terminated string but instead take a buffer and its size in bytes. They are guaranteed to work as long as the given number of bytes can be read from the buffer.
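
To show why a buffer-plus-size interface removes the terminator requirement, here is a small sketch contrasting the two scanning styles. Both functions are hypothetical illustrations and not pugixml's implementation: the first stops only when it meets a non-whitespace byte (so it relies on a '\0' being present), while the second never looks past the given size.

```cpp
#include <cassert>
#include <cstddef>

inline bool is_xml_space(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Sentinel style (hypothetical): correct only if a '\0' follows the data.
// On an unterminated buffer that ends in whitespace, this reads past the end.
inline const char* skip_ws_sentinel(const char* p) {
    assert(p != nullptr);
    while (is_xml_space(*p)) ++p;
    return p;
}

// Bounded style (hypothetical): stops at p + size, so no terminator is needed.
inline const char* skip_ws_bounded(const char* p, std::size_t size) {
    assert(p != nullptr || size == 0);
    const char* end = p + size;
    while (p != end && is_xml_space(*p)) ++p;
    return p;
}
```

The document in this report ends with "</Response>\n", which is exactly the case where a sentinel-style whitespace skip keeps reading if the terminator is missing, while the bounded version stops at the end of the buffer.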

Original comment by arseny.k...@gmail.com on 11 Jul 2010 at 5:22