andreww / fox

A Fortran XML library
https://andreww.github.io/fox/

Problem with getTextContent when using bracket characters [ or ] ? #23

Closed johnjc1 closed 11 years ago

johnjc1 commented 11 years ago

problem description:

I'm trying to extract the text between the name tags. The text comes back mangled when bracket characters appear in it. I'm not aware of these characters being illegal in XML text.

Example input file:

<?xml version="1.0" encoding="ISO-8859-1"?>

<name>
[_tmp]:=somecommand(data, 0, 1)
</name>

output on screen:

[_tmp]:]=]s]o]m]e]c]o]m]m]a]n]d](]d]a]t]a],] ]0],] ]1])]
]      

source code:

 program dom
 use FoX_dom
 implicit none

 integer :: i
 type(Node), pointer :: doc, name
 type(NodeList), pointer :: nameList
 character(200) :: name_text

 ! Parse the input file into a DOM tree.
 doc => parseFile("test.xml")

 ! Collect every <name> element in the document.
 nameList => getElementsByTagname(doc, "name")

 ! NodeList indices are zero-based.
 do i = 0, getLength(nameList) - 1
   name_text = ''
   name => item(nameList,i)

   !call extractDataContent(name, name_text)
   name_text = getTextContent(name)

   print *, name_text
 enddo

 ! Release the memory held by the DOM tree.
 call destroy(doc)
 end program dom

andreww commented 11 years ago

This is coming out of the SAX parser: it's the "]" character that causes the issue. If I had to guess, this will turn out to be a bug in the way we look for the end of CDATA sections, but it's a while since I've looked at that bit of code.
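
For readers unfamiliar with the rule in question: in XML character data a lone "]" is ordinary text, and only the three-character sequence "]]>" is special. The following is a minimal, self-contained sketch of a scanner that respects that rule. It is not FoX's actual code; the program name `cdata_end_sketch` and the function `find_cdata_end` are hypothetical names chosen for illustration, and the cause of the bug is only the guess stated above.

```fortran
program cdata_end_sketch
  implicit none

  ! The text from the report: it contains "]" but never "]]>".
  print *, find_cdata_end("[_tmp]:=somecommand(data, 0, 1)")   ! returns 0
  ! A string that really does contain the terminator.
  print *, find_cdata_end("a]]>b")                             ! returns 2

contains

  ! Return the index of the first character of "]]>" in s, or 0 if absent.
  ! (Hypothetical helper, written character by character the way a
  ! streaming parser has to work.)
  pure function find_cdata_end(s) result(pos)
    character(len=*), intent(in) :: s
    integer :: pos
    integer :: i, run

    pos = 0
    run = 0                  ! consecutive "]" characters seen so far
    do i = 1, len(s)
      select case (s(i:i))
      case ("]")
        run = run + 1
      case (">")
        if (run >= 2) then
          pos = i - 2        ! start of the "]]>" that ends here
          return
        end if
        run = 0
      case default
        run = 0              ! any other character cancels a pending "]"
      end select
    end do
  end function find_cdata_end

end program cdata_end_sketch
```

The point of the sketch is the `case default` branch: a scanner that remembers a "]" but never resets that state when ordinary text follows could misbehave on every later character, which would be consistent with the mangled output in the report.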

andreww commented 11 years ago

Should be fixed by 5a936d7e849cfbbc7ff197c2e280260e5483ac8f but - as the commit message says - I still need to check for similar bugs and write test cases for this.

andreww commented 11 years ago

Added the tests (using the dom interface - it's easier). By inspection of the code I cannot see other similar cases.
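
A regression check along these lines can be sketched through the DOM interface. This is not the test that was added to the repository, only an illustration under stated assumptions: the program name `bracket_regression` is hypothetical, and it assumes FoX is built and linked so that `parseString`, `getDocumentElement`, `getTextContent` and `destroy` from `FoX_dom` are available.

```fortran
program bracket_regression
  use FoX_dom
  implicit none

  ! The text from the original report, without surrounding whitespace.
  character(len=*), parameter :: expected = "[_tmp]:=somecommand(data, 0, 1)"
  type(Node), pointer :: doc, root

  ! Parse from a string so the check needs no input file.
  doc => parseString("<name>" // expected // "</name>")
  root => getDocumentElement(doc)

  ! The element's text content should come back unchanged.
  if (getTextContent(root) == expected) then
    print *, "PASS"
  else
    print *, "FAIL: ", getTextContent(root)
  end if

  call destroy(doc)
end program bracket_regression
```

With a build that predates the fix, the `FAIL` branch would be expected to print text with stray "]" characters like the output in the original report.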