adhearsion / ruby_speech

A ruby library for TTS & ASR document preparation
MIT License

embed destroys document nodes #25

Open sfgeorge opened 10 years ago

sfgeorge commented 10 years ago

When ssml_doc.embed(original_doc) is used to embed one document within another, the original document is unexpectedly modified: its element nodes are removed. The following tests demonstrate this: https://github.com/sfgeorge/ruby_speech/compare/539d2cce...bug;destructive-embed

Here's an example using ruby_speech 2.3.1:

require 'ruby_speech'

original_doc = RubySpeech::SSML.draw do
  string "Hi, I'm Fred. The time is currently "
  say_as :interpret_as => 'date', :format => 'dmy' do
    "01/02/1960"
  end
end

puts original_doc
# => <speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">Hi, I'm Fred. The time is currently <say-as interpret-as="date" format="dmy">01/02/1960</say-as></speak>

doc2 = RubySpeech::SSML.draw do
  voice :gender => :male, :name => 'fred' do
    embed original_doc
  end
end

# After the embed, original_doc has lost its <say-as> child
puts original_doc
# => <speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">Hi, I'm Fred. The time is currently </speak>
sfgeorge commented 10 years ago

I updated the test cases to zero in on the specific problem. I also confirmed that this occurs on both CRuby and JRuby.

Stepping through this with a debugger, I've found that this issue occurs when embed encounters this line in nokogiri 1.6.1.

sfgeorge commented 10 years ago

> Stepping through this with a debugger, I've found that this issue occurs when embed encounters this line in nokogiri 1.6.1.

...I also meant to mention specifically what happens on this line. Nokogiri makes two direct modifications to node rather than merely "appending" it to another document:

  1. It replaces all of node's children with new objects (their object_ids change).
  2. It changes node's namespace (the namespace object is replaced, and the new one carries a "default" prefix).

This can be seen in a debugger session on CRuby:

Breakpoint 1 at /Users/sgeorge/.rvm/gems/ruby-1.9.3-p545/gems/nokogiri-1.6.1/lib/nokogiri/xml/node.rb:948
[943, 952] in /Users/sgeorge/.rvm/gems/ruby-1.9.3-p545/gems/nokogiri-1.6.1/lib/nokogiri/xml/node.rb
   943        def inspect_attributes
   944          [:name, :namespace, :attribute_nodes, :children]
   945        end
   946
   947        def add_child_node_and_reparent_attrs node
=> 948          add_child_node node
   949          node.attribute_nodes.find_all { |a| a.name =~ /:/ }.each do |attr_node|
   950            attr_node.remove
   951            node[attr_node.name] = attr_node.value
   952          end
(rdb:1) node.children.map(&:object_id)
[70348856393920]
(rdb:1) node.namespace
#<Nokogiri::XML::Namespace:0x3ffb5e988980 href="http://www.w3.org/2001/10/synthesis">

And afterwards:

(rdb:1) n
/Users/sgeorge/.rvm/gems/ruby-1.9.3-p545/gems/nokogiri-1.6.1/lib/nokogiri/xml/node.rb:949
node.attribute_nodes.find_all { |a| a.name =~ /:/ }.each do |attr_node|

[944, 953] in /Users/sgeorge/.rvm/gems/ruby-1.9.3-p545/gems/nokogiri-1.6.1/lib/nokogiri/xml/node.rb
   944          [:name, :namespace, :attribute_nodes, :children]
   945        end
   946
   947        def add_child_node_and_reparent_attrs node
   948          add_child_node node
=> 949          node.attribute_nodes.find_all { |a| a.name =~ /:/ }.each do |attr_node|
   950            attr_node.remove
   951            node[attr_node.name] = attr_node.value
   952          end
   953        end
(rdb:1) n
(rdb:1) node.children.map(&:object_id)
[70348856600400]
(rdb:1) node.namespace
#<Nokogiri::XML::Namespace:0x3ffb5e9b7e88 prefix="default" href="http://www.w3.org/2001/10/synthesis">