Concerto content generated from feeds containing certain special characters (such as umlauts or non-breaking spaces) ended up with � (http://www.fileformat.info/info/unicode/char/0fffd/index.htm) replacement characters. The cause was that UTF-8 content was re-encoded to UTF-8 while the Ruby string holding it was marked as ASCII-8BIT, so the non-ASCII bytes could not be converted.
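The effect can be reproduced in plain Ruby. This is only a minimal sketch: the sample string and the replacement options passed to encode are assumptions for illustration, not taken from the Concerto code.

```ruby
# UTF-8 bytes for "Grüße", but labelled as binary (ASCII-8BIT),
# which is how the feed body was marked before this fix.
raw = "Gr\u00FC\u00DFe".b

# Transcoding ASCII-8BIT to UTF-8 has no mapping for bytes above 0x7F,
# so with replacement enabled they turn into U+FFFD.
mangled = raw.encode("UTF-8", invalid: :replace, undef: :replace)

# Relabelling the same bytes as UTF-8 keeps the content intact.
relabelled = raw.dup.force_encoding(Encoding::UTF_8)

puts mangled
puts relabelled
```

The difference is that encode converts bytes from one encoding to another, while force_encoding only changes the label on bytes that are already correct.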
1) net/http itself does not handle the content type encoding (quite a surprise), so before this fix it always returned an ASCII-8BIT string even when the feed content was actually UTF-8. See https://bugs.ruby-lang.org/issues/2567. Luckily open-uri, a wrapper around net/http, does handle the content type encoding correctly (https://github.com/ruby/ruby/blob/trunk/lib/open-uri.rb#L438-454).
2) It turns out that xslt.serve() does not respect content encoding either: regardless of the input XML encoding or the XSLT stylesheet output encoding, the returned data is always ASCII-8BIT. As a workaround, this fix now forces the encoding to be the same as that of the incoming XML. Maybe ruby-xslt should be replaced by nokogiri.org, which looks like it should handle encoding correctly.
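The two steps can be sketched roughly as follows. Both helpers (label_body and match_encoding) are hypothetical names made up for this illustration, and the UTF-8 fallback is an assumption. The real fix relies on open-uri for step 1, and the xslt.serve() result is simulated here with a binary string so that the sketch runs without ruby-xslt.

```ruby
# Hypothetical helper for step 1: read the charset from a Content-Type
# header value and label the raw body with it. Falls back to UTF-8
# (an assumption) when the charset is missing or unknown.
def label_body(body, content_type, fallback = Encoding::UTF_8)
  charset = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]
  encoding = begin
    charset ? Encoding.find(charset) : fallback
  rescue ArgumentError
    fallback
  end
  body.dup.force_encoding(encoding)
end

# Hypothetical helper for step 2: give the transformer output the same
# encoding label as the XML that went in.
def match_encoding(output, source_xml)
  output.dup.force_encoding(source_xml.encoding)
end

body = "caf\u00E9".b  # binary-labelled bytes, as net/http returned them
xml = label_body(body, "application/rss+xml; charset=utf-8")

transformed = xml.b   # stand-in for the binary string from xslt.serve()
result = match_encoding(transformed, xml)

puts result.encoding
```

Note that force_encoding only relabels the bytes. If a stylesheet really emitted a different encoding than the input XML, this workaround would mislabel the output.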