Fix reading and writing embedded OSL code

LZaw commented 4 years ago

This PR fixes #2830 The replace_special_xml_characters function did only get called on values for XML element attributes, i.e. on attribute_value in this example: <someTag attribute_key="attribute_value"/> I changed it to be called on every part of an XML element (except the indentation) as well as on the OSL Code. Moreover, when reading the OSL Code, the parser only read up to the next escaped character, which caused the m_code to only be the last part of the OSL Code after the last escaped character. I changed this as well as the resulting formatting issues (due to trim_both).

So far, I only tested the results with the scene in the Issue, but I'd be happy to test it with more scenes, if needed.

LZaw commented 4 years ago

I changed all the things you requested and put a print-statement (not in the commit pushed) in the characters function as well as the get_code function of the reader, to test your theory.

Reading the attached file will result in split reads (split on each escaped character), as described above. The scene is the one from the respective issue, with manually escaped characters.

Reading the original (unescaped) source file results in an error: 2020-04-17T10:00:38.394202Z <002> 114 MB error | while reading <some_path>/31 - multiple source shaders.appleseed, at line 140, column 15: element name expected. That's expected behaviour, since the original file still has < symbols in the shader source.

My conclusion: We need to add the += to the construction of the m_code as well as to add a propper way to trim the code where it needs to be trimmed.

Edit: I admit it would be better if the whole block would be read as one part, but I think the part which is responsible for the split is in a dependency package.

Edit 2: I wouldn't be surprised if this behaviour would also apply to all the other escaped characters. I'll investigate.

escaped_multiple_shaders.zip

dictoon commented 4 years ago

Thanks for looking into this problem in details, I really appreciate.

You are right: according to Xerces-C++'s documentation, characters() will be called for chunks of text, not necessarily the entire text at once:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.

So we must take care of concatenation ourselves.

I've tried your code and it appears to work correctly in the limited tests I've done, however the logic is still fragile and error-prone. It turns out that we can do a lot simpler:

class OSLCodeElementHandler
  : public ElementHandlerBaseType
{
  public:
    explicit OSLCodeElementHandler(ParseContext& context)
    {
    }

    void characters(
        const XMLCh* const  chars,
        const XMLSize_t     length) override
    {
        m_code += transcode(chars);
    }

    std::string get_code() const
    {
        return trim_both(m_code);
    }

  private:
    std::string m_code;
};

I'll push this to this PR and merge it.

Thanks again!

appleseedhq / appleseed

Fix reading and writing embedded OSL code #2850