biojppm / rapidyaml

Rapid YAML - a library to parse and emit YAML, and do it fast.
MIT License
568 stars 96 forks source link

Problems emitting YAML #225

Closed biojppm closed 2 years ago

biojppm commented 2 years ago

So what we use the emitter for is to load up some old CSV files to convert to YAML. After I converted all of our formatting from yaml-cpp to rapidyaml I ran into several issues with the way the CSV content is stored into NodeRef. yaml-cpp is directly assignable of any type: string, char, int, etc. When messing with rapidyaml I could only get my work to compile via c4::to_csubstr() which of course was converting to base64 and when I dumped it to the file I was getting Key: <hash>.

Here's an example of what I was playing around with:


static bool instance_readdb_sub(char* str[], int columns, int current) {
body |= ryml::MAP;
body["Id"] = c4::to_csubstr(str[0]);
body["Name"] = c4::to_csubstr(str[1]);
if (atoi(str[2]) != 3600)
body["TimeLimit"] = c4::to_csubstr(str[2]);
if (atoi(str[3]) != 300)
body["IdleTimeOut"] = c4::to_csubstr(str[3]);
ryml::NodeRef enter = body["Enter"];
enter |= ryml::MAP;

enter["Map"] = c4::to_csubstr(str[4]);
enter["X"] = c4::to_csubstr(str[5]);
enter["Y"] = c4::to_csubstr(str[6]);

if (columns > 7) {
    ryml::NodeRef map = body["AdditionalMaps"];
    map |= ryml::MAP;

    for (int i = 7; i < columns; i++) {
        if (!strlen(str[i]))
            continue;

        if (strcmpi(str[4], str[i]) == 0)
            continue;

        map[c4::to_csubstr(str[i])] = "true";
    }
}

return true;

}


> Unless I'm completely missing something from the quickstart.cpp, I couldn't find a way for us to dump our content properly. Also, is there a way to just create a NodeRef without a tree? I also had to create an empty `parse_in_place` call to generate a tree but I have no use for the tree since all I want to do is fill up a NodeRef variable and then dump that variable to a file. Thanks!

_Originally posted by @aleos89 in https://github.com/rathena/rathena/issues/5997#issuecomment-1063232855_
biojppm commented 2 years ago

@aleos89 base64 cannot have anything to do with the code you pasted above; it takes explicit base64() calls to have any base64 encoding/decoding. And I see nothing wrong with your sample as such.

Rather, I suspect you have a problem with memory lifetime issues.

(FWIW, this is the relevant quickstart code)

For example, when you do body["Id"] = c4::to_csubstr(str[0]);, the node will not copy the memory; it will only be pointing at the memory in str[0]. If str[0] is destroyed and the ryml tree lives on, then the tree will be left pointing at the stale memory from str[0]. For example:

ryml::Tree t;
auto body = t["body"]; // or something like this
{
  std::string your = "your";    // valid only in this scope
  std::string strings = "strings";   // valid only in this scope
  char *the_strings[] = {&your[0], &strings[0], ....};
  instance_readdb_sub(the_strings, columns, current);
}
std::cout << body["id"].val() << "\n";  // ERROR: stale memory

but this should be fine:

ryml::Tree t;
auto body = t["body"]; // or something like this
std::string your = "your";
std::string strings = "strings";
char *the_strings[] = {&your[0], &strings[0], ....};
instance_readdb_sub(the_strings, columns, current);
std::cout << body["id"].val() << "\n";  // OK! memory was not freed

Or alternatively you can use operator<< to force copying the string to the tree's arena, at the cost of allocating and copying.

Let's address your other questions after this issue, as it is extremely important.

aleos89 commented 2 years ago

@biojppm That makes sense. The body and tree variables are declared globally in our case, so it should be behaving like your latter sample. The strings though that are parsed from the CSV file only live within that scope though, which is where instance_readdb_sub is called.

I'll play around with << as the operator!

biojppm commented 2 years ago

Note that operator= will always be faster and less resource intensive than operator<< . The downside is if you use = you need to take care with the lifetime. Blue pill or red pill.

aleos89 commented 2 years ago

Gotcha! Well in our case again this converter is usually just ran whenever someone needs to convert their CSV to YAML. We have several CSV databases which we are slowly converting, so performance isn't a major issue here as this is a one-and-done situation.

biojppm commented 2 years ago

Did this work for you?

aleos89 commented 2 years ago

Sorry for the delay. I've been following up on some bugs in our repo recently. I will leave a comment once I get back to testing this again here shortly!

biojppm commented 2 years ago

@aleos89 closing this now. Feel free to reopen if you want to resume.