biojppm / rapidyaml

Rapid YAML - a library to parse and emit YAML, and do it fast.
MIT License
583 stars, 100 forks

What is the best way to read files from an ifstream in terms of performance? #427

Closed: zhoupengwei closed this issue 6 months ago

zhoupengwei commented 6 months ago

Hi author, I am using rapidyaml to read and write YAML files. To read content from a file through an ifstream, I do the following:

// open the ifstream
std::ifstream file;
// convert to string
std::string file_content((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
// convert to ryml
ryml::Tree tree = ryml::parse_in_arena(ryml::to_csubstr(file_content));

This method involves additional copies. To write content to a file through an ostream, I do the following:

std::ofstream file;
ryml::Tree tree;
file << tree;

This uses the operator overloaded by ryml, which is very convenient!

However, ryml does not provide a corresponding overloaded operator for reading file content from an ifstream, as it does for output:

std::ifstream file;
// error
file >> tree;

So how can I construct a tree from an ifstream with the best performance and without copying?

biojppm commented 6 months ago

Well, for best performance you should not use iostreams, period. They have generally terrible performance.

If you must use them, however, then your first method is very close: create a string from the stream and then read the string. But you can use an existing tree! Do this instead:

// file_content as above

// no copies:
ryml::Tree tree;

// minimize resizes during parsing:
tree.reserve(suitable_number_of_nodes);
tree.reserve_arena(suitable_arena_size);

// now parse into the existing tree, and save one copy by parsing in place
ryml::parse_in_place(ryml::to_substr(file_content), &tree);

zhoupengwei commented 6 months ago

@biojppm Thank you for your reply. What is the best way to save a rapidyaml tree to a regular YAML file and read it back?

biojppm commented 6 months ago

For parsing, ryml reads only from existing memory buffers. So on this subject your question is how to read a file into memory. That question is outside the scope of ryml. Maybe have a look at https://stackoverflow.com/questions/2912520/read-file-contents-into-a-string-in-c
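
For illustration only, here is a minimal sketch of one way to read a file into memory without iostreams, using C stdio and a single sized read. The helper name `read_file` is hypothetical and not part of ryml, and the sketch assumes a regular file whose size `ftell` can report:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ryml): read a whole file into a string
// with C stdio, using one sized read instead of an iostream.
std::string read_file(const char *path)
{
    std::FILE *fp = std::fopen(path, "rb");
    if (!fp)
        throw std::runtime_error(std::string("cannot open ") + path);

    std::string contents;
    // Query the file size so the buffer is allocated exactly once.
    if (std::fseek(fp, 0, SEEK_END) == 0)
    {
        long size = std::ftell(fp);
        if (size > 0)
            contents.resize(static_cast<std::size_t>(size));
        std::rewind(fp);
    }

    // Read everything in one call; nread may be smaller than requested.
    std::size_t nread = 0;
    if (!contents.empty())
        nread = std::fread(&contents[0], 1, contents.size(), fp);
    std::fclose(fp);

    // Shrink to the number of bytes actually read.
    contents.resize(nread);
    return contents;
}
```

The returned string is mutable, so it can be handed to `ryml::parse_in_place(ryml::to_substr(contents), &tree)` as in the snippet earlier in this thread. It must outlive the tree, since parsing in place leaves the tree pointing into that buffer.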

As for emitting, ryml lets you emit YAML to memory, file or iostream. Refer to the emit documentation to find out the appropriate call.

zhoupengwei commented 6 months ago

@biojppm Thank you for the detailed response, my problem has been solved.