Closed stohrendorf closed 4 months ago
I can't reproduce at all, and the YAML you provided is parsing correctly in my linux machine. Your report is confusing, and missing fundamental information:
std::ios::binary
situation. Stick to using either std::string
or csubstr
or const char*
, containing all the characters (see example below)Please provide a minimum verifiable example, with the exact YAML source embedded as a string in C++ code. Otherwise, I can't help you.
FWIW, this is what I tried:
void test_439(csubstr yaml)
{
test_check_emit_check(yaml, [](Tree const &t){
ConstNodeRef n = t["meta"];
EXPECT_EQ(n["title"], "Croft Arena");
EXPECT_EQ(n["author"], "JaayStation, SOTOS");
EXPECT_EQ(n["downloadSoundtrack"], "false");
EXPECT_EQ(n["urls"][0].val(), "https://drive.google.com/file/d/1fJ1Z3Ktr2VlPK1MD5OB4tY7uDnqyRMEQ/view?usp=sharing");
});
}
void test_439_ostream(csubstr yaml)
{
{
std::ofstream ofile("adsasd.bin", std::ios::out | std::ios::binary);
ofile.write(yaml.str, (std::streamsize)yaml.len);
}
std::vector<char> m_buffer;
{
std::ifstream file("adsasd.bin", std::ios::in | std::ios::binary);
ASSERT_TRUE(file.is_open());
file.seekg(0, std::ios::end);
const std::streamsize size = static_cast<std::streamsize>(file.tellg());
file.seekg(0, std::ios::beg);
m_buffer.resize((size_t)size);
file.read(&m_buffer[0], size);
}
EXPECT_EQ(to_csubstr(m_buffer), yaml);
test_439(to_csubstr(m_buffer));
// try with trailing \0
m_buffer.resize(m_buffer.size() + 10);
EXPECT_TRUE(to_csubstr(m_buffer).last(10).trim('\0').empty());
std::cout << to_csubstr(m_buffer);
test_439(to_csubstr(m_buffer));
}
TEST(github, 439_0)
{
csubstr yaml = ""
"meta:\n"
" title: \"Croft Arena\"\n"
" author: \"JaayStation, SOTOS\"\n"
" downloadSoundtrack: false\n"
" urls:\n"
" - \"https://drive.google.com/file/d/1fJ1Z3Ktr2VlPK1MD5OB4tY7uDnqyRMEQ/view?usp=sharing\"\n";
test_439(yaml);
test_439_ostream(yaml);
}
TEST(github, 439_1)
{
csubstr yaml = ""
"meta:\r\n"
" title: \"Croft Arena\"\r\n"
" author: \"JaayStation, SOTOS\"\r\n"
" downloadSoundtrack: false\r\n"
" urls:\r\n"
" - \"https://drive.google.com/file/d/1fJ1Z3Ktr2VlPK1MD5OB4tY7uDnqyRMEQ/view?usp=sharing\"\r\n";
test_439(yaml);
test_439_ostream(yaml);
}
Hi,
I'm sorry for missing some information. First, m_buffer
is a std::vector<char>
. I have in the meantime updated VS and the project dependencies, so it is not the same environment anymore, which is OFC bad. It does not crash anymore now when the buffer has the correct size and no trailing zeros, so I guess that was an error with a broken build directory. However, when the buffer is technically larger than the contained string, it still crashes. Without the std::ios::binary
, the exact error message I get is:
ERROR: could not find ':' colon after key
7:7:
The buffer at this point looks like this (only the trailing bytes as well as a newline):
I would have expected an assertion somewhere that it's not parsing NUL bytes.
The source YAML, passed through xxd
, is
00000000: 6d65 7461 3a0d 0a20 2074 6974 6c65 3a20 meta:.. title:
00000010: 2243 726f 6674 2041 7265 6e61 220d 0a20 "Croft Arena"..
00000020: 2061 7574 686f 723a 2022 4a61 6179 5374 author: "JaaySt
00000030: 6174 696f 6e2c 2053 4f54 4f53 220d 0a20 ation, SOTOS"..
00000040: 2064 6f77 6e6c 6f61 6453 6f75 6e64 7472 downloadSoundtr
00000050: 6163 6b3a 2066 616c 7365 0d0a 2020 7572 ack: false.. ur
00000060: 6c73 3a0d 0a20 2020 202d 2022 6874 7470 ls:.. - "http
00000070: 733a 2f2f 6472 6976 652e 676f 6f67 6c65 s://drive.google
00000080: 2e63 6f6d 2f66 696c 652f 642f 3166 4a31 .com/file/d/1fJ1
00000090: 5a33 4b74 7232 566c 504b 314d 4435 4f42 Z3Ktr2VlPK1MD5OB
000000a0: 3474 5937 7544 6e71 7952 4d45 512f 7669 4tY7uDnqyRMEQ/vi
000000b0: 6577 3f75 7370 3d73 6861 7269 6e67 220d ew?usp=sharing".
000000c0: 0a .
You can see, as I said earlier, that there's an empty 7th line, as line 6 ends with a line break.
The stacktrace at this point is:
croftengine.exe!c4::yml::error_impl(const char * msg, unsigned __int64 length, c4::yml::Location loc, void * __formal) Zeile 56 C++
croftengine.exe!c4::yml::ParseEngine<c4::yml::EventHandlerTree>::_err<>(c4::basic_substring<char const> fmt) Zeile 455 C++
croftengine.exe!c4::yml::ParseEngine<c4::yml::EventHandlerTree>::_handle_map_block() Zeile 6050 C++
croftengine.exe!c4::yml::ParseEngine<c4::yml::EventHandlerTree>::parse_in_place_ev(c4::basic_substring<char const> filename, c4::basic_substring<char> src) Zeile 8089 C++
croftengine.exe!c4::yml::parse_in_place(c4::yml::ParseEngine<c4::yml::EventHandlerTree> * parser, c4::basic_substring<char const> filename, c4::basic_substring<char> yaml, c4::yml::Tree * t, unsigned __int64 node_id) Zeile 40 C++
croftengine.exe!c4::yml::parse_in_arena(c4::basic_substring<char const> yaml) Zeile 108 C++
> croftengine.exe!serialization::YAMLDocument<1>::YAMLDocument<1>(const std::filesystem::path & filename) Zeile 68 C++
Finally got some time to look at this again. Adding the trailing \0
characters does make the error show up.
However, \0
is not a valid character. Specifically,
To ensure readability, YAML streams use only the printable subset of the Unicode character set. The allowed character range explicitly excludes the C0 control block15 x00-x1F
So you have to ensure that the \0 character is not there, for example by doing this:
m_tree = ryml::parse_in_arena(c4::to_csubstr(m_buffer).trimr('\0'));
I'm not complaining that RYML misbehaves when I'm giving it the wrong data which does not conform to the invariants the function requires. What I was expecting was a debug-only assert somewhere that validates that it gets data conforming to these invariants (possibly even guarded by something like RYML_STRICT_DEBUG_ASSERTS
or so, so that debug performance doesn't drop unless needed). That would have also avoided this issue here as it would have let me check my own data first and fixing my code without you knowing about it, saving both of our time.
Although presumably possible, such a feature is very low on the priority list. This is because it addresses only a small corner problem, and yet it is likely not a small task, once you start to account for tests, and all the places that this would have to be looked for.
If you are interested in this, feel free to submit a PR to address the problem. But note that the assertion infrastructure is already in place.
This is the code which loads the YAML:
This is the YAML is should parse, including a newline at the end:
However, in version 0.7.0, rapidyaml throws an error that it's missing a colon after the key at line 7, column 2, which is an empty line, and beyond the size of the buffer. It works fine with 0.6.0, so I'm wondering if there's a breaking change regarding the new parser I don't fully grasp, or if it's an actual bug regarding handling of null terminators. When I remove the
std::ios::binary
, it complains about column 7, which indicates the latter, as the file has 7 lines with\r\n
terminators, and the debugger showed me that the buffer contained only plain\n
's.