Closed pkerichang closed 4 months ago
How are you serializing/deserializing your type T
?
I am serializing/deserializing with the exact code I posted. It is a template method, then I call it with T=int, double, chat, etc , with just the input strings containing the corresponding types
Thanks for reporting. I was able to reproduce with the following test code:
template<class T>
void test442(csubstr input, csubstr expected)
{
// as a seq member
{
const std::string input_str = formatrs<std::string>("- {}", input);
const std::string expected_str = formatrs<std::string>("- {}", expected);
const Tree tree = parse_in_arena(to_csubstr(input_str));
T obj; // T is a scalar type like int, char, double, etc.
tree[0] >> obj;
EXPECT_EQ(expected_str, emitrs_yaml<std::string>(tree));
Tree out_tree;
out_tree.rootref() |= SEQ;
out_tree[0] << obj;
EXPECT_EQ(expected_str, emitrs_yaml<std::string>(out_tree));
}
// as a map member
{
const std::string input_str = formatrs<std::string>("val: {}", input);
const std::string expected_str = formatrs<std::string>("val: {}", expected);
const Tree tree = parse_in_arena(to_csubstr(input_str));
T obj; // T is a scalar type like int, char, double, etc.
tree["val"] >> obj;
EXPECT_EQ(expected_str, emitrs_yaml<std::string>(tree));
Tree out_tree;
out_tree.rootref() |= MAP;
out_tree["val"] << obj;
EXPECT_EQ(expected_str, emitrs_yaml<std::string>(out_tree));
}
// as a doc scalar
{
const std::string expected_str(expected.str, expected.len);
const Tree tree = parse_in_arena(input);
T obj; // T is a scalar type like int, char, double, etc.
tree.rootref() >> obj;
EXPECT_EQ(expected_str, emitrs_yaml<std::string>(tree));
Tree out_tree;
out_tree.rootref() << obj;
EXPECT_EQ(expected_str, emitrs_yaml<std::string>(out_tree));
}
}
TEST(serialize, issue442_00)
{
test442<int>("123", "123\n");
}
TEST(serialize, issue442_01)
{
test442<int>("-123", "-123\n");
}
TEST(serialize, issue442_02)
{
test442<int>("+123", "123\n");
}
TEST(serialize, issue442_10)
{
test442<float>("2.35e-10", "2.35e10\n");
}
TEST(serialize, issue442_11)
{
test442<float>("-2.35e-10", "-2.35e-10\n");
}
TEST(serialize, issue442_12)
{
test442<float>("+2.35e-10", "2.35e10\n");
}
TEST(serialize, issue442_20)
{
test442<double>("2.35e-10", "2.35e10\n");
}
TEST(serialize, issue442_21)
{
test442<double>("-2.35e-10", "-2.35e-10\n");
}
TEST(serialize, issue442_22)
{
test442<double>("+2.35e-10", "2.35e10\n");
}
TEST(serialize, issue442_30)
{
test442<char>("'a'", "'a'\n");
}
TEST(serialize, issue442_31)
{
test442<char>("' '", "' '\n");
}
TEST(serialize, issue442_40)
{
test442<char>("\"a\"", "\"a\"\n");
}
TEST(serialize, issue442_41)
{
test442<char>("\" \"", "\" \"\n");
}
enable overflow/underflow detection.
The good news is that the c4 library contains facilities to detect overflow; see c4::overflows<T>(csubstr)
in the header charconv.hpp
. The bad news is that this is explicit opt-in behavior; this is done this way because overflow detection has a noticeable deserialization cost. Do something like this:
T obj;
if(ryml::overflows<T>(node.val()))
ERROR("overflow detected");
else
node >> obj;
Note that overflows<T>()
requires that T
is an integral fundamental type; it is SFINAE'd for signed vs unsigned types. There is no overflow detection for float or double; for this, you will have to use other deserialization facilities.
for numbers, don't add any quotes.
This is the intended behavior. Somehow, the existing tests did not catch this problem.
If it adds quotes around scalars, at least be consistent (either always use single quotes or double quotes).
Adding quotes is not intended behavior. The intended behavior is the following:
In short adding quotes to parsed plain scalars is definitely a bug. Doesn't matter if they are numbers or alphabetical sequences.
Ideally, support parsing the + sign in front of ints or doubles
This is not there at the moment, and I agree it should be there.
Hold on while I investigate and fix the problem.
I overlooked something in your report. In your code, where you do
// serialization
ryml::Tree out_tree;
out_tree.rootref() << obj;
... you are not setting the style flag. So the resulting node will have no style, and will be subjected to the emitter style heuristics, as an instance of a tree created programatically, like I detailed above. For example, if we do
out_tree.rootref().set_val_style(tree.rootref().type() & VAL_STYLE); // copy the style of the original parsed tree
or, if that's not possible,
out_tree.rootref().set_val_style(VAL_PLAIN); // assign explicit style
then all the tests now pass, and the only remaining problems are
+
sign cannot be parsed by the c4::atoi()
implementation-
signs fail because they will quote the scalar even when it is a number.There was indeed a problem with the emitter heuristics:
modified src/c4/yml/node_type.cpp
@@ -153,6 +153,10 @@ bool scalar_style_query_plain(csubstr s) noexcept
else if(s.sub(2).is_number())
return true;
}
+ else if(s.begins_with_any("0123456789.-+") && s.is_number())
+ {
+ return true;
+ }
return s != ':'
&& ( ! s.begins_with_any("-:?*&,'\"{}[]|>%#@`\r")) // @ and ` are
Hello,
I just found out about this library today and I was just trying things out. Overall I was able to get things working, however, there are some serialization/deserialization behaviors I would like to customize, but couldn't figure out how from the documentation.
For reference, my code is effectively doing the following:
By trying various scalar values, what I noticed is that:
-123
or-2.35e-10
, the serialized output string is always quoted with single quotes, like'-123'
and'-2.35e-10'
' '
, the output is a double quoted string like" "
+123
or+3.5e+6
, it raises an exception.I was wondering if I can tweak the parsing behavior, so that:
+
sign in front of ints or doublesIs it possible to customize the library with these changes, or do I need to use some adaptor/intermediate custom type tricks to do these customizations? If I do need to use adaptors/custom types, is there some examples that I can look at? Thanks!