biojppm / rapidyaml

Rapid YAML - a library to parse and emit YAML, and do it fast.
MIT License
584 stars 102 forks source link

Observations during our review #12

Closed dkeeney closed 4 years ago

dkeeney commented 5 years ago

I am working with the open source htm-community for numenta, working on AI software algorithms. Our software is a set of Python apps with an C++ extension library. We build packages for this library for Windows (MSVC 2017), Linux (Ubuntu), and OSx using CMake. Working with at least C++11 compilers. Our CI builds and runs unit tests for each platform. One of those also builds and tests in debug mode to make sure we don't miss any errors that this might uncover.

We use Yaml to specify algorithm parameters. We have been using yaml-cpp but it can have an issue in debug mode on ubuntu so we were looking for an alternative. The features you list and the speed measurements were exactly what we were looking for. I made a list of the issues that we found while evaluating RapidYaml.

So the conclusion is that we will have to wait until this package matures a little more and until we can be sure that when errors occur in your C code action is taken to insure that data structures are in a known state and a status code can be returned or a C++ std::exceptions can be thrown.

At this point, being thread-safe is not an issue but in the near future it may become important. I did not see any logic to insure thread safety in your code. It is also likely that we will need to have more than one Yaml tree open at the same time in one program. I did see a few globals that might cause problems in scenarios like this.

Our open-source repo is at https://github.com/htm-community/htm.core in case you are interested.

parihaaraka commented 5 years ago

joining to @dkeeney : tons of warnings like "empty expression statement has no effect; remove unnecessary ';' to silence this warning" may be fixed with macroses like this:

#define ev_idle_init(ev,cb)  do { ev_init ((ev), (cb)); ev_idle_set ((ev)); } while (0)
dkeeney commented 5 years ago

??? don't care about warnings. What we need is the ability to turn all hard errors into C++ exceptions.

biojppm commented 5 years ago

@dkeeney Thanks for the detailed review! I've been nuclear busy and somehow missed it; I've only seen it now. Will respond ASAP; thanks for the patience.

biojppm commented 5 years ago

@parihaaraka thanks for reporting. opened new issue to track that.

biojppm commented 5 years ago

@dkeeney Thanks again for the considerate review. In the coming days I will address the items individually by topic.

To avoid a sprawling thread and make it easier to track individual issues, if you feel the need to further discuss any of the topics, I invite you to open separate issues.

Right, let's do it.

biojppm commented 5 years ago

@dkeeney Addressing exceptions first.

The error behavior you have observed (with debug breakpoints and all) is just the default error callback. Note this is set up as a callback precisely so that you, the client, can control what happens when an error occurs. You can even leave the default callback out of the compilation using a macro, RYML_NO_DEFAULT_CALLBACKS; see the c4/ryml/common.hpp header and source file.

More on point, you can easily set ryml to use an error callback of your own, and throw an exception from there; that way you will get back at your original calling site (or above), where you can catch the exception thrown from your callback. You can see a working example of such an approach in the unit tests for the c4core library; later on I'll provide an equivalent one for ryml itself. Also, note that this unit test also works with VS2019.

So this addresses the use of exceptions.

Having said that, there is still an elephant in the room, and it is neither pretty nor small. Currently, ryml provides no guarantees about the correct state of its objects after an error occurs. Making sure that happens is not a trivial task, and is something likely to make the code more complicated and less efficient.

Or maybe not. If you could provide me with an example use case I will start reflecting on it.

Assuming that this is mostly a parse problem; the emitting problem is much better defined as you have full control over the types which are to be emitted. Something that I imagine can be done is returning the tree/index of the "closest" parent/sibling nodes which were parsed, together with the YAML buffer source point at which the error was detected. So only the "current" node would need to be rolled back. This would make the tree consistent up to the error point.

But the talk is different for the parser. It is a complicated state machine, and at first thought it figures as something very difficult to successfully roll back.

If you want to follow up on this issue of exceptions/consistency, let's do so on a separate issue.

biojppm commented 4 years ago

@dkeeney Do you still have an interest in this? If you do please reply; otherwise I guess I can close this.

dkeeney commented 4 years ago

you can close this item.

On Mon, Jan 20, 2020 at 12:47 PM jpmag notifications@github.com wrote:

@dkeeney https://github.com/dkeeney Do you still have an interest in this? If you do please reply; otherwise I guess I can close this.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biojppm/rapidyaml/issues/12?email_source=notifications&email_token=ABIFV74PEZVHNYQJ2BGP4XTQ6YEVXA5CNFSM4IXV7BFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJNY5OY#issuecomment-576425659, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIFV72MNVM74QNMJVGCROLQ6YEVXANCNFSM4IXV7BFA .