getdnsapi / stubby

Stubby is the name given to a mode of using getdns which enables it to act as a local DNS Privacy stub resolver (using DNS-over-TLS).
https://dnsprivacy.org/dns_privacy_daemon_-_stubby/
BSD 3-Clause "New" or "Revised" License
1.2k stars 100 forks source link

Improve YAML parser #212

Open aisbergg opened 5 years ago

aisbergg commented 5 years ago

I'm really fond of the YAML format and using it for configurations is in general a great idea. I am playing around with stubby recently and got really frustrated with the configuration, because of unreasonable errors poping up. I came to the conclusion, that the YAML parser implementation of stubby has some quirks that need to be addressed. Here some examples on strings in the configuration:

appdata_dir: "/var/cache/stubby" # --> OK
appdata_dir: /var/cache/stubby   # --> Generic error
resolution_type: GETDNS_RESOLUTION_STUB   # --> OK
resolution_type: "GETDNS_RESOLUTION_STUB" # --> A helper function was supposed to return a certain type for an item, but the wrong type was given.

In context of YAML both the examples considered to be of type string. I wanted to use a templating engine to create the config dynamically so this behaviour isn't so great. It would be great to improve the parser implementation to parse the basic data types as expected.

jonathanunderwood commented 4 years ago

See also #114

banburybill commented 4 years ago

I agree that this behaviour isn't correct YAML and is unfortunate, as it means we can't simply fix this or #114 either.

Unfortunately it's not something that's quickly fixable. YAML is currently handled by using libyaml to parse the YAML and converting the data found to the not-quite-JSON that getdns uses. The conversion is a simple rewrite of the YAML parse stream as it is parsed; we don't actually do a full semantic parse of the YAML. That means that presently when we're writing JSON out, we know the current YAML token, but not much else. For, for example, we don't know if a token is a map key or a value, or if it's a value whether it's a value in a map. And if it was a value in a map, we certainly don't know what the corresponding key is.

libyaml presents tokens to us with any quotes that were present removed, but with an indication that the quantity was quoted in the YAML. So currently we're using that indication to decide whether or not to quote the token when outputting JSON.

getdns requires in its not-quite-JSON that items it knows to be string quantities are quoted, but other quantities such as numbers, IP addresses and named constants must NOT be quoted. So we can't just blindly quote everything (I tried!).

The correct solution is to redo the YAML parsing to build a full parse tree before generating getdns-JSON. That way we'd know the context of values and hence could determine definitively whether they should be quoted or not. However, that's a fair chunk of work.

Typing this, I have thought of a work-around, which would be to rely on the libyaml token presentation order and setting a 'quote-next-value' flag depending on the value of the current token. This might be good enough, provided all the values that need to be quoted are map values where the key is only ever followed by a single value. This does feel a bit of a bodge, but in the context of Stubby config might suffice for this particular problem.

jonathanunderwood commented 4 years ago

getdns requires in its not-quite-JSON that items it knows to be string quantities are quoted, but other quantities such as numbers, IP addresses and named constants must NOT be quoted. So we can't just blindly quote everything (I tried!).

Naieve question: would it not be better to bite the bullet and fix getdns to handle and use JSON correctly? Then a full parse of the YAML, and conversion to JSON, would be straightforward and compliant with the JSON and YAML specs.

saradickinson commented 4 years ago

@jonathanunderwood As is often the case, this is a question of resources and priorities... @banburybill I've merged https://github.com/getdnsapi/stubby/pull/256 as a short term solution here but we should discuss with @wtoorop.