bottom-software-foundation / spec

A spec for the bottom encoding format.
MIT License
100 stars 1 forks source link

Null byte is ambiguous #2

Closed nihaals closed 3 years ago

nihaals commented 3 years ago

The spec doesn't make it clear if there should be a byte separator after U+2764 U+FE0F. bottom-web encodes to no byte separator but then decoding it will error. The pseudocode included in the spec suggests that a byte separator shouldn't be included (which makes sense as it is not required to be able to decode accurately), but bottom implementations seem to handle this inconsistently.

For having byte separator Against having byte separator
EBNF Pseudocode
Decode in reference implementation Encode in reference implementation (bottom-rs)
Breaks existing encoded data
romdotdog commented 3 years ago

I think the spec should be rewritten because it's very vague in topics such as backwards compatibility, exact decoding algorithm and encoding algorithm (without using hashmaps), whether to verify that there's a byte separator at the end of the sample, etc.

I also think a standardized testing JSON or similar should be implemented and a test and reform to all current implementations (especially mine, which I am working on)

nihaals commented 3 years ago

I've been thinking about doing a test suite for a while, I think this might be a good opportunity as pretty much every implementation will need changes to comply. One of the difficult parts is interacting with it. I want to do a GitHub Action that will run a Python benchmarker (eventually) and tester but as there are so many languages, it's difficult to get every implementation using it. One way might be creating a Dockerfile for every implementation in the org which means the Actions workflow should be very similar across implementations. The alternative would be more of a "standard" approach of setting up the tool in the workflow, but this might be difficult as not everyone has experience with writing Actions workflows.

The actual interaction with the tool should be fine as it can just take arguments and support all the main methods e.g. CLI flags and files.

romdotdog commented 3 years ago

GitHub Actions isn't hard to learn, I have minimal experience but I'd consider myself moderately capable. My concern would be people not wanting to do it at all.

nihaals commented 3 years ago

The EBNF seems to contradict this and suggests a byte separator should be added, making this more confusing. I think adding a byte separator now would be much more of a significant change as any currently encoded data would become invalid.

To keep track of the information I've added a table to the original post

redgoldlace commented 3 years ago

I would agree that the spec should be rewritten. It was mostly cobbled together by me with no real thought behind it and I assumed I'd wind up in this situation at some point. I also didn't ever really expect for there to be some 30 implementations, or for this entire project to get to this position.

In my mind there should be a byte separator after ❤️ for the sake of consistency. This was always how I thought it should've been encoded or decoded but I didn't put enough time into actually testing any of it or wording it correctly in the spec, which is quite obviously my fault. I'm not against a test suite but I'm not familiar with GH actions so I probably won't be much help there. At the same time I'm not sure if I like how seriously this entire thing is being taken. I started bottom as a joke, and the org was an extension of that. I didn't ever really expect so many people to pile onto it.

nihaals commented 3 years ago

Maybe saying v0.1 doesn't use a byte separator and v0.2 does will make it clear and help refer to the change in issues and changes