alex-dixon / serde_edn

(De)serialize EDN data
Other
13 stars 2 forks source link

Join serde_edn efforts ? #1

Open alexmaco opened 4 years ago

alexmaco commented 4 years ago

Hi,

I have also been working on a serde_edn implementation. It's still incomplete, but partly available here: https://github.com/alexmaco/serde_edn. Unfortunately I haven't had the time to complete it, but I have recently restarted working on it.

I have also run into similar design issues as you did (for instance https://github.com/serde-rs/serde/issues/1536).

I see we have come up with different Value types, and different edn! macros.

I'm opening this issue in the hopes that we can discuss these different design choices, and possibly reach an agreement to join these efforts into a single serde_edn library, published on crates.io. Right now, the first version I published about 10 months ago was there to reserve the name and is not really functional.

Alternatively, we could always go with maintaining different libraries under different names, but i think the ecosystem will benefit more from a unified solution that solves most common use cases.

What do you think ?

alex-dixon commented 4 years ago

Thanks for reaching out. I’m all for deduplication of efforts. What are your goals with your serde implementation of the EDN spec? Mine was to make the fastest EDN serializer and deserializer possible. I haven’t benchmarked against other implementations in Clojure and other languages but the library keeps up with serde_json fairly well across data that has an exact equivalent in JSON and data that substitutes string keys for EDN keywords.

The majority of spec is implemented save reader tags. There’s a place in code where they can be added. https://github.com/alex-dixon/serde_edn/blob/a8d64edbad804d497f3a237a1a609c52c40ca71a/src/de.rs#L1166

I’m happy to discuss whatever else with you including the the issue you referenced. In short it seemed serde’s data model and de/serialization trait isn’t built to represent more than one kind of sequence and can’t be extended to accommodate one. If I have time I’d investigate whether separate EDN traits and impls could hurt performance and if so how that might be improved. From a functionality and code organization standpoint I’m not aware of a better solution.

I’m on rust discord with username alex-dixon. If there’s another communication method that works better for you just let me know.

alexmaco commented 4 years ago

It seems we are on different timezones. Until we synchronize for a live discussion, I will try and write some ideas here.

Firstly, I am interested both in performance (and serde_json shows there is great potential for that), as well as correctness. I think that, as much as possible, a 1.0 serde_edn should be able to load and output edn in an intuitive way, while striving to lose as little information as possible. edn is a richer format compared to json, and I think this is one of its strong points.

Secondly, from what I gather looking at your implementation, your approach of starting from serde_json seems a lot better. Modulo some code cleanup, I propose we use that as a base, and refine and add onto it some things I developed more completely, such as the edn! macro.

Thirdly, let's talk design details. While there is a mismatch between the serde data model and the edn types, I think it is possible to achieve an overall better result (from the correctness and completeness standpoint) than either implementation does currently.

To kickstart the discussion, I propose we consider a few specific design items:

Let me know what you think.

alex-dixon commented 4 years ago

On Nov 28, 2019, at 14:44, Alexandru Macovei notifications@github.com wrote:

 It seems we are on different timezones. Until we synchronize for a live discussion, I will try and write some ideas here.

Firstly, I am interested both in performance (and serde_json shows there is great potential for that), as well as correctness. I think that, as much as possible, a 1.0 serde_edn should be able to load and output edn in an intuitive way, while striving to lose as little information as possible. edn is a richer format compared to json, and I think this is one of its strong points.

Secondly, from what I gather looking at your implementation, your approach of starting from serde_json seems a lot better. Modulo some code cleanup, I propose we use that as a base, and refine and add onto it some things I developed more completely, such as the edn! macro.

Yes. I gave a fair effort at the edn! macro but failed. Anything that replaces it is likely an improvement. I wasn’t willing to compromise on the apparent need to use a comma delimiter when writing raw edn inside of it (Rust’s macro language can’t parse it without commas I think). At this point I can let that go. Thirdly, let's talk design details. While there is a mismatch between the serde data model and the edn types, I think it is possible to achieve an overall better result (from the correctness and completeness standpoint) than either implementation does currently.

To kickstart the discussion, I propose we consider a few specific design items:

The representation of sets in the Value type: representing a set with a BTreeSet/HashSet I think would better model the edn semantics. Set values would represent uniqueness, and the lack of sequential order (in the case of HashSet). Also, Value's PartialEq implementation would work as expected, regardless of the order in which values were specified in text. Yes. I’d suggest this be configurable with compiler flags along with other complex types. I forget why I didn’t do this. Namespacing support: keywords and symbols support namespacing. Since namespacing is specified as part of edn, I think parsing for / should not be left up to the user, and serde_edn should encode that information. I have made an attempt at this in my repo (only for Symbol), and there may be room for criticism and improvement there (perhaps storing a Box for the data and an Option for the / position ?) That seems like good design. A Keyword and Symbol struct with a that information could provide a more performant implementation for something like clojure’s qualified-keyword? Methods and data that can be used to obtain the different segments (as you suggest) could do the same as in clojure (name :a.b/c) -> “c”, (namespace :a.b/c) -> “a.b” The distinction between list and vector: since they are used for different purposes/with different meanings in edn data, I think it should be possible, at least in some cases, for the user to distinguish between them. My first proposal would be to consider providing a List type (as dtolnay suggested), that would wrap sequence types, and come with custom Serialize/Deserialize impls to use () instead of []. This way, a user may create a struct and use a serde_edn::List for one field, to ensure that field is always treated as a list. Overall I think there are multiple possibilities to consider here. Yeah that should probably be the default. If I understand correctly it would be a List struct that is otherwise a rust vec but can be “.serialized” as () and have its identity preserved outside the Value::List enum wrapper. I’m interested in a backing List that’s more aligned with the one in Clojure but that’s not a requirement for the 1.0 goals you outlined above and can be added as a compile time option later. Let me know what you think.

Thanks for such a great start to the discussion. Looking forward to this.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

alexmaco commented 4 years ago

So the first thing I tried to do was bring in the more complete edn! macro (along with tests for it), but I quickly hit the fact that tests don't seem to compile anymore, starting with 3577b9fd as per git bisect.

Now, I don't think I fully understand the idea of introducing the EDN(Serialize|Deserialize|Visitor) traits. Since they are different traits from the ones from serde, I don't see how they can extend the serde model in practice (i.e. outside of the library tests). The commit date suggests that they came before David's answer to the serde issue, so that may explain it.

However, I may be misreading their intent, so can you please clarify a little where you were trying to get them ? I'm considering whether they should be completed/fixed in some way, or removed.

alex-dixon commented 4 years ago

The tests copied from serde_json need updated. The ones here should work https://github.com/alex-dixon/serde_edn/blob/master/tests/edn_test.rs

Now, I don't think I fully understand the idea of introducing the EDN(Serialize|Deserialize|Visitor) traits. Since they are different traits from the ones from serde, I don't see how they can extend the serde model in practice (i.e. outside of the library tests). The commit date suggests that they came before David's answer to the serde issue, so that may explain it.

Yes they can be removed completely. They were based upon my false understanding that visit_seq should be used to deserialize more than one sequence (vectors, lists, sets).

Honestly, serde was hard for me to keep up with in general. I think I understand now that an edn list is best deserialized as a struct (List) that wraps a vec, using the same pattern as Keyword and Symbol (calling visit_map with a ListDeserializer that implements MapAccess, which calls deserialize on its seed (PhantomData?) with a ListFieldDeserializer... ).

alex-dixon commented 4 years ago

Hi @alexmaco. Just curious if you had any other thoughts on the approach you suggested and whether you wanted to implement it here or try in your own repo. Knowing there’s a better way to accomplish list serde has been on my mind. I feel motivated to at least give it a shot if I can find time but not completely confident I can pull it off. If you could let me know your thoughts whenever you have the chance I’d appreciate it. Thanks!