Open har7an opened 4 months ago
Oh and here's a REPL session to prove it basically works:
julia> import Pkg;
julia> Pkg.activate(;temp=true);
julia> Pkg.develop(;path=".");
julia> Pkg.add("JSON");
julia> import Serde, JSON;
julia> x = Serde.to_json(5)
"5"
julia> @assert 5 == Serde.deser_json(Number, x)
Hello, @har7an!
We are big fans of Rust here, too.
Thank you for your contribution. I really like the idea of giving options for providing your own parser. Having JSON as a weak dependency is definitely a boon, too. We should include something like that in a future release.
I have only one concern though. Wouldn't it mean that we would have to keep track of compatibility with every parser option we add to ext? Maybe an overloaded function for a parser package would be a better course of action in this case?
Hi @dmitrii-doronin,
I have only one concern though. Wouldn't it mean that we would have to keep track of compatibility with every parser option we add to ext?
I'm not sure I understand what you're referring to. Serde
has to keep track of the version of the parsers it relies on at the moment in any case. At the moment that's managed with the compat
table in Project.toml
and I see you're not checking in a Manifest.toml
either.
With my changes in place, the compat
table in Project.toml
still exists and is considered by Pkg
when performing package resolution. The only difference w.r.t. dependency management that I see is that the dependencies declared as weakdeps
aren't installed unconditionally. Everything else should basically stay the same as before.
But I'm not a seasoned Julia developer, so I may have a wrong picture there. That's how I understand the situation.
Maybe an overloaded function for a parser package would be a better course of action in this case?
How exactly would that change the "keeping track of parser compatibility" bit?
Ah, and I forgot:
I really like the idea of giving options for providing your own parser.
I'm not saying we let users provide their own parsers, that is not this PRs scope (Although I guess that technically it might be possible). I'm merely allowing users to pull in the dependencies they want, but only from the set we support. So if you want to use Serde
with Json, you must pull in the JSON
package we depend on in the Project.toml
(with the same UUID and everything). If a user wants to use Serde
with some custom MyCoolJSON
package, they'll still have to patch Serde
to allow that, same as before.
Ah, I think i get you now. So in the grand scheme of things you want to make CSV, YAML, etc. optional for a user to install?
Ah, I think i get you now. So in the grand scheme of things you want to make CSV, YAML, etc. optional for a user to install?
Exactly, IMO users should "pay" only for what they really want to have in their application.
That's a really good idea. I'd have to look deeper into Julia extensions, since, to be fair, I have never used this feature before. But we definitely want to merge this at some point.
@gryumov cc
Hello @har7an! Thank you for the suggestion – I really like this idea. @dmitrii-doronin, let's assess the user-friendliness of this feature and aim to propose a solution by next week.
propose a solution by next week.
Great, I'll look into it. @har7an does this sit well with you?
@dmitrii-doronin I think that'll work, or else I'll let you know. I should have a solution by tuesday, but probably earlier.
I'll leave the question of whether you like my "design decisions" for later then and make a usable "prototype" first. Changing how we integrate the extensions into the main thing is comparatively little effort.
Okay I added some more changes now, mostly to restore the tests back to working order. All the tests from the "JSON" group seem to pass now, but I get failures for test case 30 from deser.jl
where it checks for nothing
and missing
. As far as I can tell it doesn't construct the expected error type (WrongType
), but raises an error somewhere else instead.
My knowledge about Julia macros and the codebase isn't really big enough to determine the cause yet. Pretty much the only change I can think of that may have caused this is in Utl/Macros.jl
, where I had to patch a JSON-specific macro. I left the macro there, although one may consider moving it into the JSON
extension. If we do that, however, I'm not sure how we could keep the documentation intact since afaik Documenter
cannot document items from foreign packages and I don't know how extensions are handled in that regard.
Next I'll work on making all the other "markup languages" extensions as well.
Moved CSV
into an extension now as well.
It just occured to me: I think it would be a good idea to define the prototypes for all the to_*
and parse_*
functions inside the Serde
package. That way we could point to them inside the Serde
documentation and I assume that LSPs would pick them up as well. We could make the default impl raise a descriptive error in case the extension is missing and otherwise, transparently forward the arguments to the respective functions inside the extensions. What do you think? And should I keep going to implement the other extensions as well right now, or do JSON and CSV suffice for the user-friendliness evaluation you had in mind?
As far as I can tell it doesn't construct the expected error type (
WrongType
), but raises an error somewhere else instead.
I've seen this issue, too. It's version dependent, unfortunately. Conversion to an output type of functions seems to throw ErrorException in 1.10 instead of MethodError. Check the tests with 1.9.4.
It just occured to me: I think it would be a good idea to define the prototypes for all the
to_*
andparse_*
functions inside theSerde
package. That way we could point to them inside theSerde
documentation and I assume that LSPs would pick them up as well.
I like the idea. We could move all the interface functions into a singular module and then have extensions expand on them. it has recently occurred to me that we're missing 'ignore_field' for queries. Extending a singular interface to modules should alleviate this.
What do you think? And should I keep going to implement the other extensions as well right now, or do JSON and CSV suffice for the user-friendliness evaluation you had in mind?
The best course I think would be to go in small incremental steps. We can workout this PR and use it for some time to see, whether it's convenient or not. Does it mean that usage of JSON extension would force user to explicitly import JSON in every use-case?
Btw, you can ping me in Julia official slack.
@dmitrii-doronin
Check the tests with 1.9.4.
Jup, that works, thanks. Should I investigate that as part of this PR, or do we accept the situation for now?
We could move all the interface functions into a singular module and then have extensions expand on them.
I'm not sure we mean the same thing (yet). You can have a look at
src/{CSV,JSON,YAML,...}.jl
to see what I have settled on for the moment.
Essentially I'm collecting all args and kwargs (as varargs) and passing them
through to the extension explicitly. I tried defining the raw interface (as
function foo end
) and then just extending that in the extensions, but this
would throw errors about unknown functions when calling them. Maybe there was
some "glue" missing, I don't know. My current approach of using the Ext
module I introduced has the benefit that it allows us to throw a descriptive
error message when the extension isn't available, giving the user instructions
how to fix that. It's not ideal yet, but we can smooth that out further, of
course.
Does it mean that usage of JSON extension would force user to explicitly import JSON in every use-case?
That is correct. Afaik that's also how e.g. savefig
in Plots
works. At
least the last time I checked it required me to import FreeType, FileIO
, but
apart from that I don't have to do anything with these packages.
For the tests, I "solve" this by just importing all the packages at the top of
runtests.jl
.
Btw, you can ping me in Julia official slack.
Thanks for the offer, but I don't have Slack and don't feel like changing that. For the development here I'd prefer communication in this PR so "external" users can follow what was done and why. :)
I have now moved all the bits that need dependencies to extensions (JSON, YAML,
TOML, XML, CSV). Query doesn't really qualify as an extension since it doesn't
have an external dependency, so maybe we can compact the Query
bits into a
single file, wdyt? On the same topic: The src/{De,Ser,Par}
folders are now
pretty empty. We could compact/flatten their contents into files perhaps? I'll
leave that decision up to you.
The package in its current state, when imported, pulls in a total of 4 dependencies:
Afaik all of these are part of the Stdlib, so they don't require external code at all. All the rest is loaded "on-demand", when the dependencies for the extensions are imported.
Hey there! I was playing around with this package a bit and came here to see if there is already some work going into making the parsers optional. Glad to see this already tackled!
From my (little) experience / what I read and heard about package extensions I would like to offer a bit of information about how they are supposed to work. I believe this could be helpful to remove the fairly complicated (and possibly hacky...) infrastructure about exposing the extension functions from the main package.
package extensions are supposed to extend on already existing functions rather than exporting new symbols. The docs have a nice example for this: https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)
The general workflow for the extension should thus be that all functions you want to extend are defined in the main package, either as simple as
module Serde
function to_json end
function from_json end
...
end
or with a default method which throws an informative error:
module Serde
to_json(args...; kwargs...) = error("please import the JSON module...")
from_json(args...; kwargs...) = error("please import the JSON module...")
...
end
and then in the extension you add more specific methods to the respective functions:
module SerdeJSONExt
using Serde # get access to barebones functions
using JSON
function Serde.to_json(...)
# do things
end
function Serde.from_json(...)
# do things
end
When you want add methods to a function which is owned by another package, you either have to explicity import it as
import Serde.from_joson
or put the owning package in front of the definition. Otherwise, you just define a new function in this module which happens to have the same name... (I ran into this multiple times in the past...)
Hi @SuperGrobi,
thanks for sharing your insights! I already noticed that the current approach doesn't really work, unfortunately. It fails for example when trying to override the serialization behavior for a custom type. So I'll try to rebase the code and then pull out more potentially interesting functions into the base package, so we only override existing things in the extension. I'll keep you posted here.
Ah. I think I see the problem now. The relevant functions are currently under Serde.SerJSON.ser_value
, so you would have to define the methods in Serde.SerJSON
and then overwrite them in the extension?
I just had another idea, which might or might not work with the whole package architecture... in theory it should be possible to dispatch on the modules used for serialisation:
ser_value(m::Module, ::Type{T}, ::Val{x}, v::V) where {T,x,V}) = ser_value(Val(m), T, x, v)
and then have users extend for JSON
ser_value(::Val{JSON}, ::Type{MyType}, Val{MyFieldName}, v::MyFieldType) = ...
(just an idea. this breaks if there is no external dependecy for serialisation.)
@SuperGrobi I've had the same thought already, but I'm indecisive which is better. My thought was that, since we have the "module" names included in all the public functions already (i.e. to_json
, parse_json
, ...), we may as well stick with how it works right now. This way it will be less surprising for users upgrading to this version (although this change probably justifies a major version increase in any case).
If I were to write the code from scratch I'd probably go for your solution, though.
Fair. That would come pretty close to a full rewrite.
JSON and YAML should be back in a working state now, the rest I'll take care of another day. I've moved the SyntaxError
-variants into the public API of the Serde base package now, because I consider them to be part of the public API. This has the direct consequence, that the exception
field of YamlSyntaxError
is now of type Any
, since YAML.ParseError
would require us to pull in the YAML dependency (which is exactly what I'm trying not to do) and isn't a subtype of Exception
either, unfortunately. I hope that's okay!
If you disagree with my decision please let me know, but then we'll have to introduce a different, opaque error mechanism.
So I've just tried to integrate the current state of the package with an application of mine, but it doesn't work as I expect. I'm at a loss with the dispatch mechanism and how methods for functions are actually chosen. I think that with the current design, my approach doesn't work. :(
I'll try a few more things with dispatching on modules to see whether what I had initially intended is possible at all. I'll keep you updated.
Hi, @har7an and @SuperGrobi. You're both doing amazing work here. I am preoccupied with other tasks right now, so I can't try tackling this issue currently. However, I know this package pretty well. So, if you need any insight into the inner workings of the package, just give me a ping here and I'll do my best to help you out with it.
@har7an @SuperGrobi I gave it a thought and the best course of action, IMO, is to have own parsers inside Serde. We a) are not breaking the current interface and b) allow for the extensions without major rewrites. What do you think?
Hey @dmitrii-doronin,
sorry for the delay, I got a little sidetracked the last weeks. I'm not sure about that. Doesn't that entail effectively rewriting everything that's already done in JSON.jl
, YAML.jl
, ... ?
Today I did a little exercise with dispatching on Val{...}
and it looks like a viable approach. My idea is that we implement a "generic" to_string
and from_str(ing)
(and possible also to_writer
and from_reader
) which take a Module
type as first argument in the public API. Then inside that function we convert the Module
to a Val
(thanks for the tip, @henrik-wolf), which we can in turn implement in the package extensions.
If an extension isn't loaded (due to e.g. the dependency not being loaded), it will throw a MethodError
(dispatching fails due to unknown signature). So I'd like to catch that MethodError
and then display an error message explaining the situation. I'm honestly no fan of exceptions because I don't see how you can tell where the MethodError
happened, but I'll see what I can do to determine whether it failed on dispatch. In my little toy example this already (sort of) works.
Then we can decide what to do: We can either keep the current interface and redirect each of these calls. For example: to_yaml
then becomes to_string(YAML, ...
internally (if we say we want to stick with this interface for the time being). Or we deprecate these "old" functions via the @deprecate
macro if we want to nudge users to using the new API.
I'll go ahead and implement this today to see whether it can actually work. My plan is to rewrite the YAML
submodule at least (because I already have code that leverages this particular functionality). I'll let you know when it's done.
I see that quite a bit of code has been added since I last rebased. I'll first try to make this version of the code work (somehow) and then think about incorporating all the new code, I think. This is gonna be fun...
TOML and YAML are now reimplemented, the tests still work. I'll implement the rest later today and then give you a quick rundown of how it works and what it looks like.
Okay, the rebase is done. The tests all pass for me.
I added the version from this branch as dev-dependency to an existing project of mine (which also overwrites serialization for a type of mine) and I had to change a few things. The changes mostly relate to the fact that where previously we had e.g. the submodule SerYaml
, we now only have the module YAML
which defines all the function prototypes.
So essentially, for migration, afaict, we have to do: (Ser|De|Par)Yaml
-> YAML
and users will have to import the respective modules in their code (also in test code). They can then either use the previous functions (to_yaml
) or the new API (to_string(YAML, ...
) as they wish.
What's missing still (next to the other TODOs mentioned below) is a way to show pretty error messages like "hey there, you probably meant X, but to do that you must import Y in your code". I'm not familiar enough with exception handling to tell apart a MethodError
raised from a failed dispatch immediately in the calling function from another MethodError
raised further down the call stack. So if anyone can give me hints in that regard, I'd be very grateful.
to_string
(replaces to_(yaml|json|...)
)from_string
(replaces from_(yaml|json|...)
)parse
(replaces parse_(yaml|json|...)
)Query
Hi @har7an , I see that some really great work has been done on the package! I went through the changes and I like what I see!
Here are some thoughts on what we have now:
Determine the new error type SerdeExtensionError
or something like that
struct SerdeExtensionError <: Exception
extension::Symbol
end
function Base.show(io::IO, e::SerdeExtensionError)
return print(io, "SerdeExtensionError: to use this method, first you need to import the $(e.extension) module.")
end
Then add to the methods to_string
, from_string
, parse
a check that the required module is imported into Main
function to_string(module_name::Symbol, args...; kwargs...)
return if module_name in names(Main, imported = true)
to_string(Val(module_name), args...; kwargs...)
else
throw(SerdeExtensionError(module_name))
end
end
function from_string(module_name::Symbol, args...; kwargs...)
return if module_name in names(Main, imported = true)
from_string(Val(module_name), args...; kwargs...)
else
throw(SerdeExtensionError(module_name))
end
end
function parse(module_name::Symbol, args...; kwargs...)
return if module_name in names(Main, imported = true)
parse(Val(module_name), args...; kwargs...)
else
throw(SerdeExtensionError(module_name))
end
end
We had to make small changes to the function signature, namely, change the Module
type to Symbol
.
Now we have the main functions through which all other methods from other modules must pass.
Also, for this we will have to change the passed values from Val{:JSON}
to :JSON
inside the main methods from modules JSON
, CSV
, ... , like
function to_json(args...; kwargs...)
Serde.to_string(:JSON, args...; kwargs...)
end
I'm not sure if this is necessary, since it seems to me that these methods serve only as a layer between the (de)serialization/parsing methods and their implementation in the extension
Hey! @har7an thank you for the great work!
@artememelin Do we want to keep the original api with the ser_json
, ser_xml
... calls? If we remove them, we would get around most of the problems, as you simply will not be able to accidentally call methods for which you need the respective package... (a similar problem exists if we do the conversion of module to symbol. Nothing is stopping me to just call something with Val(:json)
, whereas I can not call something with Val(JSON)
if I do not have the right module available.)
Similarly, adding custom serialisation and deserialisation behaviour would then simply require the user to dispatch on
ser_value(Val{JSON}, ::Type{T}, ::Val{x}, v::V) where {T,x,V}
rather than
SerJson.ser_value(::Type{T}, ::Val{x}, v::V) where {T,x,V} = v
(I think this would allow us to remove any notion of the specific parser/writer package from the main package. This way, it would be pretty easy for a user to bring their own parser/writer package to integrate with Serde.jl
)
@artememelin Thanks for the feedback!
I think that @henrik-wolf has a good point, actually. The way things currently are, we cannot call e.g. to_string
for an extension that doesn't have it's required package loaded, simply because Julia will turn to_string(JSON, ...)
into an "UndefVarError" exception. Turning the argument from Module
to Symbol
allows users to invoke e.g. to_string(:Json, ...)
, which looks almost right except for casing. I think that having this "freedom" in the function argument will require us to explain this at least to users new to Julia as language.
I like the proposal for the error handling though. I went ahead and modified it slightly: Since to_string/from_string/parse
are pretty much fool-proof (and, as mentioned by @henrik-wolf, offer a neat interface for user extensions), I moved the error handling into each markup-specific submodule. There's now a function if_module
that works as proposed by @artememelin which still uses the same symbols as before internally, but doesn't expose any of this to the user.
So now the state of affairs is as follows:
When a user calls the new to_string
/...
without the required module:
julia> Serde.to_string(JSON, Dict("a" => 1))
ERROR: UndefVarError: `JSON` not defined
Stacktrace:
[1] top-level scope
@ REPL[3]:1
When a user calls the previous JSON.to_json
without the required module:
julia> Serde.JSON.to_json(Dict("a" => 1))
ERROR: SerdeExtensionError: to use this method, first you need to import the 'JSON' module.
Stacktrace:
[1] if_module(f::Main.Serde.JSON.var"#2#3"{@Kwargs{}, Tuple{Dict{String, Int64}}}, mod::Symbol)
@ Main.Serde /var/home/hartan/repos/github.com/bhftbootcamp/Serde.jl/src/Serde.jl:70
[2] to_json(args::Dict{String, Int64}; kwargs::@Kwargs{})
@ Main.Serde.JSON /var/home/hartan/repos/github.com/bhftbootcamp/Serde.jl/src/JSON.jl:109
[3] to_json(args::Dict{String, Int64})
@ Main.Serde.JSON /var/home/hartan/repos/github.com/bhftbootcamp/Serde.jl/src/JSON.jl:108
[4] top-level scope
@ REPL[4]:1
What do you think about that everyone?
Given the feedback so far I assume that the to_string
/from_string
/parse
functions are acceptable to you. I'll go ahead (soon-ish) and document these since I think that with the current proposal, there's a lot of value to end-users w.r.t.
if target === :json
Serde.JSON.to_json(val)
elseif target === :yaml
Serde.YAML.to_yaml(val)
else
...
end
Regarding the API for Query
I'm not sure what to do about that. I guess it pretty-much falls into the category of "bring your own module/extension" as mentioned above. I'm sure I'll figure something out, though...
Okay I think this PR is now pretty much done from my side. I sat down and had a thought about how we could enable "arbitrary" extensions from e.g. users. You can find a demo of that in test/extensions.jl
. This testfile also makes sure that the public APIs (both the previous and the one introduced here) yield the same results with a little toy example.
The Query
is now also implemented using this "extension" mechanism. Since it doesn't have an external dependency, I simply implemented it on the Serde.Query
module. So you call it like this:
Serde.to_string(Serde.Query, Dict("a" => 1))
The new functions are now documented and live in a file API.jl
. I'm not good with names, and that one at least seemed to fit what I had in mind...
I have moved around all the other files, too:
CSV.jl
, JSON.jl
, ...: They now all live in the Formats/
subdirectory, along with the new implementation for Query
.Par/
: The file in here was empty, so I removed this entirelySer/Ser.jl
: Renamed to Serialize.jl
De/De.jl
: I merged Deser.jl
directly into that file and renamed it to Deserialize.jl
If you want any of this undone of renamed, please let me know. Also I'd like to hear your feedback on the new code!
I tried integrating this into some code of mine and I noticed that, since we're using extensions now to provide YAML/JSON/..., users can no longer override the Ser${LANG}/${LANG}_value
function at will.
I guess technically these functions have never been part of the public API (they aren't exported, nor documented, nor mentioned in the docs) so we actually achieved encapsulation in Julia. I could imagine that more users than only me sort of use this in their own code, though...
I think in the long run it would be neat to have something like a serialize_with
, as Rusts serde
crate has. Then users can bring their own transformation functions to modify values as they see fit before we dump them into some IO buffer. But given the size of my PR as it currently is, that's probably out of scope for here. :(
Or do we already have such a thing and I'm unaware of it?
Hello,
thanks for creating this amazing package! I'll use it in a new project of mine to work with JSON and, being a Rust programmer, I was missing a replacement for
serde
until I found this. Amazing work!I tend to be very picky about the dependencies I pull into a project and try to keep them minimal. Since I only need JSON personally, I was wondering if the rest could be left out of the package.
This is my first attempt at writing an extension in Julia. What I did is make
JSON
a weak dependency and create a new module in theext/
subfolder for this purpose. Thanks to how the package is written, I pulled out the JSON-specificSerJson.jl
,ParJson.jl
andDeJson.jl
into that new extension asSer.jl
,Par.jl
andDe.jl
respectively and had to change barely a thing in the process.The
Serde
package now has a new fileJSON.jl
that checks at runtime whether theJSON
extension is present and exposes the functions from there to the user. Users that don'timport
theJSON
package will not be able to serialize/deserialize objects, but neither will JSON be pulled in unconditionally bySerde
. The code isn't pretty yet, I was mostly trying to see if it works and wanted to discuss the idea with you before investing more time.If you're interested, I'd be happy to hear your feedback and convert the other "languages" into extensions as well!
Pull request checklist