[Closed] QuartzLibrary closed this issue 3 years ago
Hi,
Saw your fork before you made the main repo public, and was wondering what you were planning to do with it.
I was initially quite skeptical of expanding the API of the crate, since I know breaking changes will have to happen, but as long as you understand it's still somewhat internal API, I guess it's better than a fork. And you are not the first to ask about using the `json_typegen_shared` crate, so I'm already avoiding unnecessary breaking changes anyway.
I've exposed `Shape` and added a `codegen_from_shape` you can use, and `Options` was already exposed, so hopefully that should be all you need to avoid the fork?
I've actually been working on replacing the serde_json parsing with doing the inference straight on a stream of tokens from a custom lexer, so I'm actually quite intrigued by your approach of using the serde visitor API. Especially since I have also considered adding support for more input formats, and just using serde is probably a better approach than custom lexers. Have you tried it on any large files yet (e.g. ~1GB of JSON)?
> Saw your fork before you made the main repo public, and was wondering what you were planning to do with it.
Yes, I didn't want to bother you if it turned out I had to stop for some reason, plus getting it working from the fork allowed me to be very specific.
> but as long as you understand it's still somewhat internal API, I guess it's better than a fork.
I totally understand, and I don't mind breaking changes, especially if they're semver-compatible. But in any case I can jump in to fix anything that comes up.
> that should be all you need to avoid the fork?
Yes, those changes are sufficient for the use case. I have pushed a branch that relies directly on this repo: https://github.com/QuartzLibrary/schema_analysis/tree/json_typegen
This will also fix an issue with docs.rs not liking remote git dependencies when it makes it to crates.io.
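For anyone following along, the docs.rs issue comes from the fact that docs.rs builds crates with dependencies resolved from crates.io only, so a git dependency blocks the docs build until a published version exists. In `Cargo.toml` terms, the switch looks roughly like this (the URL and version number are placeholders, not taken from either repo):

```toml
[dependencies]
# Workaround while the changes only exist in git
# (docs.rs cannot build a crate with this dependency):
json_typegen_shared = { git = "https://github.com/<owner>/json_typegen" }

# Once the changes are published, a plain version requirement
# restores the docs.rs build:
# json_typegen_shared = "0.7"
```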
> I've actually been working on replacing the serde_json parsing with doing the inference straight on a stream of tokens from a custom lexer, so I'm actually quite intrigued by your approach of using the serde visitor API. Especially since I have also considered adding support for more input formats, and just using serde is probably a better approach than custom lexers.
Glad you found it intriguing! From what I have found, Serde works really well for this. Very few formats (like TOML with its dates) provide more information than the simple Serde visitor anyway, so I don't think you could gain much from something custom. The main disadvantage of the approach is that any non-shape analysis has to be mostly restricted to the leaf nodes (structs and lists yield only a list of field names and a length, respectively), but in most cases that is not really an issue.
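The leaf-node limitation can be illustrated with a small std-only sketch. These types are simplified stand-ins, not the crate's actual `Shape` or Serde's real visitor traits: leaves are seen in full, while containers only surface field names and element counts, so anything beyond shape inference has to happen at the leaves.

```rust
// Inferred shape of a value. With a visitor-style walk, containers only
// contribute structural info (field names, lengths); the interesting
// per-value analysis is confined to the leaf variants.
#[derive(Debug, PartialEq)]
enum Shape {
    Null,
    Bool,
    Integer,
    Str,
    List(Box<Shape>, usize),      // element shape + observed length
    Struct(Vec<(String, Shape)>), // field names + field shapes
}

// Stand-in for the input; with Serde this data would arrive as a
// sequence of visitor calls rather than a materialized tree.
enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Str(String),
    List(Vec<Value>),
    Struct(Vec<(String, Value)>),
}

fn infer(v: &Value) -> Shape {
    match v {
        Value::Null => Shape::Null,
        Value::Bool(_) => Shape::Bool,
        Value::Integer(_) => Shape::Integer,
        Value::Str(_) => Shape::Str,
        Value::List(items) => {
            // A real implementation would merge all element shapes;
            // taking the first is enough for this sketch.
            let elem = items.first().map(infer).unwrap_or(Shape::Null);
            Shape::List(Box::new(elem), items.len())
        }
        Value::Struct(fields) => Shape::Struct(
            fields.iter().map(|(k, v)| (k.clone(), infer(v))).collect(),
        ),
    }
}
```

Note how `List` and `Struct` carry no sample values of their own, only what their children report upward.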
> Have you tried it on any large files yet (e.g. ~1GB of JSON)?
I have tried it with:
It could probably get faster if we didn't run the secondary analysis/sampling (which you can't yet disable).
And of course, all numbers are indicative, just a ballpark of how it runs on my laptop. They were also all measured with the files fully in memory; having to buffer would likely slow things down.
Sorry, seems I completely forgot to make an actual release once I had made the necessary changes. 😬
I've released 0.7.0 now. I also made another change that may be of interest to you: option parsing is now behind an optional feature, so if you use `default-features = false` you should pull in fewer transitive dependencies.
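In `Cargo.toml` terms, that is (version taken from this thread; check the crate's manifest for the exact feature names before relying on this):

```toml
[dependencies]
# Disable default features (including the option parsing mentioned above)
# to cut down on transitive dependencies.
json_typegen_shared = { version = "0.7.0", default-features = false }
```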
Closing this issue for now, but of course feel free to add further comments here or open another issue if there are other changes that would be useful. 🙂
Hello!
First things first: thank you for the great library!
I am the author of `schema_analysis`, a crate that allows anyone to infer the schema of any self-describing format using Serde. Our projects are somewhat similar, but with different focuses and strengths, which is why I have tried to integrate the two together.
As mentioned here, I would be interested in working out a way to more seamlessly access the internals of `json_typegen_shared`, to link directly to your library instead of the somewhat ugly workaround of forking and adding `pub` to what I need.
To clarify the context: I currently expose access to the `shape`, `options`, and `generation` modules in a fork of `json_typegen_shared`. You can see here how I am using them.
If you were interested, I would also be happy to use a codegen function that takes a `Shape` directly, along the lines of this one. This would also reduce my imports to the `shape` and `options` modules.
In any case, thank you again for your OS work!
Best, Quartz