Open wezm opened 4 years ago
This is a marvelous idea! Thanks for submitting it! :D
I'm not sure I can take a look at this in the next few weeks, but would love to have this feature. If you want to take a stab at it, I can probably give you enough guidance to get you started, though (:
I might be able to take a look on the weekend. Did you have any preferences/thoughts regarding whether the version information was output by default?
I think showing the version unconditionally would be just fine - `chars` is somewhat aggressively non-configurable and maximally informative for human users, so just adding it would work well (:
To add this feature, I think it's a two/three step process:

- `chars_data` subcrate in the chars workspace here,
- `write_name_data` in the unicode portion to emit another table giving Unicode versions & the ranges added in them (ideally make it a memory-optimized data structure; I don't extremely mind searching through n*13ish Unicode versions for each character, but would be worried if we added a table mapping each character to a version number... maybe there's something one could do with tries though?)
- `Display` impl's branch for Unicode here-ish to show the version number.

...and that's about it, I think! The main difficulty will probably be making a parser for that data file (for the ones I made, I got by with a regex-based one, but feel free to use any other reasonable method, tbqh) and finding a decently space-efficient repr for the version table. Best of luck!
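For illustration, here is a minimal sketch of the parser and a range-based version table along the lines described above. It is not what chars actually does. It assumes `DerivedAge.txt` lines have the form `START..END ; MAJOR.MINOR # comment` (or a single codepoint in place of the range), and it uses plain string splitting instead of a regex. The names `AgeRange`, `parse_age_line`, `build_age_table` and `age_of` are hypothetical.

```rust
use std::cmp::Ordering;

/// One contiguous run of codepoints first assigned in the same Unicode version.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AgeRange {
    pub start: u32,
    pub end: u32, // inclusive
    pub version: (u8, u8),
}

/// Parse one data line; returns None for blank lines, comments, or anything malformed.
pub fn parse_age_line(line: &str) -> Option<AgeRange> {
    // Everything after '#' is a comment.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let (range, version) = data.split_once(';')?;
    let (range, version) = (range.trim(), version.trim());
    let (start, end) = match range.split_once("..") {
        Some((a, b)) => (
            u32::from_str_radix(a, 16).ok()?,
            u32::from_str_radix(b, 16).ok()?,
        ),
        None => {
            let cp = u32::from_str_radix(range, 16).ok()?;
            (cp, cp)
        }
    };
    let (major, minor) = version.split_once('.')?;
    Some(AgeRange {
        start,
        end,
        version: (major.parse().ok()?, minor.parse().ok()?),
    })
}

/// Build a table sorted by start, merging adjacent ranges that share a version.
pub fn build_age_table(input: &str) -> Vec<AgeRange> {
    let mut ranges: Vec<AgeRange> = input.lines().filter_map(parse_age_line).collect();
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<AgeRange> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(prev) = merged.last_mut() {
            if prev.version == r.version && prev.end + 1 == r.start {
                prev.end = r.end;
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Binary-search the table for the version that assigned `c`, if any.
pub fn age_of(table: &[AgeRange], c: char) -> Option<(u8, u8)> {
    let cp = c as u32;
    table
        .binary_search_by(|r| {
            if cp < r.start {
                Ordering::Greater
            } else if cp > r.end {
                Ordering::Less
            } else {
                Ordering::Equal
            }
        })
        .ok()
        .map(|i| table[i].version)
}
```

Merging adjacent ranges keeps the table to one entry per contiguous run instead of one per character, and lookup is a binary search over those runs. A trie might be smaller or faster still; this is only the simplest layout that avoids a per-character table.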
I made a start on this yesterday. I'm 50–75% done. Fortunately I think what you described above matches what I did/planned to do 😃
That's fantastic to hear - excited to see what you came up with (:
I deal with Unicode a fair bit and `chars` is a handy tool. Sometimes it would be convenient to know which Unicode version assigned a particular codepoint.

E.g. the output from `chars` might look something like this. The version information might not be shown by default and require a command line flag if it was deemed too noisy.

I think the information is available via the `DerivedAge.txt` file in the UCD.