Interlisp / medley

The main repo for the Medley Interlisp project. Wiki, Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources)
https://Interlisp.org
MIT License

Create an HTML image outputstream #1814

Open hjellinek opened 2 months ago

hjellinek commented 2 months ago

Create an image output stream implementation that writes HTML.

rmkaplan commented 2 months ago

@hjellinek, I think I didn't quite understand what you were asking wrt UTF-8, and I may have thrown you off track.

If you want to create an image stream for HTML with UTF-8 as its character encoding, I think all you have to do is open the backing stream with the parameter (FORMAT :UTF-8), and then simply write to the backing stream with ordinary Medley character functions (PRIN1, TERPRI, PRINTCCODE...). You shouldn't need to worry about decoding and recoding; that should happen automatically. (If you want to put out binary data, use BOUT to avoid any coercions.) Unlike Tedit, you shouldn't need to deal with the internals of the UNICODE file.

You might also want to specify the parameter (LINELENGTH T) so that the Medley functions don't throw in unexpected EOLs in an attempt to do their own line-breaking (essentially treating PRIN1 as PRIN3 etc.).
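
A minimal sketch of that suggestion, assuming it is typed into a Medley Common Lisp (XCL) exec, so Interlisp names carry the IL: prefix. The file name is hypothetical, and the two stream parameters are simply the ones named above; this has not been checked against a running Medley.

```lisp
;; Sketch only: open a UTF-8 backing stream and write a little HTML to it.
;; "example.html" is a hypothetical file name.
(LET ((OUT (IL:OPENSTREAM "example.html" 'IL:OUTPUT 'IL:NEW
                          ;; Parameters as described in the comment above.
                          '((IL:FORMAT :UTF-8) (IL:LINELENGTH T)))))
  (UNWIND-PROTECT
      (PROGN
        ;; Ordinary character output; any recoding is left to the stream.
        (IL:PRIN1 "<!DOCTYPE html>" OUT)
        (IL:TERPRI OUT)
        (IL:PRIN1 "<p>Hello from Medley</p>" OUT)
        (IL:TERPRI OUT))
    ;; Close the stream even if one of the writes signals an error.
    (IL:CLOSEF OUT)))
```

In an Interlisp exec the IL: prefixes would be dropped and the Common Lisp names would take a CL: prefix instead.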

hjellinek commented 2 months ago

Hi @rmkaplan, thanks for clarifying. I'll be able to take advantage of all of that information in my eventual implementation. I'm pretty sure I'll still need to perform my own conversion from XCCS to Unicode in one case, though: I've got a collection of font metrics that are indexed by Unicode character code. To look up the width of a given XCCS character, for instance, I first need to convert it to the equivalent Unicode code point and use that value as the index into the font metrics table.

It's good to know I can use the native XCCS codes for everything else.
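
The lookup described here might look roughly like the following sketch, written for a Medley Common Lisp (XCL) exec. The table `*UNICODE-WIDTHS*` and the function `XCCS-CHAR-WIDTH` are hypothetical names; only XTOUCODE comes from Medley, and a hash table stands in for whatever structure the metrics actually live in.

```lisp
;; Hypothetical metrics table: Unicode code point -> glyph width.
(DEFVAR *UNICODE-WIDTHS* (MAKE-HASH-TABLE))

(DEFUN XCCS-CHAR-WIDTH (XCODE &OPTIONAL (DEFAULT 0))
  "Width of the XCCS character code XCODE, or DEFAULT if the table has no entry."
  ;; Convert XCCS -> Unicode first, then index the Unicode-keyed table.
  (GETHASH (IL:XTOUCODE XCODE) *UNICODE-WIDTHS* DEFAULT))
```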

rmkaplan commented 2 months ago

You may be thinking of this, but wrt fonts, you could take the problem off-line by separately shuffling the widths vectors to make new XCCS-organized font files. That's where you would have to use some of the translation macros in Unicode.
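
As a rough illustration of that off-line reshuffle, the sketch below builds an XCCS-keyed width table from a Unicode-keyed one, for a Medley Common Lisp (XCL) exec. `MAKE-XCCS-WIDTHS` and both tables are hypothetical, hash tables stand in for the real widths vectors, and writing the result out as a font file is not shown.

```lisp
(DEFUN MAKE-XCCS-WIDTHS (UNICODE-WIDTHS XCCS-CODES)
  "Return a new table mapping each code in XCCS-CODES to its width.
UNICODE-WIDTHS is a hypothetical hash table keyed by Unicode code point."
  (LET ((XCCS-WIDTHS (MAKE-HASH-TABLE)))
    (DOLIST (XCODE XCCS-CODES XCCS-WIDTHS)
      (MULTIPLE-VALUE-BIND (WIDTH FOUND)
          (GETHASH (IL:XTOUCODE XCODE) UNICODE-WIDTHS)
        ;; Copy only the characters the source font actually has metrics for.
        (WHEN FOUND
          (SETF (GETHASH XCODE XCCS-WIDTHS) WIDTH))))))
```

After this one-time pass, width lookups at output time need no XCCS-to-Unicode conversion.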

hjellinek commented 2 months ago

Yes, I did think of that, and I may pursue it. At the moment, simply collecting the metrics from the fonts I've chosen (Google Web Fonts) is proving to be the challenge.

nbriggs commented 2 months ago

Does https://github.com/drwpow/google-font-metrics provide any useful extraction of the metrics you need?

hjellinek commented 2 months ago

The code I wrote uses a similar technique. The insane thing is that none of the tools I've found actually extracts metrics from the font files themselves. Instead, they draw the glyphs offscreen and measure them!

I should look for and use a WOFF2 file parser instead, but this approach is working for me for the moment.

nbriggs commented 2 months ago

Yeah, you could do that. Also, FontForge can read WOFF2 files and display the metrics -- https://fontforge.org/docs/ui/mainviews/metricsview.html

hjellinek commented 2 months ago

@rmkaplan, I'm using XTOUCODE to compute the Unicode equivalent for every XCCS character code. OUT is a stream open for output.

This works fine:

(CL:DOTIMES (X 256) (CL:FORMAT OUT "~D ~D" X (XTOUCODE X)) (TERPRI OUT))

However, this blows up. Any value greater than 256 blows up, in fact:

(CL:DOTIMES (X 65535) (CL:FORMAT OUT "~D ~D" X (XTOUCODE X)) (TERPRI OUT))

The error is:

Invalid argument: (0)

which occurs under a call to READ-UNICODE-MAPPING-FILE. I guess I didn't initialize the mapping tables correctly. What is the correct way to do that?

hjellinek commented 2 months ago

> Yeah, you could do that. Also, FontForge can read WOFF2 files and display the metrics -- https://fontforge.org/docs/ui/mainviews/metricsview.html

I used to have FontForge installed. I wonder where it went.... Anyway, I see it's possible to script it, which would help me a lot with this task.

I have learned a lot about JavaScript and CSS in the process of working on this.

rmkaplan commented 2 months ago

Actually, it wasn’t “any” code over 256, it was any code anywhere that didn’t actually have a mapping. And XCCS is empty from 256 up to character sets 40 or 41. A NULL check was missing, so that case wasn’t caught (it presumably wouldn’t have happened for a valid XCCS code). It should have fallen through to where it assigns an arbitrary unused Unicode character and keeps track of that, at least for input/output round-trips. I’ll put out a PR for Unicode.
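
The intended fall-through can be pictured with a small, self-contained Common Lisp sketch. This is a hypothetical stand-in, not the code in Medley's Unicode package: the table names, the function name `XCCS->UNICODE`, and the choice to allocate codes from the Private Use Area starting at #xE000 are all assumptions made for illustration.

```lisp
(DEFVAR *XCCS-TO-UNICODE* (MAKE-HASH-TABLE))  ; known and invented mappings
(DEFVAR *UNICODE-TO-XCCS* (MAKE-HASH-TABLE))  ; inverse, for round trips
(DEFVAR *NEXT-UNUSED* #xE000)                 ; assumption: Private Use Area

(DEFUN XCCS->UNICODE (XCODE)
  "Return the Unicode code for XCODE, inventing one if no mapping exists."
  (MULTIPLE-VALUE-BIND (UCODE FOUND) (GETHASH XCODE *XCCS-TO-UNICODE*)
    (IF FOUND
        UCODE
        ;; The missing-mapping case: assign the next unused code and record
        ;; it in both directions so input/output round trips are stable.
        (LET ((NEW *NEXT-UNUSED*))
          (INCF *NEXT-UNUSED*)
          (SETF (GETHASH XCODE *XCCS-TO-UNICODE*) NEW
                (GETHASH NEW *UNICODE-TO-XCCS*) XCODE)
          NEW))))
```

Calling the function twice with the same unmapped code returns the same invented code, which is what keeps a write-then-read round trip consistent.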

In fooling around, I ran across another glitch, not in Unicode but in the fact that Tedit now uses TTYINPROMPTFORWORD to get its type-in strings in the prompt window, e.g. for search. TTYINPROMPTFORWORD appears to strip spaces, so it converts an intended search “abc ” to “abc”. I wonder if there is a way to control that.

hjellinek commented 2 months ago

FYI re FontForge, there's an associated command line tool, showttf, which looked like it might do exactly what I need. After some Type II-III fun building the command line tools, I found that it would be a ton of work to find the actual width, line height, etc., using showttf or FontForge's metrics window. But my JS code seems to give me what I need.