facebook / lexical

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.
https://lexical.dev
MIT License
17.5k stars 1.45k forks

Convert Lexical's Serialized (stringified) state into HTML #4003

Closed agjs closed 1 year ago

agjs commented 1 year ago

Hi folks. Under the Serialization & Deserialization section, you have a couple of examples of how to serialize and deserialize the state to and from HTML. However, each example does this in the context of an actual Lexical editor, so a reference to an editor state is needed.

How do we parse the state into the HTML in a different context, e.g. I have stored a blog article in a database, my UI fetches that stringified content, and I want to render it? As of now, it seems impossible, and your examples of doing this are purely based on rendering HTML inside of the editor.

We are switching from Draft.js to Lexical and we more or less need a library identical to this one, https://www.npmjs.com/package/draftjs-to-html, just for Lexical.

Thanks in advance!

thegreatercurve commented 1 year ago

For rendering purposes, could you just use a Lexical editor with editable set to false?

Otherwise, we don't expose any APIs through core for converting editor state outside of an editor context. It helps keep core fairly lean and the APIs for serialisation and deserialisation fairly clean.

If you did want to do the conversion, you would have to write your own helper function which recursively traverses the tree-like editor state of Lexical and converts the nodes into HTML.

acywatson commented 1 year ago

For rendering purposes, could you just use a Lexical editor with editable set to false?

Otherwise, we don't expose any APIs through core for converting editor state outside of an editor context. It helps keep core fairly lean and the APIs for serialisation and deserialisation fairly clean.

If you did want to do the conversion, you would have to write your own helper function which recursively traverses the tree-like editor state of Lexical and converts the nodes into HTML.

You can also use a headless editor for this on the server, if you don't want to do it on the client.

agjs commented 1 year ago

It helps keep core fairly lean and the APIs for serialisation and deserialisation fairly clean.

While I understand the sentiment behind this, I believe that you are withholding utilities simply to adhere to some values. In this case, "Hey, we won't provide you with this, because it doesn't fully relate to the context of the editor".

I can't think of a use case where a user of this editor won't want to serialize and store the content somewhere. Can you give me an example of where this editor is useful without storing the data somewhere? How are you guys at Facebook storing it? How do you convert the actual content of the editor right now?

I believe that values like this provide nothing but frustration. We decided to switch from Draft.js to Lexical because you deprecated Draft. Why do I have to waste hours of my time, and judging by your response, even more time, just so I can take the format of the editor and convert it to something else? I'm not trying to be rude or anything, but it seems to me that Lexical is missing a fundamental feature. If you disagree with me, please tell me: what will I do with the editor state if I can't convert it to e.g. HTML in any context I need it?

acywatson commented 1 year ago

Sorry, maybe we're misunderstanding each other here. It's not at all clear to me what you're asking for that can't already be done. If you want to serialize to HTML, we provide a utility for you to do that (generateHTMLFromNodes) - DraftJS never did this, by the way. If you want to do this in some non-browser context, we provide yet another utility for you to do that (headless mode). If you just want to serialize and store the editor state, you can use EditorState.toJSON to serialize to native Lexical format and store that. What exactly are you asking for?

I think @thegreatercurve and I both assumed you read through the documentation that you linked, which would have shown you all of this (except maybe headless, which I pointed out).

agjs commented 1 year ago

Hi, @acywatson. Thanks for taking the time to respond, really appreciate it.

The problem with generateHTMLFromNodes is that it expects an editor instance as an argument. In my example, for Programmer Network that I'm building, I want to store the stringified editorState in a database (for our articles, thoughts (tweets), etc), and then convert them to HTML in our e.g. Article view. We can't do that with what you proposed, because I'd need an editor instance in the view that absolutely has no editor at all. With draftjs, we used an npm module called draftjs-to-html, which did exactly this, without needing any editor context. You would pass the e.g. stringified state, and you would get the HTML back. They even provided some utilities, so you can "decorate" your HTML, etc.

In simple words, I want to convert the lexical state to HTML, outside of any editor context.

acywatson commented 1 year ago

We can't do that with what you proposed, because I'd need an editor instance in the view that absolutely has no editor at all.

What if you rendered to HTML on the server using headless + generateHTMLFromNodes, then sent it down to the client and did whatever you wanted with it there?

I see what you're asking for and I agree that we don't provide a way to do exactly that (rendering from serialized EditorState to HTML on the client with no editor), but I don't really understand what the constraints are that prevent you from approaching this in one of the several ways that we DO provide for you to do this. Are you trying to just avoid instantiating an editor on this particular view? If so, why, exactly? You need to depend on Lexical either way - either for this render-to-HTML-without-an-editor util that you're asking for, or to do it in the way we're currently suggesting.

This util that you're asking for is basically just a depth-first search (DFS) on a JSON tree, and we'd have to make it completely configurable to satisfy everyone who wants to render Lexical nodes to HTML according to whatever special circumstances they have. So, I see why it would be convenient for you to have something like this, but I'm not sure I agree with your sentiment that it's somehow a critical feature of the library.
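To make the "configurable DFS on a JSON tree" idea concrete, here is a minimal sketch. It is not part of Lexical: the names `SerializedNode`, `NodeRenderer`, `defaultRenderers`, `escapeHtml` and `renderNode` are hypothetical, and the node shape lists only the few fields used here. Callers override or add renderers per node type.

```typescript
// Minimal shape of a serialized node; real Lexical nodes carry more fields.
type SerializedNode = {
  type: string;
  text?: string;
  tag?: string;
  children?: SerializedNode[];
};

// A renderer gets the node plus the already-rendered HTML of its children.
type NodeRenderer = (node: SerializedNode, childrenHtml: string) => string;

// Escape text so user content cannot inject markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical default renderers, keyed by the node's "type" field.
const defaultRenderers: Record<string, NodeRenderer> = {
  root: (_node, childrenHtml) => childrenHtml,
  paragraph: (_node, childrenHtml) => `<p>${childrenHtml}</p>`,
  heading: (node, childrenHtml) => {
    // Only accept h1..h6 so a stored "tag" value cannot inject other elements.
    const tag =
      node.tag !== undefined && /^h[1-6]$/.test(node.tag) ? node.tag : 'h1';
    return `<${tag}>${childrenHtml}</${tag}>`;
  },
  text: (node) => escapeHtml(node.text ?? ''),
};

// Depth-first traversal: render the children first, then pass their HTML
// to the renderer registered for the parent's type.
function renderNode(
  node: SerializedNode,
  renderers: Record<string, NodeRenderer> = defaultRenderers,
): string {
  const childrenHtml = (node.children ?? [])
    .map((child) => renderNode(child, renderers))
    .join('');
  const render = renderers[node.type];
  // Unknown node types fall back to their children's HTML instead of failing.
  return render ? render(node, childrenHtml) : childrenHtml;
}
```

Calling `renderNode(JSON.parse(stored).root)` uses the defaults; passing `{ ...defaultRenderers, paragraph: (_n, html) => ... }` as the second argument swaps in a custom paragraph renderer without touching the traversal.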

thegreatercurve commented 1 year ago

Yeah, I don't think we can remove the editor instance from the conversion operation. The logic for parsing and validating the serialized editor state shape is baked into the editor here.

Plus, much of the HTML creation logic is baked into the node classes themselves in exportDOM (which accepts an editor instance, to add to the annoyance), so if we created a util function which didn't use editor, you'd still need to specify an EditorConfig which has a list of node classes. This config also accepts a theme, which contains classnames which we add to generated HTML.

Semi-related discussion here: #2587

agjs commented 1 year ago

Thanks a lot for the responses, @acywatson and @thegreatercurve. I got enough information from you folks, and I guess I'll just reference the editor and render the stuff I need.

Regardless, I believe you also feel that what I asked made sense in a way. While a util to export the state into HTML wouldn't necessarily be part of Lexical's API, it's still a fundamental utility, and nearly every editor out there supports something similar. Even Draft.js, which you guys built and maintained, had several community-built utilities which, if you check npm, still have tens of thousands of monthly downloads. I think the stats speak for themselves.

Anyway, thanks a bunch for your responses.

Cheers

AlessioGr commented 1 year ago

https://github.com/AlessioGr/payload-plugin-lexical/blob/master/serialize-example/NewRichTextParser.ts

This might help! I partially implemented that there, at least for the most important nodes

erikmartinessanches commented 4 months ago

I also have a similar use case to @agjs. I also saw kg-lexical-html-renderer, but I haven’t tried it yet.

TechSynthesis commented 4 months ago

@agjs how did you solve this? I'm currently trying to find an answer as well.

coreyward commented 4 months ago

This is a hard blocker for adoption of Lexical for me. HTML is not universal. I expect to be able to take the input, evaluate it, process it as needed, and render it in a variety of formats. Other editors like Slate have support for this out of the box, and it allows for tools like Sanity.io to build structured content experiences that can be consumed for all sorts of display purposes.

A rich text editor with output that is effectively opaque is only useful for rendering a rich text editor, and even then, only for as long as the RTE is valid and supported. The means by which it is stored and transformed into presentation formats needs to be well documented for it to be broadly usable and portable.

acywatson commented 4 months ago

This is a hard blocker for adoption of Lexical for me. HTML is not universal. I expect to be able to take the input, evaluate it, process it as needed, and render it in a variety of formats. Other editors like Slate have support for this out of the box, and it allows for tools like Sanity.io to build structured content experiences that can be consumed for all sorts of display purposes.

A rich text editor with output that is effectively opaque is only useful for rendering a rich text editor, and even then, only for as long as the RTE is valid and supported. The means by which it is stored and transformed into presentation formats needs to be well documented for it to be broadly usable and portable.

I'm not sure I understand your concern here. In what sense is the output "opaque"? It's a tree of nodes that you can pretty easily traverse and use the properties thereon to render into whatever presentation format you want. We provide some basic tools for rendering to common formats, like HTML, Markdown, or plain text, but there are internal use cases where we render to all sorts of special formats for different circumstances.

If you just want to render exactly what the user wrote in the editor, then yea, the easiest thing is usually to pass the serialized data back into a read-only editor. You certainly don't have to do that, though.
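As an illustration of traversing the node tree into a format other than HTML, here is a minimal sketch that extracts plain text from the serialized JSON without an editor. The names `LexicalJsonNode` and `toPlainText` are hypothetical, and the assumption that line breaks are serialized as nodes with type "linebreak" should be checked against your own stored data.

```typescript
// Minimal shape of a serialized node; only the fields used below are listed.
type LexicalJsonNode = {
  type: string;
  text?: string;
  children?: LexicalJsonNode[];
};

// Walk the tree depth-first and collect text content.
function toPlainText(node: LexicalJsonNode): string {
  if (node.type === 'text') {
    return node.text ?? '';
  }
  // Assumption: soft line breaks are stored as nodes of type "linebreak".
  if (node.type === 'linebreak') {
    return '\n';
  }
  const parts = (node.children ?? []).map((child) => toPlainText(child));
  // Separate top-level blocks with a blank line; join inline content directly.
  return node.type === 'root' ? parts.join('\n\n') : parts.join('');
}
```

The same traversal could emit Markdown or any other target by changing what each branch returns.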

coreyward commented 4 months ago

@acywatson The original question here was "How do we parse the state into the HTML in a different context?", and the answer has been that it is only viable with an editor instance because the logic is intertwined with the editor code.

It's a tree of nodes that you can pretty easily traverse and use the properties thereon to render into whatever presentation format you want.

Great, do you have a link to documentation on the format used for this? I saw this, but it seems to refer to the live state held, not the JSON representation, and either way doesn't seem to have full documentation, linking instead to the code (e.g., LexicalElementNode) which is not particularly useful when trying to author a tool to sanitize and render content.

If there is some editor-independent tooling demonstrating or implementing a feature to convert the stored node tree (ostensibly in JSON) into HTML I'd love to see that. The docs only show doing it with an editor instance.

acywatson commented 4 months ago

Each node defines how it's serialized to JSON via its implementation of the exportJSON method:

https://github.com/facebook/lexical/blob/ac61359777161743e2c647ddfc42b0df61ff5144/packages/lexical/src/nodes/LexicalElementNode.ts#L506

Certainly wouldn't hurt to add some documentation on that for each node that we maintain.

If there is some editor-independent tooling demonstrating or implementing a feature to convert the stored node tree (ostensibly in JSON) into HTML I'd love to see that

Here's a simple algorithm to build an HTML string from a Lexical JSON state:

const lexicalJSON =
  '{"root":{"children":[{"children":[{"detail":0,"format":0,"mode":"normal","style":"","text":"hello world!","type":"text","version":1}],"direction":"ltr","format":"","indent":0,"type":"paragraph","version":1}],"direction":"ltr","format":"","indent":0,"type":"root","version":1}}';

// Minimal shape of a serialized node; only the fields used below are listed.
type SerializedLexicalNode = {
  type: string;
  text?: string;
  tag?: string;
  children?: SerializedLexicalNode[];
};

// Escape text content so it cannot be interpreted as markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function getHtmlFromLexicalJSON(json: string): string {
  const parsed = JSON.parse(json);
  return getHtmlFromLexicalNode(parsed.root);
}

// Renders the children of `node` depth-first and concatenates their HTML.
function getHtmlFromLexicalNode(node: SerializedLexicalNode): string {
  let result = '';
  for (const child of node.children ?? []) {
    if (child.type === 'paragraph') {
      result += `<p>${getHtmlFromLexicalNode(child)}</p>`;
    } else if (child.type === 'heading') {
      const tag = child.tag ?? 'h1';
      result += `<${tag}>${getHtmlFromLexicalNode(child)}</${tag}>`;
    } else if (child.type === 'text') {
      result += `<span>${escapeHtml(child.text ?? '')}</span>`;
    } else {
      // Placeholder for node types this sketch does not handle.
      result += '<span style="font-weight: bold">Unknown Node</span>';
    }
  }
  return result;
}

getHtmlFromLexicalJSON(lexicalJSON);

Here's a stackblitz: https://stackblitz.com/edit/typescript-lh7dxb?file=index.ts

I didn't test this thoroughly, but hopefully you get the idea.

matt-sweda-calder commented 1 day ago

One key piece of info missing from this thread is how to properly parse the "format" value of text nodes.

@AlessioGr linked a good example above, so look at that.

Maybe this is common knowledge, but it took me a second to realize that for TextNodes, "format" is a bitmask. That is, "bold" is represented as a 1 in the rightmost bit (e.g. ...0001 in binary), italic in the second-to-rightmost bit (e.g. ...0010), etc. Applying both bold and italic together gives you ...0011 in binary, which evaluates to 3.

Other combinations of text node formats are derived the same way.
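For anyone who wants to see the bitmask logic spelled out, here is a minimal sketch of decoding the "format" number. The bold and italic values match the description above; the remaining bit positions are recalled from Lexical's source and are an assumption, so verify them against the version you use. The names `TEXT_FORMAT_FLAGS`, `decodeTextFormat` and `hasFormat` are hypothetical.

```typescript
// Bit flags for text formats. bold = 1 and italic = 2 are as described above;
// the other values are assumptions and should be checked against Lexical.
const TEXT_FORMAT_FLAGS = {
  bold: 1,
  italic: 1 << 1,
  strikethrough: 1 << 2,
  underline: 1 << 3,
  code: 1 << 4,
  subscript: 1 << 5,
  superscript: 1 << 6,
} as const;

type TextFormat = keyof typeof TEXT_FORMAT_FLAGS;

// True when the given format's bit is set in the bitmask.
function hasFormat(format: number, name: TextFormat): boolean {
  return (format & TEXT_FORMAT_FLAGS[name]) !== 0;
}

// List every format whose bit is set, in the order declared above.
function decodeTextFormat(format: number): TextFormat[] {
  return (Object.keys(TEXT_FORMAT_FLAGS) as TextFormat[]).filter((name) =>
    hasFormat(format, name),
  );
}

console.log(decodeTextFormat(3)); // [ 'bold', 'italic' ]
```

A renderer can then wrap the text in `<strong>`, `<em>` and so on for each decoded format instead of emitting a bare `<span>`.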

Hope this helps someone else!