facebook / lexical

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.
https://lexical.dev
MIT License
17.5k stars 1.45k forks source link

Allow passing in a custom Set of tags to ignore in $generateNodesFromDOM #4178

Closed amanharwara closed 1 year ago

amanharwara commented 1 year ago

Currently there is no way of modifying the HTML tags that the $generateNodesFromDOM function ignores without maintaining a copy of both the $generateNodesFromDOM and $createNodesFromDOM functions in our own codebase. This PR adds a way to pass in a custom Set of HTML tags that get ignored when generating nodes from DOM.

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated
lexical ✅ Ready (Inspect) Visit Preview 💬 Add your feedback Mar 28, 2023 at 4:44PM (UTC)
lexical-playground ❌ Failed (Inspect) Mar 28, 2023 at 4:44PM (UTC)
facebook-github-bot commented 1 year ago

Hi @amanharwara!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

fantactuka commented 1 year ago

Could you explain your use case a bit more? I think it's possible to ignore tags now by adding those tags to importDOM() rules with high priority and returning { node: null }.

acywatson commented 1 year ago

Could you explain your use case a bit more? I think it's possible to ignore tags now by adding those tags to importDOM() rules with high priority and returning { node: null }.

I think it will still process the children though, which might not be what they want. Another option might be to remove the node from the DOM to avoid further child processing, if that's the case (untested).

amanharwara commented 1 year ago

Could you explain your use case a bit more? I think it's possible to ignore tags now by adding those tags to importDOM() rules with high priority and returning { node: null }.

We're working on a clipper extension that would allow users to clip a full page for example, and on some pages that include inline <script>s, those seem to get converted to TextNodes.

Admittedly haven't tried the importDOM method because I'm not sure what node that would go on? Since I can't modify the importDOM method of the TextNode itself, I'm guessing I'd have to create a custom node like IgnoreScriptsNode that extends TextNode and has an importDOM to ignore script tags?

Not entirely opposed to that idea, but we're also using $generateNodesFromDOM in another place where we don't care about ignoring those, so having that as a function argument feels better.

I think it will still process the children though, which might not be what they want. Another option might be to remove the node from the DOM to avoid further child processing, if that's the case (untested).

In our particular usecase I don't think child processing would be much of an issue considering <script> tags can only really contain text.

facebook-github-bot commented 1 year ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

facebook-github-bot commented 1 year ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

acywatson commented 1 year ago

Admittedly haven't tried the importDOM method because I'm not sure what node that would go on? Since I can't modify the importDOM method of the TextNode itself, I'm guessing I'd have to create a custom node like IgnoreScriptsNode that extends TextNode and has an importDOM to ignore script tags?

Not entirely opposed to that idea, but we're also using $generateNodesFromDOM in another place where we don't care about ignoring those, so having that as a function argument feels better.

Yea you could do it with importDOM as you described. I wonder if we could have an optional importDOM prop in the editor config that allows you to register conversions without having to create or override a node. The current approach is kind of unwieldy.

On the other hand, I don't have a huge problem with this API, but I might say that, if we go ahead with this, we should add an options bag instead and just make ignoreTags one property on that just to sort of put a lid on the proliferation of args here.

amanharwara commented 1 year ago

So today I tried out the importDOM method by creating a IgnoreScriptNode with this importDOM:

static importDOM(): DOMConversionMap<HTMLElement> | null {
    return {
        script: () => {
            return {
                conversion: (element) => {
                    console.log(element)
                    return null
                },
                priority: 4,
            }
        },
    }
}

It doesn't seem to actually ignore those though. I see the elements in the console log, but they still get converted to TextNodes.

For context, this is used with a headless editor like this:

const editor = createHeadlessEditor({
  namespace: 'BlocksEditor',
  theme: BlocksEditorTheme,
  editable: false,
  onError: (error: Error) => console.error(error),
  nodes: [...BlockEditorNodes, IgnoreScriptNode],
})

editor.update(() => {
  const parser = new DOMParser()
  const dom = parser.parseFromString(html, 'text/html')
  const nodesToInsert = $generateNodesFromDOM(editor, dom).map((node) => {
    const type = node.getType()

    // Wrap text & link nodes with paragraph since they can't
    // be top-level nodes in Lexical
    if (type === 'text' || type === 'link') {
      const paragraphNode = $createParagraphNode()
      paragraphNode.append(node)
      return paragraphNode
    }

    return node
  })
  const selection = $createRangeSelection()
  selection.insertNodes(nodesToInsert)
})

Is there something incorrect on my end? If not, then I think it makes sense to go forward with the changes to $generateNodesFromDOM...

acywatson commented 1 year ago

It doesn't seem to actually ignore those though. I see the elements in the console log, but they still get converted to >TextNodes.

Yea I suspect there's actually a DOM TextNode nestled in that Script node, which is why I brought up the thing about the children. I'm fine with the change you're proposing, just make the arg like:

options: { ignoreTags: string[] }
fantactuka commented 1 year ago

Should we just always ignore script tags (similar to style)

acywatson commented 1 year ago

Should we just always ignore script tags (similar to style)

Yea probably, but as soon as we do someone will have a use case for converting script tags, I guess.

amanharwara commented 1 year ago

Should we just always ignore script tags (similar to style)

That would be enough for our usecase, but as @acywatson mentioned, there might be someone who will want to convert script tags or maybe ignore some other tag in which case these changes allow for that.

I think it might make sense to do both - ignore both script and style tags by default, and allow users to not ignore those if they want to.

acywatson commented 1 year ago

@zurfyx This is going to conflict pretty hard with the changes I made to this API in

https://github.com/facebook/lexical/pull/4253

I think ignoreTags can just be added to the options there.

Wrt to your other comments, I don't have strong opinions. The more I think about it, the full override (rather than extend) is probably fine for this situation.

acywatson commented 1 year ago

We now ignore script tags by default:

4249

Following up on the generic ignore solution in:

4253