facebook / lexical

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.
https://lexical.dev
MIT License
19.7k stars 1.67k forks source link

Limitation/usecase discussion: playground hangs when converting to Markdown with an uploaded image of 1.1 megabyte because of doing base64 encoding on the UI thread and not in a WebWorker #6256

Open vadimkantorov opened 5 months ago

vadimkantorov commented 5 months ago
  1. Go to https://playground.lexical.dev/
  2. Use toolbar to open Insert -> Image -> File
  3. select from local file system this GIF file (1.11Mb) cat-typing 29244fe9
  4. It pastes correctly. Now click on the M↓ button in the bottom-right image
  5. Chrome browser window hangs

Another related bug:

When using another image (16.3Kb) yellow-flower a2a7c7a2, the window hangs for a bit, and then a giant base64 datauri is displayed in Markdown. So I assume that the GIF image is much larger, so encoding it to Base64 on the UI thread blocks it. But even encoding the small image to Base64 on the UI thread blocks it for a bit. I think that this encoding should be done in a WebWorker, and for regular display there should be an option which avoids encoding to base64 altogether and uses some pseudo-URLs to refer to the selected images: https://github.com/facebook/lexical/issues/6255 (and at the end there should be two options of getting markdown: using pseudo-URLs and using the datauri URLs, this second way at the very least should be async, but this still can block the UI when executed, so this encoding should be done in a WebWorker). The WebWorker spawn/management/communication can even be upheld on the user, as long as convenient data structures and examples are provided (e.g. instead of markdown, we are given a capsule object using unique pseudo-urls and image blobs which needs to be materialized to get the final markdown, calling this final materialization method can be upheld onto the user who would spawn a webworker, marshal this object there, run the base64 encoding there and marshal the final string back (if they are working with images and do not want to block the UI, otherwise they can call the materialization method directly on the regular UI thread)). A worse alternative is writing the base64 encoding in a cooperative ways which yields to the browser every few milliseconds to process the events, but this would still result in mini-hangs and inferior user experience.

Images of 1Mb and even several Mbs are quite common these days, so support of them without hangs might be the expected default

Also, super-large base64/imageuris are not practical in Markdown editing, so using unique pseudo-image-URLs for images for working with Markdown and then materializing the full Markdown later (if actually needed, as often we'd like to upload the images to some server directory instead) is probably a reasonable design, as I also propose/discuss in:

etrepum commented 5 months ago

I think if you were implementing this in production you'd be using object urls, uploading to a server, and switching to that. The components in the playground are more for demonstration purposes than anything else.

Not saying it isn't something that could or should be improved, just that these components are not published in a ready to use fashion for a reason, and that it's not really a bug that these playground components are not all production-ready because that's not their purpose.

vadimkantorov commented 5 months ago

Yeah, I realize that playground components are not meant as production-ready (I've updated the title to reflect this more accuractely). But maybe these usecases could also inform design/recommendations of some abstractions / hooks and also inform users on existing limitations (e.g. 1.1Mb images are not that uncommon even as a proof-of-concept/demo usecase, and even adding the 16Kb image produces a noticeable mini-hang)

etrepum commented 5 months ago

Once you do have a 1.1mb image encoded as a base64 data url, then you have a 1.47mb json document (and if you copy and paste it then it multiplies very quickly) which is going to have other performance issues. I don't think deferring to a worker in this case is going to be that helpful. It's really just a bad approach all around, but it is a reasonable demo of the functionality. Maybe the best fix would be to add comments to the code explaining how it could be done better.

vadimkantorov commented 5 months ago

I agree that having inline large images' datauri is not practical in many scenarios, but sometimes keeping such datauris inline is beneficial because it allows to keep Markdown files fully self-contained in storage and in server-side static site generation, removes the need for asset deletion at document deletion etc. With that said, I think for markdown editing these datauris are always not needed (and better hidden/replaced with object URLs or pseudo-URLS), so I filed for discussion https://github.com/facebook/lexical/issues/6255. Ideally I think, there should be two methods of getting Markdown: with resources inlined and resourced replaced by URLs. During editing, one can work with the latter one, while for storage/rendering one can use the first mode (shelving the image encoding onto a webworker) if desired.

Some UI hangs are noticeable even for the 16.3Kb file.

I agree that for a demo it is reasonable and that some notes in docs/playground default document would be great. The playground just looks so slick that one could think that it's almost production-ready :)

etrepum commented 5 months ago

If you look at how github does it with their markdown/preview comment interface you just get the full URLs as soon as they're uploaded, it doesn't maintain any sort of separate "resource URL" abstraction. TBH I think it would be pretty awkward to implement a sort of resource URL strategy because it needs to be done outside of the editor, there's not really a place to put document level metadata other than to use something like $getEditor() with your own module scoped WeakMap of Editor -> Metadata, and you'd probably want to do some sort of garbage collection for the resources in metadata when they're no longer referenced by either the current or pending editor state. I doubt that sort of code would end up in the playground, but it would perhaps make for a good third-party plug-in if all of the ugly parts could be sufficiently hidden from users.

(edit: I guess the GC wouldn't really work if the history plugin was around too, so maybe you'd just need it to grow as long as the editor exists)

vadimkantorov commented 5 months ago

I wonder if we can indeed always generate object URLs at image upload (essentially storing all these images in some browser-provided page's global cache), and let the user call on them releaseUrl/revoke if they want any cleanups logic...

Also, maybe all this can be somehow handled by a user-provided hook which can upload the file to the server and get the url or return an object URL. If that's enough, maybe providing this extension point in Playground would solve this.

And then we can provide a helper method which (given a markdown string) either inlines all markdown images into base64, then this helper method can be executed in a WebWorker if desired.

vadimkantorov commented 5 months ago

Since my other issue https://github.com/facebook/lexical/issues/6255 was closed in favor of this one, pasting here a more generic discussion on providing hooks for pre/post-processing URLs for image display in Markdown documents:


I also found these bugs when trying out image pasting functionality in the Playground:

Context: I'm building a simplistic single-file 500LOC analog of https://mdxeditor.dev/ for RTF editing of Markdown files stored in GitHub repos, manipulated with GitHub API, supporting uploading/displaying images (available at https://vadimkantorov.github.io/moncms).