adobe / da-live

Dark Alley is a research project
https://da.live
Apache License 2.0

Intercept base64 images on paste into editor #169

Open auniverseaway opened 1 month ago

auniverseaway commented 1 month ago

As an author pasting content from various sources, I would like any embedded image data (typically base64-encoded) to be replaced with proper image references, so that the server does not crash when trying to push the data to da-collab.

Context

Our current service provider has a hard limit on how much we can push into the collaboration websocket. I believe this is 1 MB. We need a ProseMirror plugin that finds any base64-encoded data inside the pasted HTML and reroutes it to proper files stored as siblings of the page.
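
A minimal sketch of the interception step, assuming the pasted HTML is available as a string (ProseMirror's `transformPastedHTML` editor prop receives it in that form). The names `externalizeDataImages` and `ExtractedImage` and the `<page>.image-N.<ext>` naming scheme are hypothetical, not da-live code, and the upload of the extracted files is left out entirely.

```typescript
// Hypothetical shape for an image pulled out of pasted HTML.
interface ExtractedImage {
  name: string;     // proposed sibling file name (naming scheme is an assumption)
  mimeType: string; // taken from the data URI prefix, e.g. "image/jpeg"
  base64: string;   // payload only, without the "data:...;base64," prefix
}

// Matches src="data:image/<subtype>;base64,<payload>" with single or double quotes.
const DATA_IMG_SRC = /src=(["'])data:(image\/[a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+\/=\s]+)\1/g;

// Replaces every embedded base64 image with a relative reference and
// returns the extracted payloads so the caller can upload them.
function externalizeDataImages(
  html: string,
  pageName: string,
): { html: string; images: ExtractedImage[] } {
  const images: ExtractedImage[] = [];
  const rewritten = html.replace(
    DATA_IMG_SRC,
    (_match: string, quote: string, mimeType: string, payload: string) => {
      // "image/svg+xml" becomes "svg"; other subtypes are used as the extension as-is.
      const ext = mimeType.split('/')[1].split('+')[0];
      const name = `${pageName}.image-${images.length + 1}.${ext}`;
      images.push({ name, mimeType, base64: payload.replace(/\s+/g, '') });
      return `src=${quote}./${name}${quote}`;
    },
  );
  return { html: rewritten, images };
}
```

Because `transformPastedHTML` is synchronous, a real plugin would presumably rewrite the references first and upload the files afterwards. A regular expression is used here only to keep the sketch free of DOM dependencies; parsing the HTML properly would be more robust.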

Steps to reproduce

  1. Edit a page via da.live/edit/.....
  2. Open a Word Online document that contains an image of at least 1 MB.
  3. Press Ctrl + A to select all in the Word Online doc.
  4. Press Ctrl + V to paste into your da.live page (you should see your image).
  5. Try to preview. The paper airplane animation loops, but no preview happens.
  6. Inspecting the element shows img src="data:image/jpeg;base64..."

Criteria of acceptance

  1. Base64 binary data inside pasted HTML is converted to sibling files of the current page.
  2. Collab does not crash on paste of HTML with embedded base64 data.
  3. Image uploads are uniquely identified to avoid overwriting anything existing.
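
One way to approach the third criterion, sketched under the assumption that the names of the existing sibling files are already known on the client (the issue does not say how they would be listed). The helper `uniqueSiblingName` is hypothetical; it appends a numeric suffix until the name is free.

```typescript
// Returns a file name that does not appear in `existing`,
// trying "<base>.<ext>" first and then "<base>-1.<ext>", "<base>-2.<ext>", ...
function uniqueSiblingName(
  base: string,
  ext: string,
  existing: ReadonlyArray<string>,
): string {
  let candidate = `${base}.${ext}`;
  let counter = 1;
  while (existing.indexOf(candidate) !== -1) {
    candidate = `${base}-${counter}.${ext}`;
    counter += 1;
  }
  return candidate;
}
```

This does not protect against two authors uploading at the same moment; a random or content-derived suffix might be a safer choice in a collaborative setting.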

Other considerations

  1. I have no idea if we can get the MIME type out of the base64 content.
  2. The number 1 use case here is pasting from Word so it should be easy to replicate.
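
On the first consideration: when the image arrives as a data URI, as in step 6 of the reproduction (`data:image/jpeg;base64...`), the MIME type is stated in the URI prefix, so it can be read without decoding the payload. A minimal sketch, with `mimeTypeFromDataUri` as a hypothetical helper name:

```typescript
// Reads the media type from a data URI such as "data:image/jpeg;base64,...".
// Returns null when the input is not a data URI or declares no media type.
function mimeTypeFromDataUri(uri: string): string | null {
  const match = /^data:([^;,]*)[;,]/i.exec(uri);
  if (!match) return null;
  return match[1] ? match[1].toLowerCase() : null;
}
```

This only covers the data URI case. If a source ever pasted a bare base64 string without the prefix, the type would have to be guessed from the decoded bytes instead.
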

helms-charity commented 1 month ago

@auniverseaway do you have a sample page with embedded image data on it?

Actually, I was able to create one with https://elmah.io/tools/base64-image-encoder/ and paste the HTML into the doc, and surprisingly it rendered: https://main--charity-da-fun--helms-charity.hlx.page/. Let me know if this isn't the use case.

bosschaert commented 1 month ago

@helms-charity while we should externalize all images, it currently only fails when the content is larger than 1 MB. The easiest way to reproduce it would be to copy a large image from a document in Word/SharePoint into DA.
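
To judge whether a given pasted image is large enough to trigger the failure, the decoded size can be estimated from the base64 string alone. A minimal sketch: `decodedBase64Size` and `ONE_MB` are hypothetical names, and reading "1 MB" as 1024 * 1024 bytes is an assumption.

```typescript
// Assumed value of the limit mentioned in this thread; the exact figure is not confirmed.
const ONE_MB = 1024 * 1024;

// Estimates the number of bytes a base64 string decodes to:
// every 4 characters carry 3 bytes, minus one byte per trailing "=".
function decodedBase64Size(base64: string): number {
  const clean = base64.replace(/\s+/g, '');
  let padding = 0;
  if (clean.slice(-2) === '==') padding = 2;
  else if (clean.slice(-1) === '=') padding = 1;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// True when the decoded image alone would exceed the assumed limit.
function exceedsAssumedLimit(base64: string, limitBytes: number = ONE_MB): boolean {
  return decodedBase64Size(base64) > limitBytes;
}
```

The encoded string is about a third larger than the decoded bytes, so if the limit applies to the message actually sent over the websocket, an image somewhat under 1 MB on disk may already be too large once embedded.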