Properly handle image uploads through TinyMCE

batpad commented 2 years ago

Following on from #2438 -

The last update we did with the Rich Text Editor implementation inadvertently gave users the ability to add images to their posts within rich text areas.

Since this was not a properly planned feature implementation, it relies on the default TinyMCE handling of image data, which is to embed the raw base64 encoded image data into the HTML. This is likely not the best solution going forward as we are storing rather large image blobs in the HTML and consequently in the database. Ideally, we would not do this, and upload the images somewhere else and include them with a normal <img src="https://path.to.image/foo.jpg"> type tag.

This would require us to handle file uploads on the backend, store them in Azure Blob Storage, and return a link to be embedded in the HTML.

There could be a few different ways to do this:

The django-tinymce project, and other Django TinyMCE integrations tend to use django-filebrowser to have a full file management interface in the Django Admin for user files, that then integrates with TinyMCE, giving users the ability to link and embed files they upload through the file browser. See https://django-tinymce.readthedocs.io/en/latest/ . This seems overkill. Django-filebrowser has a further dependency on Grappelli which can have lots of consequences for the Admin as a whole, and getting django-filebrowser to work with Azure Blob Storage as a storage backend seems like that could be a further difficulty. I don't think I would recommend going down this path, unless we want to consider combining it with a full-fledged feature for Admin users to manage file uploads in the Admin. I have used grapelli + filebrowser + integration with tinymce / ckeditor a fair bit in the past, and while it had lots of nice things, I don't think I would recommend that approach here. Happy to discuss more.
Create a simple image upload handler backend route, which saves the image into Azure Blob Storage and returns the URL, which TinyMCE can add as the src for an image, as per: https://www.tiny.cloud/docs-4x/general-configuration-guide/upload-images/ - the examples are PHP but it should not be too bad to do something similar with django on the backend. This would likely be the simplest solution - so a user would upload a file in TinyMCE, it would call our backend upload handler which will save it in Blob Storage, return the URL, which TinyMCE will add as the image src. A problem with this approach is that it is rather hard to cleanup. If a user for example uploads an image, we save it in Blob Storage, and then decides to remove the image from their HTML, the image will continue to exist on Blob Storage, and these images will be difficult to clean-up in an automated fashion. The other drawback of this (but that will likely exist with other approaches as well), is that handling permissions on the level of individual images will be HARD. So the content and HTML will be served according to permissions, but if someone gets a link to the actual images, they will be public. This will likely be very hard to avoid.
The third approach that I very much like, but is not the best UX for users: Create separate models in the Admin for users to upload images for rich text area fields - so say Emergency Summary would have Emergency Summary Images, where a user can upload individual images in the admin, that would get attached to the summary, or other rich text field. Ideally, the images would then all be displayed in a uniform manner - say thumbnails w/ an image gallery at the bottom of the text. OR, allow users to give images names, so they call an image say img1 and then can insert that image in the text by adding something like {{ img1 }} which would insert the img1 where that is added. This method keeps the images better organized, associated to events as well as potentially allows us to better handle permission issues. This will mean adding a LOT of image models if we need to add one for each rich text editor field. This can get a bit unwieldy. And it's not going to be as "easy" as it is currently of just dragging an image into the text field.

I would propose to go with the second approach outlined above - i.e. create a simple custom image handler route on the backend that uploads to Blob Storage and configure TinyMCE to use it. Whilst saying this, I think there is a lot to be concerned about when allowing users to upload arbitrary images to text fields, some of those are:

We potentially will need to get into automatic resizing of images, etc. - or figure out good ways to place limits on what users can upload.
The text data being pulled from the API will now include not only HTML formatting but references to images - this will start becoming harder and harder for API consumers to display in contexts other than the GO website since in TinyMCE, the styling data is very coupled to the markup.

In general, introducing WYSIWYG editors like TinyMCE can have rather large implications and a large list of pros and cons that are likely out of the scope of this particular ticket, but that might inform decisions we take here. Looking forward to talking more and weighing our long-term options around rich text editing when we meet in person.

cc @tovari @szabozoltan69 @frozenhelium @thenav56

szabozoltan69 commented 2 years ago

8 days ago (on 15th Aug) I found 8 entities in production which used base64 images, and replaced them with the uploaded image. Since then these queries give no result; I will run them time to time to check if somebody uses pasted images:

select length(description) from api_fieldreport where description like '%base64%' order by 1 desc;
select length(situational_overview) from flash_update_flashupdate where situational_overview like '%base64%' order by 1 desc;
select id, length(summary) from api_event where summary like '%base64%' order by 2 desc;

Of course, a final solution (i.e. the 2nd one above) would be great.

thenav56 commented 2 years ago

For third point, some additional things we can do.

We preproccess the text field and replace base64 images with image names after saving them to blob storage.
- If the image name is removed from the text then we can mark it as deleted and run a garbage collector which will delete the files.
We share image URLs through API instead of saving them to HTML as we can change the URL domain (storage name) or define permissions (like signed URL for storage) in the future. (This is something that can cause issues in the future.)
- Client can then replace the image name with URLs.
We can assign the images with the related model.
For a single model (eg: AdminFileGallery) for storing we can use GenericRelations
- One con will be that we need to use prefetch_related_objects to pull related data.
- Or we define an M2M for each model and AdminFileGallery.

IFRCGo / go-frontend

Properly handle image uploads through TinyMCE #2460