facebook / lexical

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.
https://lexical.dev
MIT License

Feature: Minified Version of JSON Code #4104

Open quick007 opened 1 year ago

quick007 commented 1 year ago

As of now, the amount of data stored for the Lexical editor state is quite large. For a small amount of text, the resulting JSON can be massive, and the size seems to grow disproportionately as more content is added. Looking at the output, a lot of it is default values that serve no purpose. Is there any way those could be stripped out and put back in with a function?

As of now, we're using binary JSON in our database, which helps reduce this, but I don't feel that's a complete solution.
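
For context, the serialized state of a single short paragraph looks roughly like this (an approximate example, not captured output; exact fields and defaults vary by Lexical version), and most of it is repeated default values:

```json
{
  "root": {
    "children": [
      {
        "children": [
          {
            "detail": 0,
            "format": 0,
            "mode": "normal",
            "style": "",
            "text": "Hello world",
            "type": "text",
            "version": 1
          }
        ],
        "direction": "ltr",
        "format": "",
        "indent": 0,
        "type": "paragraph",
        "version": 1
      }
    ],
    "direction": "ltr",
    "format": "",
    "indent": 0,
    "type": "root",
    "version": 1
  }
}
```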

Alex-Github-Account commented 1 year ago

Why this issue is important: people pay per KB of data read from or written to the DB in the cloud, and bloated JSON literally costs money. Instead of reading something like 'some text {BOLD: "some bold text"}', the bill from the cloud provider covers a few kilobytes of mostly useless JSON, 99% of which is default values repeated over and over.

thegreatercurve commented 1 year ago

It's an important issue, and I think we already have the APIs to help address it.

We were very careful to not add anything unnecessary to the editor state other than the tree-like structure of the nodes. The data stored for each node is then just a reflection of what gets added via the exportJSON method on each individual node class.

To reduce any unnecessary default values, you can either amend the exportJSON method for any of your custom node classes, use the node replacement API to overwrite and simplify the same method on the default nodes, or run your own custom script before you save to your DB that simply traverses the state and strips out any unnecessary values.
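
A minimal sketch of the third option, a strip script run before saving; the DEFAULTS table below is an assumption about what the defaults are in the Lexical version in use (Lexical doesn't publish such a table), so it needs to be checked against the actual exportJSON output:

```ts
// Drop fields that still hold their (assumed) default values from the serialized
// editor state before persisting it. DEFAULTS is an assumption, not an official API.
type SerializedNode = {type: string; children?: SerializedNode[]; [key: string]: unknown};

const DEFAULTS: Record<string, unknown[]> = {
  detail: [0],
  format: [0, ''], // a number on text nodes, a string on element nodes
  indent: [0],
  mode: ['normal'],
  style: [''],
  direction: [null],
};

function stripDefaults(node: SerializedNode): SerializedNode {
  const out: SerializedNode = {type: node.type};
  for (const [key, value] of Object.entries(node)) {
    if (key === 'type') continue;
    if (key === 'children' && Array.isArray(value)) {
      out.children = (value as SerializedNode[]).map(stripDefaults);
      continue;
    }
    if (DEFAULTS[key]?.includes(value)) continue; // still the default, skip it
    out[key] = value;
  }
  return out;
}

// Usage: JSON.stringify({root: stripDefaults(editor.getEditorState().toJSON().root)})
```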

Alex-Github-Account commented 1 year ago

> It's an important issue, and I think we already have the APIs to help address it.
>
> We were very careful to not add anything unnecessary to the editor state other than the tree-like structure of the nodes. The data stored for each node is then just a reflection of what gets added via the exportJSON method on each individual node class.
>
> To reduce any unnecessary default values, you can either amend the exportJSON method for any of your custom node classes, use the node replacement API to overwrite and simplify the same method on the default nodes, or run your own custom script before you save to your DB that simply traverses the state and strips out any unnecessary values.

not exporting default values in the first place !== minification.

not exporting == a reliable method of achieving 100% working and compatible output.
User-made minifications == a reverse-engineered zoo of solutions to a problem that should not exist in the first place, and they lead to bugs on re-import, sometimes serious ones, like the bug that was 'fixed' today, where custom-minified JSON caused an infinite loop (browser freeze) on import.

quick007 commented 1 year ago

I agree. I still think a real minified version would be great. I'm going to work on a function that strips some of the default values on the way to the DB and does the opposite on the way back. I realize this isn't actual minification, but I feel like I can do my best to avoid breaking changes this way. I'll share it here if my approach doesn't end up being a mess. Let's hope there's an official version eventually!
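
A sketch of what the restore side of that round trip could look like; the DEFAULT_VALUES map is likewise an assumption about the current defaults and has to mirror exactly what the strip step removed:

```ts
// Re-apply the (assumed) defaults that were stripped before saving, so the result
// can be fed back into editor.parseEditorState(). Extend the map for custom node types.
type SerializedNode = {type: string; children?: SerializedNode[]; [key: string]: unknown};

const DEFAULT_VALUES: Record<string, Record<string, unknown>> = {
  root: {direction: null, format: '', indent: 0, version: 1},
  paragraph: {direction: null, format: '', indent: 0, version: 1},
  text: {detail: 0, format: 0, mode: 'normal', style: '', version: 1},
};

function restoreDefaults(node: SerializedNode): SerializedNode {
  const restored: SerializedNode = {...(DEFAULT_VALUES[node.type] ?? {}), ...node};
  if (node.children) {
    restored.children = node.children.map(restoreDefaults);
  }
  return restored;
}

// Usage:
// const state = editor.parseEditorState(JSON.stringify({root: restoreDefaults(slimRoot)}));
// editor.setEditorState(state);
```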

acywatson commented 1 year ago

I'm going to leave this one open for the moment, because I actually think that reducing the out-of-the-box footprint of Lexical JSON is a goal worth pursuing and potentially doable. One initial concern I have with this is collaborative editing. If the solution is just leaving out any properties that have the default value, then I'm not sure a CRDT could reconcile the updates correctly. I think all the properties need to exist, but I could be wrong about that. If the solution is to change the properties to have short keys, that's a significant breaking change for the core node serialization schema.

I need to look into this a bit more, but in the meantime, I would be interested to see the solution you develop.

> I realize this isn't actual minification, but I feel like I can do my best to avoid breaking changes this way

I think this is a good approach - there are many places at Meta where we do some sort of intermediate transform between the editor and the database. For us, that has usually ended up being a pretty flexible solution without being terribly difficult to implement.
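
For illustration only, the short-key idea expressed as such an external transform between the editor and the database (the KEY_MAP here is made up for the example, not a proposed schema):

```ts
// Purely illustrative: rename verbose keys to short ones in a transform layer.
// Doing this inside the core serialization would be the breaking change discussed
// above, so it would live outside Lexical, next to the save/load code.
const KEY_MAP: Record<string, string> = {
  children: 'c',
  direction: 'd',
  format: 'f',
  indent: 'i',
  type: 't',
  text: 'x',
  version: 'v',
};

type JsonNode = Record<string, unknown>;

function shortenKeys(node: JsonNode): JsonNode {
  const out: JsonNode = {};
  for (const [key, value] of Object.entries(node)) {
    const short = KEY_MAP[key] ?? key;
    out[short] =
      key === 'children' && Array.isArray(value)
        ? value.map((child) => shortenKeys(child as JsonNode))
        : value;
  }
  return out;
}

// The inverse transform applies the reversed KEY_MAP before parseEditorState().
```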

quick007 commented 1 year ago

Great, thanks for reopening!

fedemartinm commented 1 year ago

The correspondence between the exportJSON output and the state managed by Lexical seems to be consistent and reliable.

It is possible to efficiently access the nodes and minify them without relying on Lexical for this responsibility. Simply iterate through the nodes and write a minifier/unminifier for each node type and version.

The size produced by exportJSON has also been an issue for me, and I resolved it with this approach. It is safe, produces the same result without data loss, and offers a good minification ratio. Custom minifier on CodeSandbox
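
The linked sandbox isn't reproduced here, but the general shape of a per-node-type minifier/unminifier registry might look something like this (the codec entries are illustrative assumptions, not the sandbox code):

```ts
// Sketch of a per-node-type registry: each entry knows how to minify and unminify
// one serialized node type, and could branch on `version` to handle migrations.
type SerializedNode = {type: string; version?: number; children?: SerializedNode[]; [key: string]: unknown};

interface NodeCodec {
  minify(node: SerializedNode): SerializedNode;
  unminify(node: SerializedNode): SerializedNode;
}

const CODECS: Record<string, NodeCodec> = {
  text: {
    minify: ({type, text, format}) => (format ? {type, text, format} : {type, text}),
    unminify: (n) => ({detail: 0, format: 0, mode: 'normal', style: '', version: 1, ...n}),
  },
  // paragraph, root, and any custom node types would get their own codecs here
};

function transform(node: SerializedNode, direction: keyof NodeCodec): SerializedNode {
  const codec = CODECS[node.type];
  const out = codec ? codec[direction](node) : {...node}; // pass unknown types through untouched
  if (node.children) {
    out.children = node.children.map((child) => transform(child, direction));
  }
  return out;
}

// Usage: transform(editor.getEditorState().toJSON().root, 'minify')
```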

acywatson commented 1 year ago

> The correspondence between the exportJSON output and the state managed by Lexical seems to be consistent and reliable.
>
> It is possible to efficiently access the nodes and minify them without relying on Lexical for this responsibility. Simply iterate through the nodes and write a minifier/unminifier for each node type and version.
>
> The size produced by exportJSON has also been an issue for me, and I resolved it with this approach. It is safe, produces the same result without data loss, and offers a good minification ratio. Custom minifier on CodeSandbox

Yes, this is a perfectly valid approach.

DaveyEdwards commented 3 months ago

I was feeling the downstream effects of this: I was allowing code to be pasted into Lexical, and since it was a few hundred lines of code, it produced a massive amount of JSON (custom formatting made it much larger). My DB was blocking the save because it was too large to fit in a MySQL TEXT column (which is 65k characters), so my workaround was to store the Lexical output in my Google Cloud Storage bucket. Since this data is tiny compared to large images/videos, it shouldn't be a problem anymore. I know it doesn't solve the issue, but it might help others running into this.