walk sections instead of iterating

bustle / mobiledoc-kit

A toolkit for building WYSIWYG editors with Mobiledoc

https://bustle.github.io/mobiledoc-kit/demo/

MIT License

1.55k stars 151 forks source link

walk sections instead of iterating #651

Closed ErisDS closed 5 years ago

ErisDS commented 6 years ago

closes https://github.com/bustle/mobiledoc-kit/issues/648

I haven't figured out how to write tests for mobiledoc-kit yet. I've done rudimentary testing by writing a test for the 4 cases I outlined in #648 in the html-to-mobiledoc kit lib here.

I need to do much more extensive testing around text nodes, and what happens if the DOM is empty, etc etc, but I think this approach is sound.

Will look into writing native tests and wider examples asap.

Use a dom walker
Look for valid sections or text nodes
Parse the sections that are found

ErisDS commented 6 years ago

I've been working through iterations to improve the logic and handle more and more cases, and I'm now down to just a handful:

All of these cases seem pretty edge with GSuite's weird structure, but the first one - the DOM parsing of <b><em>foo</em>bar</b>, i.e. nested markup with no valid sections, is a real concern.

I think, to solve this case I need to keep some state around, as to whether or not a valid section has been found anywhere in the structure.

Walking the DOM feels like the correct approach here, even though it's a huge change. However there might be an alternative approach that would involve improving the logic of detectRootElement to locate and throw away wrapper elements. Or a hybrid approach combining DOM walking & the original iterations.

I'm a little hesitant to spend too much more time on this without some input, maybe from @bantic on whether you think that DOM walking could ever be a viable approach, or if I've overcomplicated this.

ErisDS commented 6 years ago

Haven't had chance to review the detailed feedback yet, but just to get the conversation moving on lemme reply to the main point here:

@ErisDS So I suggest you think of the HTML parse as a "paste" HTML parser. I think your usage of it for legacy document parsing is suitable, but we always presumed that the HTML paste parser would be inappropriate for any number of specific cases.

I experience the same bugs and issues on paste - if you select a large amount of content from E.g. medium and paste it into Ghost's editor, you end up with garbled merged paragraphs and weird styling.

However, I am definitely trying to kill 2 birds with one stone. If improving the parser means that both pastes and legacy import support are improved, then surely that's just a win-win?

In terms of my intent, medium content is a great example of what I'm trying to get to work - I'll expand out #648 with a non-minimal example.

Every iteration I've made since my original PR has been to try to get more of the existing test cases to work :)

ErisDS commented 5 years ago

Going to close this PR. Will add an explanation and describe a new approach in the original issue #648.