Closed ErisDS closed 5 years ago
I've been working through iterations to improve the logic and handle more and more cases, and I'm now down to just a handful:
All of these cases seem pretty edge with GSuite's weird structure, but the first one - the DOM parsing of <b><em>foo</em>bar</b>
, i.e. nested markup with no valid sections, is a real concern.
I think, to solve this case I need to keep some state around, as to whether or not a valid section has been found anywhere in the structure.
Walking the DOM feels like the correct approach here, even though it's a huge change. However there might be an alternative approach that would involve improving the logic of detectRootElement
to locate and throw away wrapper elements. Or a hybrid approach combining DOM walking & the original iterations.
I'm a little hesitant to spend too much more time on this without some input, maybe from @bantic on whether you think that DOM walking could ever be a viable approach, or if I've overcomplicated this.
Haven't had chance to review the detailed feedback yet, but just to get the conversation moving on lemme reply to the main point here:
@ErisDS So I suggest you think of the HTML parse as a "paste" HTML parser. I think your usage of it for legacy document parsing is suitable, but we always presumed that the HTML paste parser would be inappropriate for any number of specific cases.
I experience the same bugs and issues on paste - if you select a large amount of content from E.g. medium and paste it into Ghost's editor, you end up with garbled merged paragraphs and weird styling.
However, I am definitely trying to kill 2 birds with one stone. If improving the parser means that both pastes and legacy import support are improved, then surely that's just a win-win?
In terms of my intent, medium content is a great example of what I'm trying to get to work - I'll expand out #648 with a non-minimal example.
Every iteration I've made since my original PR has been to try to get more of the existing test cases to work :)
Going to close this PR. Will add an explanation and describe a new approach in the original issue #648.
closes https://github.com/bustle/mobiledoc-kit/issues/648
I haven't figured out how to write tests for mobiledoc-kit yet. I've done rudimentary testing by writing a test for the 4 cases I outlined in #648 in the html-to-mobiledoc kit lib here.
I need to do much more extensive testing around text nodes, and what happens if the DOM is empty, etc etc, but I think this approach is sound.
Will look into writing native tests and wider examples asap.