crosstype / node-html-markdown

Fast HTML to markdown converter for NodeJS or the browser

Handling newlines before `<em>` tags #34

Open josh- opened 2 years ago

josh- commented 2 years ago

While working on @types/node with the script that generates type definitions from the Node.js docs, I noticed that node-html-markdown doesn't appear to replace a newline with a space when it occurs immediately before an `<em>` tag. The newline is dropped entirely, so the emphasis delimiter ends up glued to the preceding word, which can produce broken markdown.

Here is an example test:

test(`newline before emphasis`, () => {
  const res = translate(
    `The contents of the newly created <code>Buffer</code> are unknown and\n<em>may contain sensitive data</em>.`
  );
  const exp = `The contents of the newly created \`Buffer\` are unknown and _may contain sensitive data_.`;
  expect(res).toBe(exp);
});
Running it fails with:

expect(received).toBe(expected) // Object.is equality

    Expected: "The contents of the newly created `Buffer` are unknown and _may contain sensitive data_."
    Received: "The contents of the newly created `Buffer` are unknown and_may contain sensitive data_."

      49 |     );
      50 |     const exp = `The contents of the newly created \`Buffer\` are unknown and _may contain sensitive data_.`;
    > 51 |     expect(res).toBe(exp);
         |                 ^
      52 |   });
      53 |
      54 |   test(`test2`, () => {

      at Object.<anonymous> (test/default-tags.test.ts:51:17)

Test Suites: 1 failed, 4 passed, 5 total
Tests:       1 failed, 63 passed, 64 total
Snapshots:   0 total
Time:        5.672 s
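Why the received output is broken: under the CommonMark emphasis rules, a single `_` sitting between two letters (as in `and_may`) is both a left-flanking and a right-flanking delimiter run, and in that case `_` may only open emphasis if it is preceded by punctuation. So `and_may contain sensitive data_.` renders with literal underscores. A minimal sketch of that rule, assuming ASCII-only whitespace and punctuation classes (the spec uses Unicode ones); `canOpenUnderscoreEmphasis` is a hypothetical helper, not part of node-html-markdown:

```typescript
// Simplified ASCII classification. CommonMark treats the start and end of
// the line as whitespace, which is what `undefined` stands for here.
const isWhitespace = (ch: string | undefined): boolean =>
  ch === undefined || /\s/.test(ch);
const isPunctuation = (ch: string | undefined): boolean =>
  ch !== undefined && /[!-\/:-@\[-`{-~]/.test(ch);

// Hypothetical helper: can the `_` at `index` open emphasis?
function canOpenUnderscoreEmphasis(text: string, index: number): boolean {
  if (text[index] !== "_") return false;
  // Expand to the whole delimiter run of consecutive underscores.
  let start = index;
  while (start > 0 && text[start - 1] === "_") start--;
  let end = index;
  while (end < text.length - 1 && text[end + 1] === "_") end++;
  const before = start > 0 ? text[start - 1] : undefined;
  const after = end < text.length - 1 ? text[end + 1] : undefined;

  const leftFlanking =
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
  const rightFlanking =
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));

  // `_` opens only if left-flanking and either not right-flanking
  // or right-flanking but preceded by punctuation.
  return leftFlanking && (!rightFlanking || isPunctuation(before));
}

const received = "are unknown and_may contain sensitive data_.";
const expected = "are unknown and _may contain sensitive data_.";
console.log(canOpenUnderscoreEmphasis(received, received.indexOf("_"))); // false
console.log(canOpenUnderscoreEmphasis(expected, expected.indexOf("_"))); // true
```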

Is this a bug, or is it intentional? Thanks!

nonara commented 2 years ago

Thanks for the report, Josh! I apologize for the delay. I've been pretty tied up in recent months.

I agree that this shouldn't happen. Whitespace should be replaced by a single space, and broken markdown is never good.
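To illustrate the expected behavior, here is a minimal sketch over a hypothetical, already-parsed inline tree. The `InlineNode` type and the `render`/`renderNode` functions are illustrative assumptions, not node-html-markdown's internals: each run of HTML whitespace in a text node, newlines included, becomes one space, so the space before `<em>` survives.

```typescript
// Hypothetical inline tree, not node-html-markdown's real node types.
type InlineNode =
  | { kind: "text"; value: string }
  | { kind: "code"; value: string }
  | { kind: "em"; children: InlineNode[] };

// HTML's ASCII whitespace characters: space, tab, LF, FF, CR.
const HTML_WHITESPACE = /[ \t\n\f\r]+/g;

function renderNode(node: InlineNode): string {
  if (node.kind === "text") {
    // Collapse each whitespace run, newlines included, to a single
    // space instead of dropping it.
    return node.value.replace(HTML_WHITESPACE, " ");
  }
  if (node.kind === "code") {
    return "`" + node.value + "`";
  }
  // Remaining case is "em": wrap the rendered children in underscores.
  return "_" + render(node.children) + "_";
}

function render(nodes: InlineNode[]): string {
  return nodes.map(renderNode).join("");
}

// The HTML from the failing test, as an inline tree.
const nodes: InlineNode[] = [
  { kind: "text", value: "The contents of the newly created " },
  { kind: "code", value: "Buffer" },
  { kind: "text", value: " are unknown and\n" },
  { kind: "em", children: [{ kind: "text", value: "may contain sensitive data" }] },
  { kind: "text", value: "." },
];
console.log(render(nodes));
// The contents of the newly created `Buffer` are unknown and _may contain sensitive data_.
```

This only covers the whitespace rule discussed here. A real converter would also need to move whitespace at the edges of `<em>` content outside the `_` delimiters, since CommonMark does not treat `_ may_` as emphasis.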

I've got a bit of catching up to do, but I'll add this to my list and take a look as soon as I can. If you (or anyone else) are up to it in the interim, PRs are welcome!

Thanks again!