fix stripping html to correctly handle byte offsets

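markdown_html_finder returns offsets in terms of bytes, but our Python code indexes strings in terms of Unicode code points. The two diverge as soon as the text contains non-ASCII characters: a string containing 🏷️ can be 11 characters but 16 bytes. So if we have non-ASCII Unicode in our PR body, the offsets point at the wrong positions and stripping HTML doesn't work correctly.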

The fix is to keep byte offsets and Unicode offsets separate. We must encode the body to bytes before applying the offsets from markdown_html_finder, but we must decode each HTML snippet back to Unicode before parsing HTML comments from it.
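A minimal sketch of the idea, not the actual Kodiak code. `strip_byte_spans` is a hypothetical helper, and the `(start, end)` span is computed by hand here to stand in for the byte offsets that markdown_html_finder would report; its real return type is not assumed.

```python
from typing import List, Tuple


def strip_byte_spans(body: str, spans: List[Tuple[int, int]]) -> str:
    """Remove (start, end) ranges given as UTF-8 byte offsets from body."""
    raw = body.encode("utf-8")  # slice in bytes, because the offsets are bytes
    kept = []
    cursor = 0
    for start, end in sorted(spans):
        kept.append(raw[cursor:start])
        cursor = max(cursor, end)  # tolerate overlapping spans
    kept.append(raw[cursor:])
    return b"".join(kept).decode("utf-8")  # back to str for the rest of the code


# The label emoji plus its variation selector: 2 code points, 7 UTF-8 bytes.
label = "\U0001F3F7\uFE0F"
body = label + " title <!-- hidden --> text"

# Stand-in for a byte-offset span reported by the HTML finder (assumption).
raw = body.encode("utf-8")
comment = b"<!-- hidden -->"
start = raw.find(comment)
span = (start, start + len(comment))

# Decode the snippet before treating it as text, e.g. to parse the comment.
snippet = raw[span[0]:span[1]].decode("utf-8")

stripped = strip_byte_spans(body, [span])

# The bug: applying byte offsets directly to the str cuts in the wrong place.
naive = body[: span[0]] + body[span[1]:]
```

Here `stripped` has the comment removed, while `naive` still contains the start of the comment and has lost the trailing text, because the byte offsets overshoot the character positions by the extra bytes of the emoji.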

related #800

chdsbd / kodiak

fix stripping html to correctly handle byte offsets #805