alerque / stack-verse-mapper

Index Bible verse references in Stack Exchange data dumps.
https://alerque.github.io/stack-verse-mapper
GNU Lesser General Public License v3.0

Strip out markup before looking for references #8

Closed (alerque closed this issue 8 years ago)

alerque commented 8 years ago

URLs have some crazy formatting and are giving a lot of false positives. I think the useful data is going to be the verse references named in the body of a post. Instead of feeding the whole rendered HTML post into the reference parser, we should strip the tags and flatten it to plain text first.
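To illustrate the idea, here is a rough sketch of flattening a rendered post to plain text before it reaches the reference parser. The `flattenHtml` helper and the sample post are hypothetical and not part of this project; the regex approach is deliberately naive (it will mishandle things like `>` inside attribute values), so a proper HTML parser would be the safer choice in practice.

```javascript
// Naive sketch: flatten rendered post HTML to plain text before reference parsing.
// flattenHtml is a hypothetical helper, not code from stack-verse-mapper.
function flattenHtml(html) {
  return html
    // Drop script and style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ')
    // Replace every remaining tag (including <a href="...">) with a space,
    // so URLs stored in attributes never reach the parser.
    .replace(/<[^>]*>/g, ' ')
    // Decode a handful of common entities; a real parser handles all of them.
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    // Decode &amp; last so escaped entities are not decoded twice.
    .replace(/&amp;/g, '&')
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, ' ')
    .trim();
}

// Made-up sample post: the link target looks like a reference but is only markup.
const post = '<p>See <a href="https://example.com/john/3/16">John 3:16</a> and Rom&nbsp;8:28.</p>';
console.log(flattenHtml(post));
```

Only the link text survives, so the parser sees "John 3:16" once instead of also tripping over the numbers in the URL.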

curiousdannii commented 8 years ago

The npm html-to-text package works great. I'd use it with these options:

{
    wordwrap: false,
    ignoreHref: true,
}