Fixed side effect from invocation of cleaner in unfluff.lazy

franza commented 10 years ago

I was sure I had checked this for #16, but it seems I missed it.

cleaner mutates the original doc object, so doc needs to be re-calculated after cleaner runs. Right now, once cleaner has been applied, every later call suffers from that side effect. Consider the following example:

[fs, unfluff] = ['fs', 'unfluff'].map require

html = fs.readFileSync('test_tags_kexp.html', 'utf8')

doc1 = unfluff.lazy html
doc2 = unfluff.lazy html

console.log 'tags1: ', doc1.tags() # ['Dennis Morton', 'film', 'kusp film review', 'Stand Up Guys']
console.log 'text1: ', doc1.text()

console.log 'text2: ', doc2.text()
console.log 'tags2: ', doc2.tags() # [ ]

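Running this code against the test_tags_kexp.html fixture gives different results for tags(), because cleaner is called inside text(): doc1 reads its tags before the cleaner runs, while doc2 reads them afterwards and gets an empty list. So whenever cleaner is called we need to reload the document. I also included some refactoring.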
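To make the ordering problem concrete, here is a minimal self-contained sketch in TypeScript. It is a toy model and not unfluff's actual internals: `parse`, `clean`, `lazyShared` and `lazyReloaded` are hypothetical names invented for illustration, and the "cleaner" simply empties the tag list in place to stand in for whatever the real cleaner removes.

```typescript
// Toy stand-in for a parsed document (hypothetical, not unfluff's real structure).
interface Doc {
  tagLinks: string[];
  paragraphs: string[];
}

// Collect the first capture group of every match of a global pattern.
function extractAll(html: string, pattern: RegExp): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Hypothetical parser: builds a fresh Doc on every call.
function parse(html: string): Doc {
  return {
    tagLinks: extractAll(html, /<a rel="tag">(.*?)<\/a>/g),
    paragraphs: extractAll(html, /<p>(.*?)<\/p>/g),
  };
}

// Hypothetical cleaner: mutates the Doc it receives (drops the tag links in place).
function clean(doc: Doc): Doc {
  doc.tagLinks.length = 0;
  return doc;
}

// Problematic variant: text() and tags() share one Doc, so call order matters.
function lazyShared(html: string) {
  const doc = parse(html);
  return {
    tags: () => doc.tagLinks.slice(),
    text: () => clean(doc).paragraphs.join("\n"),
  };
}

// Variant in the spirit of this change: the cleaner only ever sees a freshly parsed Doc.
function lazyReloaded(html: string) {
  return {
    tags: () => parse(html).tagLinks,
    text: () => clean(parse(html)).paragraphs.join("\n"),
  };
}

const sampleHtml =
  '<a rel="tag">film</a><a rel="tag">Stand Up Guys</a><p>Review body</p>';

const shared = lazyShared(sampleHtml);
shared.text();
console.log(shared.tags()); // [] because text() already cleaned the shared Doc

const reloaded = lazyReloaded(sampleHtml);
reloaded.text();
console.log(reloaded.tags()); // [ 'film', 'Stand Up Guys' ]
```

Reloading trades an extra parse for order-independent results. A version that cleans a copy of the document instead of re-parsing might avoid that cost, but whether that is feasible depends on how the real document object can be cloned.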

ageitgey commented 10 years ago

Thanks for catching this! I'll take a look in detail when I have some time this weekend.

franza commented 10 years ago

Sure. If you have ideas on how we can avoid reloading the document, please bring them up.

ageitgey commented 10 years ago

Sorry, I've been lax on reviewing this. Still plan to get to this very soon. Thanks!

ageitgey / node-unfluff

Fixed side effect from invocation of cleaner in unfluff.lazy #21