dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

Documentation on blocks is confusing / inaccurate #101

Closed by jdotjdot 4 years ago

jdotjdot commented 4 years ago

Thanks for this awesome project! We're really excited to start using it.

That said, we've had a lot of trouble understanding the documentation. For blocks in particular, it appears to be missing information, out of date, and internally inconsistent. The clearest example is the documentation for Gazetteer (link). We're using Gazetteer on a large quantity of data, so we're trying to use matchBlocks() instead of match(), but even after reading the source code, we haven't been able to figure out how to set up the blocks properly. As an initial example, the Gazetteer object in the docs actually defines matchBlocks() twice, each time with different documentation and different examples.

The first definition gives the following final example block structure (the example itself is also syntactically invalid; the parens don't match up): [image]

The second definition doesn't have a direct page link, so go here and then scroll up slightly to see it. The final example block structure it gives: [image]

StaticGazetteer has the same issue, presumably due to inheritance.

Can you clarify what the right way to do blocking is? Thanks so much!

jdotjdot commented 4 years ago

Realized I filed this issue on the wrong repo; closing in favor of https://github.com/dedupeio/dedupe/issues/789.