freelawproject / eyecite

Find legal citations in any block of text
https://freelawproject.github.io/eyecite/
BSD 2-Clause "Simplified" License

More tutorial improvements re #95 #100

Closed by mattdahl 3 years ago

mattdahl commented 3 years ago

This PR converts our tutorial into a Jupyter notebook, which we can link to from the README and which should render cleanly on GitHub. Re #95, this change should (1) provide an example of how to use eyecite with a real document containing thousands of real citations and (2) make clear that all of the tutorial code can be run by the reader.

The pseudocode about passing custom resolution functions to resolve_citations() has been removed entirely, since the README already explains the same idea: https://github.com/freelawproject/eyecite#resolving-citations.

mlissner commented 3 years ago

I don't know if you still have energy for this, so just merging since it's another nice improvement, but I had a few questions when reading through this:

Anyway, my policy on documentation is to merge any form of progress, so this is merged, but those were the spots I got caught. Is it easy to edit a Python notebook?

mattdahl commented 3 years ago

I still have some energy lol, and I agree that making the second and third changes would be good. For the first one, it's my understanding that the pre-compiling happens automatically, right? I.e., the first time the tokenizer is instantiated, it pre-compiles all the regexes and dumps them into the cache folder. Then subsequent calls to get_citations() just reuse that cache without having to re-compile anything. I can state this explicitly, but there's nothing the user has to do, right?
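To make the behavior I'm describing concrete, here is a minimal sketch of the "build once, then reuse from the cache folder" pattern. It is not eyecite's actual code: `CachedTokenizer`, `built_from_scratch`, the cache file naming, and the two patterns are all hypothetical, and joining the patterns into one alternation is only a stand-in for the real (expensive) compile step.

```python
import hashlib
import re
import tempfile
from pathlib import Path


class CachedTokenizer:
    """Hypothetical tokenizer that builds its regex database once and caches it on disk."""

    def __init__(self, patterns, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Key the cache file on the pattern list, so a changed list triggers a rebuild.
        key = hashlib.sha256("\n".join(patterns).encode("utf-8")).hexdigest()[:16]
        self.cache_path = self.cache_dir / f"{key}.regex"
        # True only when no cache file existed for this pattern list.
        self.built_from_scratch = not self.cache_path.exists()
        if self.built_from_scratch:
            # Stand-in for the expensive step: merge all patterns into one alternation.
            combined = "|".join(f"(?:{p})" for p in patterns)
            self.cache_path.write_text(combined, encoding="utf-8")
        else:
            # Later instantiations read the cached result instead of rebuilding it.
            combined = self.cache_path.read_text(encoding="utf-8")
        self.regex = re.compile(combined)

    def find_all(self, text):
        """Return every substring of `text` matched by any of the patterns."""
        return [m.group(0) for m in self.regex.finditer(text)]


# Illustrative patterns only; eyecite's real regexes are far more extensive.
PATTERNS = [r"\d+ U\.S\. \d+", r"\d+ F\.\dd \d+"]
cache_dir = tempfile.mkdtemp()

first = CachedTokenizer(PATTERNS, cache_dir)   # builds and writes the cache file
second = CachedTokenizer(PATTERNS, cache_dir)  # finds the cache file and reuses it

matches = second.find_all("See 410 U.S. 113 and 347 U.S. 483.")
```

In this sketch the caller never does anything beyond instantiating the tokenizer, which is the point I'd want the tutorial to state if the real behavior is confirmed.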

mlissner commented 3 years ago

Yeah, I think that's right, but I haven't witnessed it myself yet. If so, we should state it though. Maybe @jcushman can confirm.