Closed adelevie closed 10 years ago
So in my experience so far, matches have been a bit over-inclusive. The excerpt text often runs many sentences past the full citation.
excerpt
The parser works by splitting the document into words, then picking out every reporter element and applying heuristics forward and backward from it, snipping out meaningful elements as it goes. The full citation is not grabbed in a single regexp, but you can get any level of precision out of it that you need. The index of the actual start of the plaintiff/defendant string is probably just not being saved in the citation object. With a bit of tweaking it can be put right.
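To make the approach above concrete, here is a rough sketch of a reporter-anchored scan that also records where the plaintiff/defendant string starts. It is not walverine's code: the `REPORTERS` and `SIGNALS` sets, the `find_citations` name, the `max_lookback` parameter, and the fields of the returned dicts are all assumptions made for illustration, and the capitalization heuristic is deliberately naive (it would miss names containing lowercase words such as "of").

```python
# Illustrative sketch only; names and heuristics are hypothetical.
REPORTERS = {"U.S.", "F.2d", "F.3d"}          # tiny sample of reporter tokens
SIGNALS = {"See", "Cf.", "But", "Accord"}     # citation signals, not party names


def find_citations(text, max_lookback=12):
    words = text.split()
    found = []
    for i, word in enumerate(words):
        if word not in REPORTERS:
            continue
        if i == 0 or i + 1 >= len(words):
            continue
        # Look one word back for the volume and one word forward for the page.
        volume = words[i - 1]
        page = words[i + 1].rstrip(".,;)")
        if not (volume.isdigit() and page.isdigit()):
            continue

        start = i - 1  # default: the citation starts at the volume number
        lower = max(i - 1 - max_lookback, 0)
        # Scan backward for "v." and then over capitalized plaintiff words.
        for j in range(i - 2, lower - 1, -1):
            if words[j] != "v.":
                continue
            start = j
            while start > lower:
                prev = words[start - 1]
                if (not prev[:1].isupper() or prev in SIGNALS
                        or prev.endswith((".", ";", ":"))):
                    break
                start -= 1
            break

        found.append({
            "volume": volume,
            "reporter": word,
            "page": page,
            "start": start,  # word index where the case name begins
            "match": " ".join(words[start:i + 1] + [page]),
        })
    return found
```

Because `start` is saved alongside the other fields, the matched text can be cut at the beginning of the case name instead of trailing into the neighbouring sentences.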
I'm busy for the next couple of days, and I'll be out of Net contact during the weekend. Should be able to work on fixing it up next week, though.
Merging @fbennet's pull request https://github.com/adelevie/walverine/pull/3 with a small modification to include the `match` attribute in the response object from `get_citations()`.
Fixes https://github.com/adelevie/walverine/issues/2