Many of the citations that I've seen don't come with any useful associated information. For example, here is a California form that cites the Code of Civil Procedure (in the bottom right corner). The relevant text is Code of Civil Procedure, § 706.124. Because that specific regex string isn't in reporters_db, eyecite only recognizes the § character as a section marker of some sort and returns just `UnknownCitation('§', metadata=CitationBase.Metadata(parenthetical=None))`, which by default carries none of the relevant information surrounding the citation. That information is important if we want to suggest that people remove citations from forms.
My best suggestion for moving forward is to use the `index` attribute of the citation object to get its position in the original text, then grab at least 10 tokens before and after the symbol as context, which we can print alongside the citation. The difficulty is recreating eyecite's tokenization process (I think it includes whitespace as separate tokens).
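Here is a minimal sketch of the context-window idea. It assumes we can get character offsets for the citation rather than a token index (I believe eyecite citation objects expose these through a `span()` method, but that should be checked against the installed version). Working from character offsets and splitting on whitespace sidesteps the need to recreate eyecite's tokenizer. The function name `context_window` and its parameters are hypothetical, not part of eyecite.

```python
def context_window(text, start, end, n_tokens=10):
    """Return the matched text plus up to n_tokens whitespace-delimited
    tokens on each side of text[start:end], joined by single spaces.

    start/end are character offsets into text (an assumption: they would
    come from the citation object, not from a token index).
    """
    before = text[:start].split()
    after = text[end:].split()
    # Slice from an explicit start position so n_tokens=0 yields no context
    # (a plain [-0:] slice would return the whole list).
    before = before[max(len(before) - n_tokens, 0):]
    after = after[:n_tokens]
    return " ".join(before + [text[start:end]] + after)


text = (
    "The relevant text is Code of Civil Procedure, § 706.124. "
    "Because that regex is missing, context is lost."
)
start = text.index("§")
end = start + len("§")
print(context_window(text, start, end, n_tokens=3))
```

With a real citation, `start` and `end` would come from the citation object instead of `text.index`. Note that whitespace runs in the source are collapsed to single spaces in the output, which seems fine for display purposes.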