freelawproject / reporters-db

A database of court reporters, tests and other experiments
BSD 2-Clause "Simplified" License

Add per-volume dates to every volume of every edition of every reporter #19

Open mlissner opened 4 years ago

mlissner commented 4 years ago

One example could be something like:

{
    "A.": [
        {
            "cite_type": "state_regional",
            "editions": {
                "A.": {
                    "end": "1938-12-31T00:00:00",
                    "start": "1885-01-01T00:00:00",
                    "volumes": {
                        "1": {"start": "1885-01-01T00:00:00", "end": "1885-06-01T00:00:00"},
                        "2": {"start": "1885-01-01T00:00:00", "end": "1885-06-01T00:00:00"}
                    }
                },
                "A.2d": {
                    "end": "2010-12-31T00:00:00",
                    "start": "1938-01-01T00:00:00"
                },
                "A.3d": {
                    "end": null,
                    "start": "2010-01-01T00:00:00"
                }
            }
        }
    ]
}

But that'd create a monster of a JSON file.
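For illustration, here is a rough sketch of how a consumer might read per-volume dates from that shape. The `volumes` key does not exist in reporters-db today, and `volume_date_range` and the sample data are hypothetical names made up for this example. It assumes volumes are keyed by string and falls back to the edition's own range when a volume has no entry.

```python
from datetime import datetime

# Hypothetical data in the shape proposed above. The "volumes" key and its
# dates are illustrative only, not real reporters-db content.
REPORTERS = {
    "A.": [
        {
            "cite_type": "state_regional",
            "editions": {
                "A.": {
                    "start": "1885-01-01T00:00:00",
                    "end": "1938-12-31T00:00:00",
                    "volumes": {
                        "1": {
                            "start": "1885-01-01T00:00:00",
                            "end": "1885-06-01T00:00:00",
                        },
                    },
                },
                "A.2d": {
                    "start": "1938-01-01T00:00:00",
                    "end": "2010-12-31T00:00:00",
                },
                "A.3d": {"start": "2010-01-01T00:00:00", "end": None},
            },
        }
    ]
}


def volume_date_range(reporters, edition, volume):
    """Return (start, end) for a volume, or None if the edition is unknown.

    Falls back to the edition-level dates when the volume has no entry.
    An open-ended edition yields end=None.
    """
    for variations in reporters.values():
        for reporter in variations:
            ed = reporter["editions"].get(edition)
            if ed is None:
                continue
            # Volume keys are strings, so normalize ints like 1 to "1".
            span = ed.get("volumes", {}).get(str(volume), ed)
            start = datetime.fromisoformat(span["start"])
            end = datetime.fromisoformat(span["end"]) if span["end"] else None
            return start, end
    return None
```

The fallback means per-volume data could be filled in gradually, which might soften the file-size concern somewhat.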

mlissner commented 4 years ago

I forgot to mention why these are useful. In https://github.com/freelawproject/courtlistener/issues/299, we've identified that we want to start finding citations that lack page numbers, like 442 U.S. ___. For those, we can't rely on the citation itself to look them up. Instead, we'll have the volume number, the reporter abbreviation, and, if we're lucky, some of the party info.

That means that the party info is the only unique thing we've got, so if we're going to use that, being able to refine by volume date would really help reduce false positives.

brianwc commented 4 years ago

That style of citation is generally only used in slip opinions prior to the volume being published. However, if you know the year of the opinion in which you found such a citation, then I think we'd find that the years covered by the cited volume are that very year, maybe +/- 1 year. So, if I find such a citation in an opinion from 2016, then the volume of that citation likely covers opinions from 2016 as well, maybe 2015-17.
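As a rough sketch of that heuristic: given the year of the citing opinion, build a window of that year plus or minus one and keep only candidate matches filed inside it. The function names and the candidate dicts (`case_name`, `date_filed`) are hypothetical, and the one-year slack is just the guess from the comment above, not a measured value.

```python
from datetime import date


def plausible_volume_window(opinion_year, slack_years=1):
    """Return the (first_day, last_day) window the cited volume likely covers."""
    return (
        date(opinion_year - slack_years, 1, 1),
        date(opinion_year + slack_years, 12, 31),
    )


def filter_candidates(candidates, opinion_year, slack_years=1):
    """Keep candidates whose filing date falls inside the plausible window."""
    start, end = plausible_volume_window(opinion_year, slack_years)
    return [c for c in candidates if start <= c["date_filed"] <= end]


# Hypothetical candidates that matched on party names alone.
candidates = [
    {"case_name": "Doe v. Roe", "date_filed": date(2016, 5, 1)},
    {"case_name": "Doe v. Roe", "date_filed": date(1990, 1, 1)},
]
matches = filter_candidates(candidates, opinion_year=2016)
```

Here only the 2016 candidate survives, and no per-volume dates are needed.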

mlissner commented 4 years ago

That's a really good point, Brian. There's no point in doing what this issue proposes, at least not for the purpose we were contemplating. Thanks.

mlissner commented 4 years ago

Note that in #21, @jcushman points out that non-numeric volume numbers are a thing, so the above format would have some limitations.
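One possible way around that, sketched under assumptions: JSON object keys are always strings anyway, so keying volumes by string would accommodate non-numeric volumes, and ordering could be handled with a sort key. `volume_sort_key` and the sample keys (including "2A") are hypothetical and not taken from #21.

```python
def volume_sort_key(volume):
    """Order numeric volumes numerically, then non-numeric ones alphabetically."""
    text = str(volume).strip()
    if text.isdigit():
        return (0, int(text), "")
    return (1, 0, text)


# Hypothetical volume keys as they might appear in a "volumes" object.
volume_keys = ["10", "2A", "2", "1"]
ordered = sorted(volume_keys, key=volume_sort_key)
```

Placing non-numeric volumes after the numeric ones is an arbitrary choice here. A real implementation would need to know how each reporter actually orders them.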

jcushman commented 4 years ago

Couple more thoughts: