flooie opened this issue 1 year ago
I'm sorry, I really don't understand what you're trying to do, what you did, or whether there's a problem. What's 64? What's 23,165? Are you updating items from one value to another?
I don't know what 64 is. It looks like it's from an old import. Not sure where anything went.
Sorry, I was struggling to make sense of this yesterday, but it seems obvious to me now. My bad. So there are 23k items with the source of 64. Can you share a few?
4808157: O'Shea v. Commissioner
4808163: Leger v. Commissioner
It looks obvious now: the docket source values were used in the cluster source field.
I think so, yep.
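One way to spot rows like these is to look for cluster source values that are purely numeric, on the assumption that the cluster field should only hold letter codes while the docket field holds integers such as 64. A minimal sketch in plain Python, assuming rows are available as (cluster_id, source) pairs; the `looks_like_docket_source` helper is hypothetical, not CourtListener code:

```python
def looks_like_docket_source(value: str) -> bool:
    """Return True if a cluster source value is a bare integer.

    Assumption: cluster sources are letter codes, so an all-digit value
    (such as "64") most likely came from a docket's integer source field.
    """
    return value.isdigit()


# Sample rows of (cluster_id, source). The two IDs are the examples from
# this thread; the "C" row is made up for illustration.
rows = [
    (4808157, "64"),
    (4808163, "64"),
    (1, "C"),
]

suspect_ids = [cid for cid, source in rows if looks_like_docket_source(source)]
print(suspect_ids)  # [4808157, 4808163]
```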
I assume we shouldn't use Q for the Anonymous dataset
ANON_2020 = "I"
(ANON_2020, "2020 anonymous database"),
No, probably not. What source do the dockets on these show?
ANON_2020 = 64
(ANON_2020, "2020 anonymous database"),
All the letters that would make sense are taken.
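Since most sensible letters are already taken, it can help to confirm a candidate code is unused before adding a new choice. A minimal sketch, assuming the existing choices are (code, label) pairs shaped like the tuple above; `EXISTING_CHOICES` here is an illustrative stand-in, not the full or authoritative list:

```python
# Illustrative (code, label) pairs standing in for the real choices list.
EXISTING_CHOICES = [
    ("C", "placeholder label"),
    ("R", "placeholder label"),
    ("CR", "placeholder label"),
]


def code_is_free(code: str, choices: list[tuple[str, str]]) -> bool:
    """Return True if no existing choice already uses `code`."""
    return all(existing != code for existing, _label in choices)


print(code_is_free("Q", EXISTING_CHOICES))  # True
print(code_is_free("C", EXISTING_CHOICES))  # False
```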
What source do the dockets on these show?
...
ANON_2020 = 64
(ANON_2020, "2020 anonymous database"),
I was proposing a new opinion cluster source: the name and letter above.
Oh, got it, the dockets have the same thing as the clusters, 64. Yeah, Q seems like the right letter for it. Funny and memorable. Why not.
@mlissner Well... I think it's the right letter too, but it was also a joke, because QAnon is a conspiracy theory.
That wasn't lost on me. :D
Just wanted to make sure.
This is fixed.
@mlissner @flooie While I was importing some caselaw records, I found that there are still 23,161 records with source 64.
Fortunately, I found a PR https://github.com/freelawproject/courtlistener/pull/2727 that addresses this issue by adding a new source. To clean these records, we can simply run the following code:
from cl.search.models import OpinionCluster, SOURCES
OpinionCluster.objects.filter(source='64').update(source=SOURCES.ANON_2020)
Regarding "we can simply run the following code": I take this back.
I noticed during my review of PR #2727 that there are two sources for the anonymous import:
ANON_2020 = "Q"
ANON_2020_M_HARVARD = "QU"
Since ANON_2020 appears more generic, @flooie, would it be safe to assume it's the appropriate replacement for the current value "64"?
I'll answer for Bill. Yes. If the value is 64, we can replace it with Q.
Just ran the code:
In [15]: from cl.search.models import OpinionCluster, SOURCES
In [16]: OpinionCluster.objects.filter(source='64').update(source=SOURCES.ANON_2020)
Out[16]: 23161
In [17]: OpinionCluster.objects.filter(source='64').count()
Out[17]: 0
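The two outputs above show an update-then-verify pattern: the update reports how many rows it touched, and a follow-up count confirms none are left. The same logic can be sketched in plain Python, assuming clusters are dicts with a "source" key; `LEGACY_SOURCE_REMAP` and `remap_sources` are hypothetical names, and in CourtListener itself the queryset `update()` call above does the real work:

```python
# Hypothetical mapping from stray legacy values to replacement codes.
# Per this thread, "64" should become ANON_2020, i.e. "Q".
LEGACY_SOURCE_REMAP = {"64": "Q"}


def remap_sources(clusters: list[dict], remap: dict[str, str]) -> int:
    """Rewrite legacy source values in place and return how many changed."""
    changed = 0
    for cluster in clusters:
        new_source = remap.get(cluster["source"])
        if new_source is not None:
            cluster["source"] = new_source
            changed += 1
    return changed


clusters = [{"id": 4808157, "source": "64"}, {"id": 4808163, "source": "64"}]
print(remap_sources(clusters, LEGACY_SOURCE_REMAP))  # 2
print(remap_sources(clusters, LEGACY_SOURCE_REMAP))  # 0, so re-running is safe
print(sum(c["source"] == "64" for c in clusters))    # 0
```

Because "Q" is not itself a key in the mapping, a second pass finds nothing to change, which mirrors the zero count in the session above.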
@flooie or @quevon24, would this stray value, left lingering for the past year, impact any of our importers?
@flooie, can you provide an update here, please?
While preparing a model change in a small PR, I went to update only the required sources in the opinion cluster source options.
To do that, I pulled all the distinct sources for the opinions to be merged. I ran some code to extract the files I needed to update and then queried the distinct values.
They all kind of make sense, except 64, of which there are 23,165. The first was imported in 2014, and the source on the admin page says "merged from resource.org". Not sure what is going on here.
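The audit described here (pull the distinct source values, then look for any that do not belong) can be sketched in plain Python. This assumes the source values are already loaded into a list and that the set of valid codes is known; `VALID_CODES` and the sample data are hypothetical, with only "Q" and "QU" taken from this thread:

```python
from collections import Counter

# Hypothetical set of recognized cluster source codes (illustrative only).
VALID_CODES = {"C", "R", "CR", "Q", "QU"}


def unexpected_sources(values: list[str], valid: set[str]) -> dict[str, int]:
    """Count each distinct source value that is not a recognized code."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if value not in valid}


# Made-up sample: a few valid codes plus stray "64" values.
sources = ["C", "CR", "64", "R", "64", "Q"]
print(unexpected_sources(sources, VALID_CODES))  # {'64': 2}
```

Counting rather than only listing distinct values makes it easy to compare the result with a figure like the 23,165 reported above.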