Kevinlkc / legal_censorship

MIT License

[Build] Data refreshment: Correcting our "censored" sample #6

Open Kevinlkc opened 1 year ago

Kevinlkc commented 1 year ago

I manually drew a random sample of N=30 from our "censored" dataset and found that only 2 are actually missing from the website.

It could be that the government is putting them back, but given the false-positive rate in our random sample (28 of 30), it is also reasonable to suspect that something went wrong in the 2021 wave of scraping.
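To get a rough sense of how much a sample of 30 can tell us, here is a minimal sketch that computes the share of sampled "censored" files that are truly missing (2 of 30) and a 95% Wilson score interval around it. The helper name `wilson_interval` and the choice of the Wilson method are assumptions made for illustration, not something taken from the project code.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Figures from the manual check: 30 sampled "censored" files, 2 truly missing.
sample_size = 30
truly_missing = 2

precision = truly_missing / sample_size   # share that is really missing
false_positive_share = 1 - precision      # share that is still on the website
low, high = wilson_interval(truly_missing, sample_size)

print(f"truly missing: {precision:.1%} (95% Wilson interval {low:.1%} to {high:.1%})")
print(f"false positives in sample: {false_positive_share:.1%}")
```

Even the upper end of the interval leaves most of the "censored" label unexplained by real removals, which is consistent with suspecting the scraping wave rather than the website.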

I can suggest 2 possible solutions:

Kevinlkc commented 1 year ago

With regard to the scale of censorship, I don't think we're off by an order of magnitude if we trust the results reported by Liebman et al. (2022).

[Screenshot attached: Screen Shot 2023-06-11 at 2 25 09 PM]

Let's do some back-of-the-envelope calculations. For instance, we only observe 1,014,143 civil lawsuits tried in 2013, whereas 1,021,098 were posted for that year when we visited the website in 2021, a gap of about 0.7%. Likewise, we observe 7,550,158 tried in 2016, whereas 7,628,756 were posted for that year when we visited the website in 2021. That gap is about 1%, which is still economically significant. The only thing blocking our progress is that we might be labeling the wrong files as "censored".
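The arithmetic behind these figures can be checked with a short sketch. The counts are the ones quoted in this comment; the dictionary layout and the helper name `gap_share` are hypothetical and only serve the illustration.

```python
# Counts quoted in the comment: cases we observe vs. cases posted on the website.
counts = {
    2013: {"observed": 1_014_143, "posted": 1_021_098},
    2016: {"observed": 7_550_158, "posted": 7_628_756},
}


def gap_share(observed, posted):
    """Return (absolute gap, gap as a share of the posted count)."""
    gap = posted - observed
    return gap, gap / posted


gaps = {year: gap_share(c["observed"], c["posted"]) for year, c in counts.items()}

for year, (gap, share) in gaps.items():
    print(f"{year}: gap = {gap:,} cases ({share:.2%} of posted)")
```

This gives a gap of 6,955 cases for 2013 (roughly 0.7% of posted cases) and 78,598 cases for 2016 (roughly 1.0%).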

One more possible to-do item: look at the total number of civil lawsuits that we scraped in 2019 and 2021, and check whether the magnitudes match both what the paper reports and the website's current count.