MahinRahman8901 / c10-Court-Transcript

A data pipeline to automate the enhancement, discoverability and analysis of real Courtroom documents.

Decide how many pages of cases we want to scrape #38

Open ErvinRex opened 2 months ago

ErvinRex commented 2 months ago

Task Description

User Stories

Relevant Files [If Available]

-

ayeshaa63 commented 2 months ago

I think we could start with the first 5 pages - that would give us 50 cases to work with as a starting point.

ErvinRex commented 2 months ago

Now that we know it does not take long to scrape and run the ETL pipeline, we could viably go through ~50 pages to get a large enough dataset to be used in our dashboard.

The daily run will only cover about 2/3 cases in the High Court for now; we can look later at how including more courts affects the run time.
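As a rough illustration of the numbers discussed above, here is a minimal sketch of how the page count could translate into an expected dataset size. It assumes 10 cases per listing page (inferred from "5 pages ... 50 cases" earlier in this thread) and 1-based page numbering; the names `CASES_PER_PAGE`, `estimated_cases` and `page_numbers` are hypothetical and not taken from the repository.

```python
# Assumption: 10 cases per listing page, inferred from "5 pages -> 50 cases".
CASES_PER_PAGE = 10


def estimated_cases(pages: int, cases_per_page: int = CASES_PER_PAGE) -> int:
    """Return the approximate number of cases on the given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * cases_per_page


def page_numbers(pages: int) -> list[int]:
    """Return the page numbers to scrape, assuming pages are numbered from 1."""
    return list(range(1, pages + 1))


if __name__ == "__main__":
    # Compare the initial suggestion (5 pages) with the later one (~50 pages).
    for pages in (5, 50):
        print(f"{pages} pages -> ~{estimated_cases(pages)} cases")
```

Keeping the page count as a single parameter would make it easy to switch between a small initial backfill and the larger ~50-page run without touching the scraper itself.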