The SouthTech hosting scraper grabs both Excel and PDF files. It automatically creates a directory for collecting the files, then creates subdirectories to group them by candidate/platform (named with the election type, date, and title). Expect scraping to take roughly 2-3 hours, possibly more.
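The per-candidate directory layout described above could be sketched like this. This is a minimal illustration, not the scraper's actual code; the function name, argument names, and the underscore-joined naming scheme are all assumptions.

```python
import os

def make_filing_dir(base_dir, candidate, election_type, election_date, title):
    """Create (if needed) a subdirectory for one candidate/platform,
    named with the election type, date, and filing title, inside the
    main collection directory, and return its path."""
    # Hypothetical naming scheme; the real scraper may format these differently.
    subdir = f"{candidate}_{election_type}_{election_date}_{title}"
    path = os.path.join(base_dir, subdir)
    os.makedirs(path, exist_ok=True)  # auto-creates base_dir too if missing
    return path
```

Downloaded Excel and PDF files for that candidate would then be saved into the returned path.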
The downloaded PDFs and Excel files are currently unlabeled. Their identifying information could eventually be scraped, but for now the files need to be renamed after the download is complete. TODO: find a way to reduce the scraper's runtime (improve speed), and make the scraper more adaptable to other SouthTech-hosted sites (see the comments left in the Jupyter notebook for detailed suggestions).
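One common way to address the long runtime noted above is to download files concurrently instead of one at a time. A minimal sketch, assuming the notebook already has a single-file download function that can be passed in as `fetch` (the names here are hypothetical, not the project's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=8):
    """Run a single-file download function over many URLs in parallel.
    Since scraping time is dominated by waiting on the server, running
    several downloads at once can cut the 2-3 hour runtime substantially."""
    # `fetch` is whatever callable performs one download and returns a result.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

`max_workers` should stay modest to avoid hammering the SouthTech server; adding a small delay or retry logic inside `fetch` would also be prudent.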