codeforsanjose / city-agenda-scraper


Eliminate duplicate staff report uploads #52

Open krammy19 opened 2 years ago

krammy19 commented 2 years ago

Right now our scraper is downloading all the relevant staff reports for its search parameters and uploading those documents to our shared Google Drive.

The problem is that nothing checks whether a file already exists on the Google Drive. If the scraper is run twice for the same date range, it uploads a second copy of every staff report, because Google Drive allows duplicate files and duplicate file names.

Our GoogleDrive_upload needs to be updated to check whether a file with the same name already exists in the specified city folder on the Google Drive, and to skip the upload if it does.
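One possible shape for that check, sketched in Python on the assumption that the scraper talks to the Drive v3 API through a `googleapiclient` service object. The helper names `list_folder_names` and `upload_new_reports` are hypothetical, and `upload_fn` stands in for whatever per-file upload GoogleDrive_upload already does. The idea is to list the city folder once with `files().list` (following `nextPageToken` across pages) and skip any report whose name is already present:

```python
from typing import Callable, Iterable, List, Set


def list_folder_names(service, folder_id: str) -> Set[str]:
    """Return the names of all non-trashed files directly inside a Drive folder.

    Assumes `service` is a Drive v3 client, e.g. built with
    googleapiclient.discovery.build("drive", "v3", credentials=...).
    """
    names: Set[str] = set()
    page_token = None
    query = f"'{folder_id}' in parents and trashed = false"
    while True:
        response = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
        ).execute()
        names.update(f["name"] for f in response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:  # no more pages
            return names


def upload_new_reports(
    service,
    folder_id: str,
    report_names: Iterable[str],
    upload_fn: Callable[[str], None],
) -> List[str]:
    """Upload only reports whose filename is not already in the folder.

    `upload_fn` is a hypothetical stand-in for the existing per-file upload;
    it receives the filename to upload. Returns the names actually uploaded.
    """
    existing = list_folder_names(service, folder_id)
    uploaded: List[str] = []
    for name in report_names:
        if name in existing:
            continue  # already on the Drive: skip to avoid a duplicate
        upload_fn(name)
        existing.add(name)  # also prevents duplicates within this same run
        uploaded.append(name)
    return uploaded
```

Listing the folder once costs one request per page of results rather than one search query per staff report. The trade-off is that matching is by name only, so a revised report saved under the same filename would also be skipped.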

Tasks: