MuckRock / muckrock

MuckRock's source code - Please report bugs, issues and feature requests to info@muckrock.com
https://www.muckrock.com
GNU Affero General Public License v3.0

Retain Folder Structure in Uploads #1828

Open amandabee opened 9 months ago

amandabee commented 9 months ago

We may need to address some of this at the Mail and Scans level. But currently, if an agency provides responsive materials in a drive or zip file, those materials are stripped of any folder structure. So if they send us material like:

Staff Rosters
│   roster.csv
│
└───Prior Years
    ├───2023
    │       roster.csv
    │       ...
    │
    └───2022
            roster.csv
            ...

What ends up on the actual request will be:

│   roster.csv
│   roster.csv
│   roster.csv

Materials are often organized into a folder by case or incident, and even if the files have distinct names, the folder(s) those files sit in include relevant metadata. Currently, our process strips all of that.
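To make the collision concrete, here is a minimal sketch using only Python's standard `zipfile` module. It is not MuckRock's upload code; the helper names `list_upload_paths` and `build_example_zip` are hypothetical and exist only for illustration. It shows that a zip archive already carries each file's relative path, so keeping that path alongside the file would preserve the folder context, while keeping only the base name makes all three rosters indistinguishable.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def list_upload_paths(zip_bytes):
    """Return (relative_path, basename) pairs for every file in a zip archive.

    Hypothetical helper: the relative path is what would need to be stored
    with each file to retain the folder structure.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, PurePosixPath(info.filename).name)
            for info in archive.infolist()
            if not info.is_dir()  # skip explicit directory entries
        ]


def build_example_zip():
    """Build an in-memory zip mirroring the Staff Rosters example above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Staff Rosters/roster.csv", "current")
        archive.writestr("Staff Rosters/Prior Years/2023/roster.csv", "2023")
        archive.writestr("Staff Rosters/Prior Years/2022/roster.csv", "2022")
    return buffer.getvalue()


pairs = list_upload_paths(build_example_zip())
relative_paths = [path for path, _ in pairs]

# Counting base names shows how many files collide once folders are dropped.
flattened = Counter(name for _, name in pairs)

for path, name in pairs:
    print(f"{path}  ->  {name}")
```

With the relative paths kept, each roster stays unique; with only the base names, every file collapses to the same `roster.csv` label.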

amandabee commented 9 months ago

Allan did play with some potential visual treatments for this in a related exploration at: https://www.notion.so/muckrock/Allan-s-Thoughts-and-Ideas-afbe7c3a49b74e49acebcded8b55d0da

amandabee commented 7 months ago

Here are two examples of requests where the folder structure was nuked:

  1. https://www.muckrock.com/foi/tulare-3477/2023-sb1421sb16-request-tulare-police-department-139456/

This looks like it was a direct upload by the agency, but I'm still not good at telling for sure. Because the folders were nuked, there are a lot of files named "interview_transcript.pdf" that were initially in folders.

  2. https://www.muckrock.com/foi/richmond-3396/sb1421-records-2022-123053/

SharePoint. It looks like a lot of the video files that were provided were stripped of their folder structure. I can't access the original SharePoint directory, so I don't know what the file structure looked like.

amandabee commented 7 months ago

Sharepoint:

Direct Uploads: