NationalGalleryOfArt / opendata

The National Gallery of Art Open Data Program
https://www.nga.gov/open-access-images/open-data.html
Creative Commons Zero v1.0 Universal

Why zip files? #1

Closed dylan-k closed 3 years ago

dylan-k commented 3 years ago

Wouldn't it be easier for git to work with this information in its native format? If the CSV files were committed to this repository directly, rather than hidden inside binary zip archives, it would be much easier to track changes, for example. Git also wouldn't have to work as hard, since it is not well suited to handling zip files.
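To illustrate the change-tracking point, here is a minimal sketch of the kind of line-level comparison that plain-text CSV makes possible and a zip archive does not. It uses Python's standard `difflib`; the function name `csv_line_diff`, the column names, and the sample rows are made up for illustration and are not taken from the actual data release.

```python
import difflib


def csv_line_diff(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ("-") and added ("+") rows between two CSV versions."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            lineterm="",
            n=0,  # no context lines, just the changed rows
        )
    )
    # When there are differences, the first two lines are the "---"/"+++"
    # file headers; skip them, then drop the "@@" hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical sample rows, not real collection data.
    old = "objectid,title\n1,Ginevra\n2,Self-Portrait\n"
    new = "objectid,title\n1,Ginevra de' Benci\n2,Self-Portrait\n"
    for row in csv_line_diff(old, new):
        print(row)
```

With the files committed as plain CSV, `git diff` gives the same kind of per-row view directly; with a zip archive, git can generally only report that a binary file changed.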

beaudet commented 3 years ago

Thanks for the feedback. I totally agree with you. However, when we first started planning to release our collection data, some of the uncompressed files were larger than GitHub's 100 MB maximum file size, which meant we would have had to use GitHub's Large File Storage (Git LFS) mechanism. Zipping the files seemed like a reasonable alternative at the time. We have since removed a number of large text fields from the release while we review the copyright implications, and that has brought the largest uncompressed files down to around 50 MB. So we could eliminate the zip step until we run into the size problem again down the road. We're happy to adapt the process to whatever people feel would maximize the use of our data.
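For anyone who wants to check the same constraint locally, the following is a rough sketch of a pre-commit size check. The helper name `files_over_limit` and the `*.csv` pattern are assumptions for illustration, and the limit is interpreted here as 100 MiB (100 × 1024 × 1024 bytes); adjust it if a different threshold applies to your setup.

```python
from pathlib import Path

# Assumption: the 100 MB per-file limit is treated as 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(root, pattern="*.csv", limit=GITHUB_FILE_LIMIT_BYTES):
    """Return the files under `root` matching `pattern` that exceed `limit` bytes."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and path.stat().st_size > limit
    )


if __name__ == "__main__":
    # Scan the current directory and report any CSV that would be too large.
    for path in files_over_limit("."):
        size_mb = path.stat().st_size / (1024 * 1024)
        print(f"{path}: {size_mb:.1f} MiB exceeds the limit")
```

Running this before committing would flag any CSV that has grown past the threshold, which is the point at which zipping or Git LFS would need to be reconsidered.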

beaudet commented 3 years ago

Dylan, we dropped the zip format for now. Let us know how the CSV works for you. Hopefully we won't run up against the 100 MB limit for a while.