VertNet / webapp

VertNet web application

Zip large download files #129

Open laurarussell opened 11 years ago

laurarussell commented 11 years ago

Large download files (anything over 20-25 MB??) could be problematic for some users when they click the link they receive in the email.

We could also include a README file with the text download that contains Access and Citation information for the records.

robinkraft commented 11 years ago

If this becomes a priority, we'll need to do some research on how big the zip files could get. There are limits to how big standard zip archives can get, though I'm not sure what they are.
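
On the size question: the classic zip format caps each member and the archive itself at 4 GiB, and the entry count at 65,535; the ZIP64 extension lifts those caps. Below is a minimal sketch of bundling an export with a README, assuming Python's standard zipfile module. The file names (records.tsv, README.txt) and the build_download_zip helper are hypothetical illustrations, not part of the VertNet code.

```python
import os
import tempfile
import zipfile


def build_download_zip(tsv_path, readme_text, zip_path):
    """Bundle a TSV export and a README into one compressed zip archive."""
    # allowZip64=True lets the archive exceed the classic 4 GiB / 65,535-entry
    # limits by writing ZIP64 extensions when they are needed.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as archive:
        archive.write(tsv_path, arcname=os.path.basename(tsv_path))
        archive.writestr("README.txt", readme_text)
    return zip_path


# Tiny stand-in for a real export, used only to show the call.
workdir = tempfile.mkdtemp()
tsv_path = os.path.join(workdir, "records.tsv")
with open(tsv_path, "w") as handle:
    handle.write("id\tscientificname\n1\tPuma concolor\n")

zip_path = build_download_zip(
    tsv_path,
    "Access and citation information goes here.\n",
    os.path.join(workdir, "records.zip"),
)
with zipfile.ZipFile(zip_path) as archive:
    names = sorted(archive.namelist())
```

Unlike gzip or bzip, a zip archive can hold the README and the data file together. A tar archive compressed with gzip would be another way to get the same bundling.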

laurarussell commented 11 years ago

Perhaps we can discuss with Tim Robertson at GBIF. All their download links come by email when ready, and the downloads are compressed. (I probably should have said "compressed" in the issue title :-)

robinkraft commented 11 years ago

Yeah I'm sure there's a way!

eightysteele commented 11 years ago

Let's see, so gzip and bzip don't have size limits. bzip gives better compression but takes longer. We can't bzip arbitrarily large files on the App Engine platform, so to support this in our architecture we'll need to:

  1. use the cascading taskqueue to write the tsv results to google cloud storage
  2. push the completed tsv url to a pull queue (https://developers.google.com/appengine/docs/python/taskqueue/overview-pull)
  3. ec2 micro instance consumes urls from pull queue, downloads, bzips, uploads bzip file back to google cloud storage, pings vertnet api with the bzip download url
  4. vertnet emails user with download link
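
The compression part of step 3 can be sketched as follows, assuming Python 3 and its standard bz2 module. The download, upload, and API ping are left out, and compress_file and the file names are hypothetical. Copying in fixed-size chunks keeps memory use flat, so the size of the TSV is limited by disk rather than RAM.

```python
import bz2
import os
import shutil
import tempfile


def compress_file(src_path, dest_path, chunk_size=1024 * 1024):
    """Compress src_path to dest_path with bzip2, one chunk at a time."""
    with open(src_path, "rb") as src, bz2.open(dest_path, "wb") as dest:
        # copyfileobj reads and writes chunk_size bytes per iteration, so
        # the whole file is never held in memory at once.
        shutil.copyfileobj(src, dest, chunk_size)
    return dest_path


# Small stand-in for a downloaded TSV (hypothetical content).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "records.tsv")
original = b"id\tscientificname\n" + b"1\tPuma concolor\n" * 1000
with open(src_path, "wb") as handle:
    handle.write(original)

bz_path = compress_file(src_path, src_path + ".bz2")
with bz2.open(bz_path, "rb") as handle:
    restored = handle.read()
```

Swapping bz2.open for gzip.open gives the gzip variant with no other changes.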

robinkraft commented 11 years ago

Can standard decompression programs handle bzip? I never see bzip archives in the wild. How about gzip? I'm guessing the faster download of a bzip file will be outweighed by the time users spend figuring out what bzip files are and installing something that can handle them.
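
The size side of this trade-off can be measured on a sample export before choosing a format. Here is a rough sketch, assuming Python 3 and its standard gzip and bz2 modules. The sample rows are made up and far more repetitive than real records, so the numbers it prints only illustrate the method.

```python
import bz2
import gzip

# Hypothetical sample: repeated tab-separated rows standing in for an export.
sample = b"id\tscientificname\tcountry\n" + b"1\tPuma concolor\tUS\n" * 50000

gz_bytes = gzip.compress(sample)
bz_bytes = bz2.compress(sample)

print("raw   %d bytes" % len(sample))
print("gzip  %d bytes" % len(gz_bytes))
print("bzip2 %d bytes" % len(bz_bytes))

# Both formats round-trip losslessly; only size and speed differ.
assert gzip.decompress(gz_bytes) == sample
assert bz2.decompress(bz_bytes) == sample
```

Running the same comparison on a real download would show whether the extra compression from bzip is large enough to matter for users.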

eightysteele commented 11 years ago

Yeah gzip probably.

robgur commented 11 years ago

+1 gzip.

mkoo commented 11 years ago

gzip for the win!

tucotuco commented 9 years ago

While purging the Google Files API dependency, I noted (https://github.com/VertNet/webapp/blob/master/vertnet/service/download.py#L275) that we want to zip the results. The Google Cloud Storage Client Library does not support this. Another way to do it is to use gsutil on Google Compute Engine.
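
A third possibility, offered only as a sketch, is to compress in application code before the bytes reach storage, which does not rely on the storage library supporting compression. Python's standard gzip.GzipFile can wrap any writable file-like object. In the code below io.BytesIO stands in for a storage write handle, and write_gzipped_rows is a hypothetical helper. Whether the Cloud Storage client's handle behaves the same way within App Engine's limits is an assumption this thread does not confirm.

```python
import gzip
import io


def write_gzipped_rows(raw_handle, rows):
    """Write tab-separated rows to raw_handle as a gzip stream."""
    # GzipFile wraps any object with a write() method; closing the GzipFile
    # writes the gzip trailer but leaves raw_handle open for the caller.
    with gzip.GzipFile(fileobj=raw_handle, mode="wb") as gz:
        for row in rows:
            gz.write(("\t".join(row) + "\n").encode("utf-8"))


# io.BytesIO stands in for a storage write handle in this sketch.
buffer = io.BytesIO()
write_gzipped_rows(buffer, [("id", "scientificname"), ("1", "Puma concolor")])
restored = gzip.decompress(buffer.getvalue()).decode("utf-8")
```

Because rows are compressed as they are written, the uncompressed TSV never has to exist as a single object.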