scienceisfiction opened this issue 8 years ago
Hi @scienceisfiction, you might want to talk to @coreenforbes because she did a whole bunch of research on cloud backup services for our lab group meeting.
@BIOL548O/all ,
Great question @scienceisfiction ! Does anybody else want to explore storing large datasets on the web? I know @JoeyBernhardt was telling me about having this challenge with her data, too. We could certainly study some of the solutions below as a class, if people are interested!
Here are some ideas:

- Amazon S3, which you can access from R with the `aws.s3` package. I can help you set that up if you want!
- Dropbox, using the `rdrop2` package to access your data. You don't need to have that folder synchronized with your computer (i.e. you could just leave your stuff on the cloud and load it into R when you need it).

Does that help? Would anyone else like to add anything?
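Here's a minimal sketch of the Dropbox option (not run here; the path `"data/mydata.csv"` is just a placeholder for wherever your file lives in your Dropbox):

```r
library(rdrop2)

# opens a browser window once to authorize access to your Dropbox
drop_auth()

# reads the remote csv straight into R -- no local synced copy needed
df <- drop_read_csv("data/mydata.csv")
```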
I have summoned the Science Nerds of Twitter and they have spoken: S3 is a popular choice, as is Dropbox.
However, I also learned that `readr` can read zipped csvs! @JoeyBernhardt, try this out:
```r
library(readr)

# write mtcars to a temporary csv
mtcars_path <- tempfile(fileext = ".csv")
write_csv(mtcars, mtcars_path)

# zip up the file
zipname <- paste0(mtcars_path, ".zip")
zip(zipname, mtcars_path)

# read_csv() can read the zipped csv directly
mt_from_zip <- read_csv(zipname)
```
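As an aside (I haven't run this here, but per `readr`'s documentation), you can skip the explicit `zip()` step: `write_csv()` compresses automatically when the path ends in `.gz`, and `read_csv()` decompresses on read based on the extension.

```r
library(readr)

# writing to a *.gz path compresses the output automatically
mtcars_gz <- tempfile(fileext = ".csv.gz")
write_csv(mtcars, mtcars_gz)

# read_csv() sees the .gz extension and decompresses on read
mt_from_gz <- read_csv(mtcars_gz)
```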
oh cool! thanks @aammd! I will try this :smile:
These are all really helpful, though I don't actually have time to pursue them now! After the class is over, will these discussions live on somewhere? I'd love to be able to circle back to this and some of the other tips and tricks that have come up in other discussions once term is over and I have a little more time to explore.
This Discussion thread will live forever
I was wondering if anyone has any recommendations for cloud storage for large and varied data sets (raw data may be video, image, etc. files, not just txt or csv). Something that can handle large and weird objects but maybe also has some helpful features that work with R, Git, etc.?
Thanks! Melissa A