UC-MACSS / persp-research_Spr18

Course site for MACS 30200 (Spring 2018) - Perspectives on Computational Research

Patents data is too large for GitHub #11

Open dgamarnik opened 6 years ago

dgamarnik commented 6 years ago

GitHub has a limit of 25 MB per file, and the patents data is about 10x that size.

I cleaned it up a lot, saved it as a Stata file, and used a package to import that into R, so it's no longer a problem for me personally. I'm not sure what people without Stata can do, though.

bensoltoff commented 6 years ago

I think the actual hard limit for GitHub is 100 MB. A few options:

  1. Use Git LFS (Large File Storage)
  2. Compress the data file as a ZIP file and commit that. Include instructions in your README.md for opening the data file for your analysis.
  3. Don't commit the data file, just the analysis and output.
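
For option 2, a minimal sketch in Python of what the compress-and-read-back step might look like. The function names and the example file name are hypothetical (nothing here comes from the course repo), and the size threshold is just the 100 MB figure mentioned above treated as an assumption, so check the result against GitHub's current limits before relying on it.

```python
import zipfile
from pathlib import Path

# Assumption: the ~100 MB per-file limit mentioned above, expressed in bytes.
ASSUMED_LIMIT_BYTES = 100 * 1024 * 1024


def zip_data_file(src):
    """Compress a single data file into '<name>.zip' next to it; return the archive path."""
    src = Path(src)
    dest = src.parent / (src.name + ".zip")
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, not the full local path, inside the archive
        zf.write(src, arcname=src.name)
    return dest


def fits_under_limit(path, limit=ASSUMED_LIMIT_BYTES):
    """Return True if the file on disk is smaller than the assumed size limit."""
    return Path(path).stat().st_size < limit


def read_zipped_text(archive, member):
    """Read one member of the archive back as UTF-8 text without unpacking to disk."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:
            return fh.read().decode("utf-8")
```

Something like `archive = zip_data_file("patents.csv")` followed by `fits_under_limit(archive)` would tell you whether the compressed file is small enough to commit. If it still isn't, options 1 or 3 are probably the better route.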

yilundai commented 6 years ago

Wait, I thought we were in the same group (for the food inspection data)?