PDX-Capstone-Team-C / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
http://scrapy.org
BSD 3-Clause "New" or "Revised" License

Investigate: Git-Annex and Git-LFS #8

Closed: mjsiegfried closed this issue 8 years ago

mjsiegfried commented 8 years ago

Research which delta compression library would be best for our first implementation. Take notes so that we can make the decision after a team meeting or online discussion.

Key points to look for: How does it use deduplication or delta encoding? Do Python libraries or bindings exist for it? Can archives be accessed easily outside of Scrapy? Is it available as an easily installable Linux package?

Also, can you come up with a rough sketch of what an implementation would look like? (For example, would we import the library and simply make a function call, or would we need to write a wrapper for it?)
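
As a starting point for that sketch, here is one possible shape for the "wrapper" option. It assumes neither tool is imported as a Python library and that both are driven through their command-line interfaces instead. The helper names `archive_commands` and `archive_file` are hypothetical, and the repository is assumed to be already initialized for the chosen backend (`git annex init` or `git lfs install`). The research may well turn up a better approach, such as real Python bindings.

```python
import subprocess


def archive_commands(backend, path):
    """Return the git commands that would put `path` under the given backend.

    Hypothetical helper for discussion; `backend` is "annex" or "lfs".
    """
    path = str(path)
    if backend == "annex":
        # git-annex moves the content into its own store and stages a
        # symlink or pointer file in its place.
        return [["git", "annex", "add", path]]
    if backend == "lfs":
        # Git LFS needs the path recorded in .gitattributes first, then the
        # file is staged with a normal `git add`.
        return [
            ["git", "lfs", "track", path],
            ["git", "add", ".gitattributes", path],
        ]
    raise ValueError(f"unknown backend: {backend!r}")


def archive_file(backend, path, repo_dir="."):
    """Run the commands in an existing repository (assumed already set up)."""
    for cmd in archive_commands(backend, path):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Keeping command construction separate from execution means the backend choice stays a one-argument switch, and the command lists can be checked in tests without either tool installed.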