ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Add windows exe build. #46

Closed luckcolors closed 8 years ago

luckcolors commented 8 years ago

Hello. I would like to use grab-site on Windows. Is it possible to add support for a Windows .exe build? The only library missing a 64-bit Windows build is cChardet (https://github.com/PyYoshi/cChardet/issues/14); the other ones all seem to have a build or don't need any C extension.

ivan commented 8 years ago

This is unfortunately not something I'm likely to do, just due to the headache of figuring out how to do the build and keeping an environment around to build it. And the command-line operation of grab-site isn't particularly friendly to most Windows users, anyway. In contrast to Windows, doing "releases" for Linux and OS X users is as easy as git push for me.

On Windows, the pip3 install ... command might work if you install VS2010 (either Express or full), Python 3.4.3, and git (select "Use git from the Windows command prompt" in the git installer). pip3 is in C:\Python34\Scripts\pip3. If the pip3 install ... command for grab-site succeeds, grab-site and gs-server will be installed in the C:\Python34\Scripts directory.
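To check whether the install actually put the two scripts where expected, something like the following could be run with the same Python interpreter. This is only a sketch: `find_installed_scripts` is a hypothetical helper name, not part of grab-site, and it assumes the scripts are installed as plain files named `grab-site` and `gs-server` in the interpreter's scripts directory.

```python
import os
import sysconfig


def find_installed_scripts(names, scripts_dir=None):
    """Map each script name to its full path, or None if it is missing."""
    if scripts_dir is None:
        # Directory where pip places scripts for this interpreter,
        # for example C:\Python34\Scripts on a default Windows install.
        scripts_dir = sysconfig.get_path("scripts")
    found = {}
    for name in names:
        candidate = os.path.join(scripts_dir, name)
        found[name] = candidate if os.path.isfile(candidate) else None
    return found


if __name__ == "__main__":
    for name, path in find_installed_scripts(["grab-site", "gs-server"]).items():
        print(name, "->", path or "not found")
```

If either name prints "not found", the pip3 install step most likely failed partway.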

You might want to give grab-site a try in a Linux VM (I use VMware Workstation, but VMware Player and VirtualBox work too) or an ultra-cheap https://www.scaleway.com/ instance. Other inexpensive options include OVH VPS and DigitalOcean. Scaleway has the most local disk per dollar. If you immediately rsync the WARCs elsewhere, that might not matter.

ivan commented 8 years ago

Never mind on the VS2010 Express; pip3 can't find it: https://www.google.com/search?q=vs2010+python+valueerror+path

ivan commented 8 years ago

Hm, if cChardet is the only problem, I can make a branch for you to try. Give me a minute.

ivan commented 8 years ago

Some really rough instructions to get grab-site working on Windows: install Python 3.4.3 and git, then run this in a cmd shell (not cygwin):

set GRAB_SITE_NO_CCHARDET=1
C:\Python34\Scripts\pip3 install git+https://github.com/ludios/grab-site
C:\Python34\python C:\Python34\Scripts\gs-server

In another cmd shell:

C:\Python34\python C:\Python34\Scripts\grab-site URL

I'll try to make this work a little more smoothly soon.
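For readers curious how an opt-out like GRAB_SITE_NO_CCHARDET can work at install time, here is a minimal sketch of a setup.py-style helper that reads the variable. It is an illustration under assumptions, not grab-site's actual setup.py: `build_install_requires` is a hypothetical name, the base dependency list is a placeholder, and the rule for which values count as "set" is a guess.

```python
import os


def build_install_requires(environ=os.environ):
    """Return the dependency list, leaving out cchardet when opted out."""
    # Placeholder base list; not grab-site's real dependency list.
    requires = ["wpull"]
    # Assumption: unset, empty, or "0" keeps cchardet; any other value
    # (such as the "1" used in the commands above) skips it.
    if environ.get("GRAB_SITE_NO_CCHARDET", "0") in ("", "0"):
        requires.append("cchardet")
    return requires


if __name__ == "__main__":
    # Prints the list that would be passed to setup(install_requires=...).
    print(build_install_requires())
```

Because pip runs setup.py in a child process that inherits the shell's environment, a `set` in the same cmd shell before the pip3 command is enough for a check like this to see the variable.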

luckcolors commented 8 years ago

OK, thanks, it works :). Also, you don't actually need to remove the whole cchardet library, because there is a slower version written in pure Python: https://github.com/chardet/chardet.

ivan commented 8 years ago

wpull's setup.py requires chardet, so grab-site doesn't need to. (grab-site doesn't actually use either chardet or cchardet; grab-site's cchardet requirement is just to make wpull faster.)
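The "fast C library if present, pure-Python fallback otherwise" arrangement described here is commonly written as an ordered import attempt. The sketch below shows the general pattern only; it is not wpull's actual code, and `load_first_available` and `detect_encoding` are hypothetical names. It relies on both cchardet and chardet exposing a `detect()` function that returns a dict with an `"encoding"` key.

```python
import importlib


def load_first_available(module_names):
    """Import and return the first module that is installed, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def detect_encoding(data, detector_names=("cchardet", "chardet")):
    """Guess the encoding of bytes, preferring the faster detector."""
    detector = load_first_available(detector_names)
    if detector is None:
        return None  # neither library is installed
    return detector.detect(data).get("encoding")


if __name__ == "__main__":
    print(detect_encoding("héllo wörld".encode("utf-8")))
```

With this shape, skipping cchardet on Windows costs speed but not functionality, as long as chardet is installed.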

ivan commented 8 years ago

An .exe is probably not coming, but I have filed issues for the remaining Windows tasks: https://github.com/ludios/grab-site/issues

luckcolors commented 8 years ago

Thanks for your help. :)