bibanon / BASC-Archiver

Python-based Imageboard (4chan) complete thread archiver.
https://pypi.python.org/pypi/BASC-Archiver/

Make a Windows Executable #7

antonizoon closed this issue 9 years ago

antonizoon commented 9 years ago

Using pip on Windows is an exercise in frustration, unlike the quick and easy method on Mac OS X and Linux. An EXE build is critical: we should have a .exe for both the command-line and GUI versions.

Daniel Oaks is currently on the case, though I'm going to try and make a quick and dirty executable for myself for the moment.

antonizoon commented 9 years ago

I'm experimenting with PyInstaller, and it seems to be giving good results, especially since it builds for both Windows and Mac OS X. I can't use py2exe on Linux, so scratch that.

antonizoon commented 9 years ago

Damn it, I simply don't have the skills to create a working .exe file. Daniel Oaks, you'll have to take over...

Here's what I have, but it doesn't seem to work with 64-bit applications.

DanielOaks commented 9 years ago

Sounds good, I'll try going ahead with something similar to what I'm doing with the GUI executable.

DanielOaks commented 9 years ago

@antonizoon Can you give this a shot, see how it goes for you? It's the CLI version.

Also sorry 'bout the dodgy filehost, haven't had a chance to put Dropbox and such back on my machine.

antonizoon commented 9 years ago

Thanks, I'll try it out soon and tell you what happens.

antonizoon commented 9 years ago

Tested with Wine; it works perfectly. Now I'll grab my Windows virtual machine...

antonizoon commented 9 years ago

Ok, the image/thumbnail downloading works perfectly on Windows.

However, Unicode errors are cropping up again in _download_thread() when handling non-Latin (e.g. Japanese) characters. I fixed this earlier in the HTML download function.

For some reason, on Windows, Python is decoding text with some other encoding somewhere, even though everything should be Unicode.

(My computer has a Traditional Chinese locale enabled, but the same thing happens with American ASCII; the only difference is that the errors mention ascii instead of cp950.)

C:\Downloads\basc-archiver>thread-archiver.exe https://boards.4chan.org/r9k/thread/16009282/hello-r9k-im-a-girl-looking-for-a-bf-who-woulda
4chan Thread: /r9k/16009282
  Image: 1422290014636.jpg downloaded.
  Thumbnail: 1422290014636s.jpg downloaded.
Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
  File "thread-archiver", line 63, in <module>
  File "C:\Users\Dan\BASC-Archiver\basc_archiver\__init__.py", line 58, in download_threads
  File "C:\Users\Dan\BASC-Archiver\basc_archiver\sites\base.py", line 32, in download_threads
  File "C:\Users\Dan\BASC-Archiver\basc_archiver\sites\fourchan.py", line 222, in _download_thread
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 11756: illegal multibyte sequence

C:\Downloads\basc-archiver>thread-archiver.exe https://boards.4chan.org/jp/thread/12956571/sukusuku-hakutaku
4chan Thread: /jp/12956571
  Image: 1422266957244.jpg downloaded.
  Image: 1422268821495.jpg downloaded.
  Image: 1422269289790.png downloaded.
  Image: 1422285092251.jpg downloaded.
  Image: 1422287284412.jpg downloaded.
  Image: 1422289699616.png downloaded.
  Thumbnail: 1422266957244s.jpg downloaded.
  Thumbnail: 1422268821495s.jpg downloaded.
  Thumbnail: 1422269289790s.jpg downloaded.
  Thumbnail: 1422285092251s.jpg downloaded.
  Thumbnail: 1422287284412s.jpg downloaded.
  Thumbnail: 1422289699616s.jpg downloaded.
Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
  File "thread-archiver", line 63, in <module>
  File "C:\Users\Dan\BASC-Archiver\basc_archiver\__init__.py", line 58, in download_threads
  File "C:\Users\Dan\BASC-Archiver\basc_archiver\sites\base.py", line 32, in download_threads
  File "C:\Users\Dan\BASC-Archiver\basc_archiver\sites\fourchan.py", line 222, in _download_thread
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 11607: illegal multibyte sequence

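
The failure in these tracebacks can be reproduced on any OS, without building an .exe, by decoding UTF-8 bytes with the cp950 codec directly. This is a minimal sketch: the en dash (U+2013, UTF-8 bytes e2 80 93) is an assumed stand-in, since the thread doesn't say which character sits at the failing offset; 0xe2 is simply a common lead byte in UTF-8 text. try_decode is a hypothetical helper, not part of BASC-Archiver.

```python
import locale

# Assumed stand-in character: an en dash, which UTF-8 encodes as e2 80 93.
utf8_bytes = "\u2013".encode("utf-8")


def try_decode(data, codec):
    """Return the decoded text, or the UnicodeDecodeError message on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


# cp950 (Big5) rejects 0xe2 0x80 as a multibyte pair, as in the traceback.
cp950_result = try_decode(utf8_bytes, "cp950")
# Decoding with the encoding the bytes were actually written in succeeds.
utf8_result = try_decode(utf8_bytes, "utf-8")

print(cp950_result)
print(utf8_result)
# open() in text mode without encoding= falls back to this locale-dependent codec.
print(locale.getpreferredencoding(False))
```

On a Windows machine with a Traditional Chinese locale, the last line would be expected to print cp950, which is the codec open() picks up when no encoding= argument is given.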
antonizoon commented 9 years ago

OK, I found the lines that cause this error. I think files need to be decoded as UTF-8 when they are opened, not with the system's default encoding.

In fourchan.py, line 222:

found_css_files = css_regex.findall(open(local_filename).read())

In fourchan.py, line 233:

found_js_files = js_regex.findall(open(local_filename).read())

There may be more lines like this, though we won't know until we retest.
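A minimal sketch of the fix suggested above, assuming the saved thread page is UTF-8: passing encoding= to open() stops Python from falling back to the locale codec, and the with block also closes the file, which the one-liners above never do. The css_regex pattern is a hypothetical placeholder (the real pattern in fourchan.py isn't shown in this thread), and find_css_files is an illustrative name.

```python
import os
import re
import tempfile

# Hypothetical pattern; the real css_regex in fourchan.py is not shown here.
css_regex = re.compile(r'href="([^"]+\.css)"')


def find_css_files(local_filename):
    """Read a saved page as UTF-8 (not the locale default) and list CSS hrefs."""
    with open(local_filename, encoding="utf-8") as f:
        return css_regex.findall(f.read())


# Demo page containing an en dash and Japanese text (non-cp950-safe bytes).
page = '<link rel="stylesheet" href="/css/style.css"> すくすく \u2013 白沢'
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(page)

result = find_css_files(path)
os.remove(path)
print(result)
```

The same change would apply to the js_regex line: only the open() call differs from the original, so the regex logic is untouched.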

antonizoon commented 9 years ago

However, I ran the above tests on Windows XP with a non-Unicode locale (a use case we should watch out for; it's not unusual to see anons using it).

On the other hand, the exe file you created works fine on my Windows 8.1 system (with an English Unicode locale).

I'll still try to find a way to read and write text as Unicode, for compatibility's sake (and good coding practice).

antonizoon commented 9 years ago

Unfortunately, it looks like the Windows XP encoding problems run deeper than we can fix, since even Python's own fileinput module fails:

  File "C:\Users\Dan\BASC-Archiver\basc_archiver\utils.py", line 56, in file_replace
  File "C:\Python34\lib\fileinput.py", line 263, in __next__
  File "C:\Python34\lib\fileinput.py", line 363, in readline
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 3551: illegal multibyte sequence

I guess we'll just have to make a Unicode locale a prerequisite for running this script. I'll try enabling one in Windows XP.

DanielOaks commented 9 years ago

Aha, nah, that's just me calling the wrong function for the job. Once we find all the spots that rely on the system's default encoding and force them to use UTF-8, it should work fine.
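One possible reading of "the right function for the job", sketched under assumptions: if utils.file_replace rewrites a file through fileinput, a plain read-modify-write with an explicit encoding avoids the locale codec entirely. In Python 3.4, fileinput.input() has no encoding parameter (it was added in 3.10), and an openhook such as fileinput.hook_encoded cannot be combined with inplace=True. The signature below is hypothetical; the real file_replace isn't shown in this thread, and the image URL in the usage lines is illustrative only.

```python
import os
import tempfile


def file_replace(filename, old, new):
    """Hypothetical sketch: replace every `old` with `new` in a UTF-8 text file.

    Reading and writing with an explicit encoding means the Windows locale
    codec (e.g. cp950) is never consulted; newline="" preserves the file's
    existing line endings instead of translating them.
    """
    with open(filename, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(filename, "w", encoding="utf-8", newline="") as f:
        f.write(text.replace(old, new))


# Usage: point an (illustrative) remote image URL at a local folder.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
    f.write('<img src="//i.4cdn.org/jp/1422266957244.jpg"> すくすく白沢\r\n')

file_replace(path, "//i.4cdn.org/jp/", "images/")
with open(path, encoding="utf-8", newline="") as f:
    updated = f.read()
os.remove(path)
print(updated)
```

Reading the whole file into memory is fine for a single saved thread page; a streaming, line-by-line rewrite would only matter for much larger files.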

DanielOaks commented 9 years ago

This is closed with the v0.8.5 release. The next step is implementing the GUI (#5), which depends on #6; I'm working on both now.