de3sw2aq1 / wattpad-ebook-scraper

UNMAINTAINED, use https://github.com/JimmXinu/FanFicFare instead
MIT License

error when title contains colon (:) #5

Closed PescheHelfer closed 8 years ago

PescheHelfer commented 9 years ago

If the title of the story contains a colon, an error will result.

Example:

python scrape.py http://www.wattpad.com/story/24305532-the-kabul-incident-a-weir-codex-novella

Title: The Kabul Incident: a Weir Codex Novella

--> Error:
  File "F:\My Documents\Win7\GitHub\wattpad-ebook-scraper\epub.py", line 309, in __writeMimeType
    fout = open(os.path.join(self.rootDir, 'mimetype'), 'wt')
OSError: [Errno 22] Invalid argument: './The Kabul Incident: a Weir Codex Novella\mimetype'

PescheHelfer commented 9 years ago

That one is quite easy to fix. In scrape.py at line 94, replace

book.make('./{title}'.format(title=book.title))

with

folder_name = book.title.replace(':',',')
book.make('./{outputDir}'.format(outputDir=folder_name))

This should probably be expanded to more characters that are not allowed in filenames (and tend to appear in titles).

de3sw2aq1 commented 9 years ago

If it's not obvious, I never did a lot of testing on Windows. Windows has a lot of forbidden characters.

I can fix this. It needs to actually get the list of special characters for the current platform.
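A rough sketch of what a per-platform lookup could look like. This is not the project's code: the names `forbidden_chars` and `safe_folder_name` are hypothetical, and the character sets are an assumption based on the usual Windows and POSIX file naming rules.

```python
import os

# Assumption: reserved characters on Windows, plus control characters 0-31.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(32)}
# Assumption: on Linux / OS X only the path separator and NUL are rejected.
POSIX_FORBIDDEN = {'/', '\0'}


def forbidden_chars(os_name=os.name):
    """Return the set of characters not allowed in a file name on the given platform."""
    return WINDOWS_FORBIDDEN if os_name == 'nt' else POSIX_FORBIDDEN


def safe_folder_name(title, os_name=os.name, replacement='-'):
    """Replace every forbidden character in `title` with `replacement`."""
    bad = forbidden_chars(os_name)
    return ''.join(replacement if ch in bad else ch for ch in title)
```

With something like this, the call in scrape.py could become `book.make('./{outputDir}'.format(outputDir=safe_folder_name(book.title)))`. The `os_name` parameter is only there so the Windows behaviour can be checked on another platform.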

de3sw2aq1 commented 8 years ago

This should be fixed now with 09b17c352e6249683097032451afb7fb19eb2113.

I included all illegal characters listed in some Microsoft documentation and added the period character because on Linux or OS X a leading period makes files invisible.

For now, all illegal characters are replaced with the hyphen character.
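For anyone who does not want to read the commit, the idea is roughly the following. This is a sketch reconstructed from the description above, not the committed code: the name `sanitize_title` is hypothetical, the character list is an assumption, and it assumes every period is replaced rather than only a leading one.

```python
import re

# Assumption: Windows-reserved characters, control characters 0-31, and the period.
_ILLEGAL = re.compile(r'[<>:"/\\|?*.\x00-\x1f]')


def sanitize_title(title):
    """Replace each illegal character in `title` with a hyphen."""
    return _ILLEGAL.sub('-', title)


print(sanitize_title('The Kabul Incident: a Weir Codex Novella'))
# prints: The Kabul Incident- a Weir Codex Novella
```

Using one fixed list on every platform keeps the output folder name the same regardless of where the scraper runs, at the cost of replacing a few characters that Linux or OS X would have accepted.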