Patrick-Hogan / wandering_inn

Download and convert The Wandering Inn to epub and mobi (kindle) format

error importing BeautifulSoup from bs4 #11

Closed: SingleLight closed this issue 1 year ago

SingleLight commented 1 year ago

Hi, I'm new to using Python scripts. I followed the installation instructions, but I always get this error when running the script:

from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

I tried both a venv and installing the dependencies globally, and I also tried a Codespace; all of them give the same result. Any help would be appreciated.

Patrick-Hogan commented 1 year ago

Sounds like either:

1) Beautiful Soup didn't install; try re-running pip install -r requirement.txt and read the output carefully, or try installing it on its own with pip install beautifulsoup4==4.7.1

2) the Python interpreter invoked when you run the script doesn't have access to Beautiful Soup

Frankly, it was probably 2, and that should be fixed in master now if you pull. For some reason I had used a hard-coded #!/usr/bin/python3 shebang, so invoking the script directly (./wanderinginn2epub.py) ran the system Python rather than the interpreter you installed the requirements into (e.g., your venv's). Running the script explicitly, e.g. python wanderinginn2epub.py, should work in that case, though. (Note: Python 3 is assumed; if python --version reports Python 2, you'll need to use python3/pip3 instead of python/pip in the commands.)
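To tell the two cases apart, a small check like the following may help. It is a sketch, not part of the repo: save it under any name (check_env.py is just a placeholder) and run it with the same interpreter you use for the script, e.g. python check_env.py inside the activated venv. It reports the path of the interpreter that is actually running, whether that interpreter belongs to a venv, and whether bs4 can be imported from it.

```python
import importlib.util
import sys


def describe_interpreter() -> dict:
    """Report which Python is running and whether it can see bs4."""
    return {
        # Full path of the interpreter actually executing this code.
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv while
        # sys.base_prefix still points at the base installation.
        "in_venv": sys.prefix != sys.base_prefix,
        # find_spec returns None when the module cannot be found.
        "bs4_importable": importlib.util.find_spec("bs4") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If bs4_importable comes back False, the requirements were installed into a different interpreter than the one shown under executable; that is the kind of mismatch a hard-coded shebang can cause when a script is run directly from a venv.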

If the current master version (after git pull) still isn't working for you, please post the output of the following (run in bash/sh/etc.):

pip --version
python --version
pip list

The last one can be trimmed to the relevant lines (e.g., pipe it to grep -i soup). Run all three the same way you run the script and installed the requirements (e.g., in your activated venv).
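As a rough Python-side equivalent of these commands, the sketch below prints the interpreter version and any installed packages whose name contains "soup". It uses importlib.metadata from the standard library (Python 3.8+); matching_packages is a hypothetical helper name. The package is published as beautifulsoup4 but imported as bs4, which is why filtering on "soup" finds it while "bs4" would not.

```python
from __future__ import annotations

import sys
from importlib.metadata import distributions


def matching_packages(fragment: str) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs for installed distributions whose
    name contains `fragment`, case-insensitively (like pip list | grep -i)."""
    needle = fragment.lower()
    found = set()  # a set avoids duplicates if a package is on several paths
    for dist in distributions():
        name = dist.metadata["Name"]
        if name and needle in name.lower():
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    print(sys.executable)
    print(sys.version)
    for name, version in matching_packages("soup"):
        print(name, version)
```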

That should let me see enough to diagnose.

SingleLight commented 1 year ago

Thank you, Patrick, that's very helpful. I pulled the latest version and it worked. But when building volume by volume, an error occurred during the volume 6 build. Here is the error log:

Traceback (most recent call last): 
  File "/Users/name/code/wandering_inn/./wanderinginn2epub.py", line 342, in <module> 
    main() 
  File "/Users/name/code/wandering_inn/./wanderinginn2epub.py", line 329, in main 
    gen = OPFGenerator(volume_data)
  File "/Users/name/code/wandering_inn/ebookmaker/ebookmaker.py", line 105, in __init__
    Generator.__init__(self, ebookData)
  File "/Users/name/code/wandering_inn/ebookmaker/ebookmaker.py", line 54, in __init__
    self.outline[fname] = self.outlineEBookContents(fname, depth)
  File "/Users/name/code/wandering_inn/ebookmaker/ebookmaker.py", line 67, in outlineEBookContents
    outline = [h for h in soup.body if getattr(h, 'name', None) in hTags and int(h.name[-1]) <= depth]
TypeError: 'NoneType' object is not iterable

Patrick-Hogan commented 1 year ago

Hm. I was able to build without any problems with no arguments (which generates a single epub with all released chapters), by volume, and by chapter.

Is it possible one or more HTML files weren't downloaded correctly (e.g., due to a network interruption)? The HTML files for volume 6 should all be fairly sizable (60K-180K); any zero-byte or missing files might cause this kind of problem, I think. A file that ends up with no <body> element (a zero-byte download, for example) leaves soup.body as None, and iterating over None raises exactly this TypeError. The easiest fix may just be to run rm build/html/wandering_inn-06*.html (removing all of the volume 6 files) and re-run the build.
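A less drastic variant of that rm is to flag only the files that look wrong. This is a sketch under assumptions: the build/html directory and the wandering_inn-06*.html pattern come from the command above, the 60,000-byte floor loosely follows the 60K-180K range quoted there, and treating a missing </body> tag as a sign of truncation is an added heuristic. suspicious_html is a hypothetical helper. Files that were never downloaded at all won't match the glob, so they won't appear in its output.

```python
from __future__ import annotations

from pathlib import Path


def suspicious_html(directory: str,
                    pattern: str = "wandering_inn-06*.html",
                    min_bytes: int = 60_000) -> list[Path]:
    """Return files matching `pattern` that are smaller than `min_bytes`
    or contain no closing </body> tag (case-insensitive)."""
    flagged = []
    for path in sorted(Path(directory).glob(pattern)):
        data = path.read_bytes()
        if len(data) < min_bytes or b"</body>" not in data.lower():
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Only reports; nothing is deleted here.
    for path in suspicious_html("build/html"):
        print("suspicious:", path)
```

Deleting just the flagged files (e.g., with path.unlink()) and re-running the build should then re-fetch only those chapters, assuming, as the rm advice implies, that the build downloads any HTML files that are absent.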

For a more targeted approach that might give some insight, I generated checksums of the HTML downloads from my build here: checksum_chapters.sha256.txt. You can use it to find any missing or corrupt HTML files in your build directory by:
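Assuming checksum_chapters.sha256.txt uses the standard sha256sum output format (one line per file: a 64-character hex digest, a space, a space or "*", then the filename), the comparison can also be sketched in Python with hashlib. check_sha256_manifest is a hypothetical name, and base_dir is an assumption: set it to whichever directory the filenames in the manifest are relative to.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def check_sha256_manifest(manifest: str, base_dir: str = ".") -> dict[str, str]:
    """Check files listed in a sha256sum-style manifest.

    Returns {filename: "missing" | "mismatch"} for every entry that fails;
    an empty dict means every listed file is present and intact.
    """
    problems = {}
    for line in Path(manifest).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Digest is the first 64 chars; the filename starts after the
        # two-character separator ("  " text mode or " *" binary mode).
        expected, name = line[:64].lower(), line[66:]
        path = Path(base_dir, name)
        if not path.is_file():
            problems[name] = "missing"
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems[name] = "mismatch"
    return problems
```

Every file reported as missing or mismatched is a candidate for deletion and re-download; everything else matches the reference build.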

SingleLight commented 1 year ago

Wow, yeah, it was just a corrupted HTML file. Running the remove command solved it, and I was able to build all the volumes. Thank you, Patrick!

Patrick-Hogan commented 1 year ago

No problem, glad it's working for you.