Closed elimbroc closed 4 weeks ago
look in the out directory.. first it creates a csv with just chapters on each line (filename.csv) then _processed.csv should be chunks...
so first check your out/filename.csv to see if text is extracted to begin with, then look at out/filename_processed.csv
if you make the csv manually, just be sure to follow the expected format: Title, Text, Length
(length is ignored but it expects 3 columns)
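For a hand-made CSV, a quick column-count check can catch format problems before running sum.py. This is only a sketch: the helper names `find_bad_rows` and `make_row` are hypothetical and not part of the project, and it assumes nothing about whether the file has a header row (a 3-column header simply passes the check).

```python
import csv
import io

EXPECTED_COLUMNS = 3  # Title, Text, Length (Length is ignored but must be present)


def find_bad_rows(csv_text):
    """Return (line_number, column_count) for each row without exactly 3 columns.

    line_number is the physical line on which the row ends.
    """
    reader = csv.reader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        if len(row) != EXPECTED_COLUMNS:
            bad.append((reader.line_num, len(row)))
    return bad


def make_row(title, text):
    """Build one correctly quoted CSV line with a Length column filled in."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([title, text, len(text)])
    return buffer.getvalue()
```

Running `find_bad_rows(open("out/filename.csv", newline="").read())` on your own file should return an empty list if every row has three columns. Using `csv.writer` rather than joining strings by hand keeps commas and quotes inside the text from splitting a row.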
of course you should have ollama installed and the models loaded... (cognitivetech/obook_title:q3_k_m is hardcoded for title, so be sure to have that or update the local code with your title model)
give me more details and I can help.
Okay, you're amazing and I appreciate your expertise and help! So, the problem was that python isn't default available on Mac, so I installed it with pyenv with these instructions. Now book2text.py runs correctly. But I'm getting this error when I run sum.py:
`Traceback (most recent call last):
File "/Users/ericblom/Library/Mobile Documents/com~apple~CloudDocs/Downloads/eBooks/ollama-ebook-summary-main/sum.py", line 159, in
KeyError: 'response'`
I'm not sure... what is your python version? it should be 3.11.9... and what exactly is the command you are running for that step?
and you are installing requirements with pip install -r requirements.txt?
weird though.. KeyError: 'response' seems like maybe something wrong with your ollama api
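A KeyError: 'response' usually means the decoded JSON reply did not contain the field the script expected. As far as I know, a non-streaming reply from Ollama's /api/generate carries the generated text under `response`, and a failed request carries a message under `error` instead. The sketch below shows one way to surface that message; `extract_response` is a hypothetical helper, not the code in sum.py.

```python
def extract_response(payload):
    """Return the generated text from a decoded /api/generate reply.

    Raises RuntimeError with the server's own error message (if any)
    instead of a bare KeyError when 'response' is missing.
    """
    if "response" in payload:
        return payload["response"]
    detail = payload.get("error", "no 'error' field in the reply either")
    raise RuntimeError(f"Ollama reply had no 'response' key: {detail}")
```

With this in place, a missing model would show up as a readable message rather than a traceback ending in KeyError.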
and really don't be sorry, this is still a little hacky, I need to improve the workflow, cause I think right now it's sub-optimal... but I just wanted to get this code live after hitting a bit of a speed bump with the web-app.
Okay, I reverted Python3 to 3.11.9 and re-ran 'pip install -r requirements.txt' and it says there's no matching distribution found for click compatible with Python 2. When I re-ran 'pip3 install -r requirements.txt' it fails to install the specified lxml version (I do have a more recent one installed). Then I ran 'python3 sum.py obook_summary willis_processed.csv' and it gives the same error. I installed ollama with the .pkg downloadable package on MacOS. Maybe it's the lxml issue? Or on your computer, is python an alias for python3?
ok, I updated the code with improved error handling, so next time you pull changes and then run the script we will have improved output.
I never tried this on windows and have no idea if there is some slight difference there, or some reason your api uses a different port, who knows.
I also got these troubleshooting checks from Claude Sonnet 3.5. You can use these to verify your ollama installation.
By default, Ollama typically serves on port 11434. However, it's always good to verify this.
To see which ports Ollama is using:
Run the following command:
netstat -ano | findstr :11434
This will show you if anything is listening on the default Ollama port.
C:\Users\YourUsername\.ollama\config
For more detailed information, you can use PowerShell:
Open PowerShell as Administrator
Run:
Get-NetTCPConnection | Where-Object { $_.State -eq 'Listen' } | Select-Object LocalAddress, LocalPort, OwningProcess | Sort-Object LocalPort
Look for entries related to Ollama's process ID
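The commands above are Windows-specific. A platform-neutral way to ask the same question (is anything listening on the default Ollama port?) is a small socket probe. This is a sketch only; `port_is_open` is a hypothetical helper, and it tells you that something accepts connections on the port, not that it is Ollama.

```python
import socket


def port_is_open(host="127.0.0.1", port=11434, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("listening on 11434:", port_is_open())
```

This works the same way on Mac, Linux, and Windows, so it avoids the netstat/findstr differences between systems.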
it's possible this is still a python issue, but you got the same error when you upgraded to the specific version of python 3.11 (you did check the version on the terminal, yes? python3 --version)...
so if you update your code and try running again, we will have more verbose output to work on
Okay, here's what I'm getting, over and over:
Error making request to API: 404 Client Error: Not Found for url: http://localhost:11434/api/generate
Error generating title: 404 Client Error: Not Found for url: http://localhost:11434/api/generate
So surely it's a problem with my installation. I used the downloadable ollama app, and when in a browser I navigate to http://localhost:11434 I get a page that says ollama is running. But when I try to navigate to http://localhost:11434/api or http://localhost:11434/api/generate it says error 404, page not found. Not sure what I'm missing.
Any chance you're on Mac, too, by the way? Or at least Linux, hopefully? Claude Sonnet 3.5 gave Windows instructions and they didn't work but I'm not on Windows.
oh, yes, I use mac and ubuntu!
maybe the issue is title generation...
you need to pull the model (and apologies, this is still not explicitly in the instructions, I was just going over this w someone else too)
ollama pull cognitivetech/obook_title:q3_k_m
or you can use the prompt from that model (found on this readme) with your favorite local llm, besides the obook_summary which is specialized for summary and not good at titles..
be sure to look in sum.py for location of where cognitivetech/obook_title:q3_k_m is marked...
...I suppose I should make a config file
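To confirm whether a model has actually been pulled, you can list what the local Ollama server reports. The sketch below assumes the server's GET /api/tags endpoint, which to my knowledge returns a JSON object with a `models` list whose entries have a `name` field; the helper names are hypothetical. Note also that opening /api/generate in a browser sends a GET, while that endpoint expects a POST, so a 404 in the browser does not by itself mean the install is broken.

```python
import json
import urllib.request


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags reply."""
    return [entry.get("name", "") for entry in tags_payload.get("models", [])]


def is_installed(wanted, names):
    """Check for a model by name; a name without a tag is treated as ':latest'."""
    if ":" not in wanted:
        wanted = wanted + ":latest"
    return wanted in names


def installed_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models it has (needs the server up)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as reply:
        return model_names(json.load(reply))
```

For example, `is_installed("cognitivetech/obook_title:q3_k_m", installed_models())` should be true after the pull command above has finished.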
ok, actually talking with a friend, they are having this problem because ollama stores the model name like cognitivetech/obook_summary but then it tries to append that name to the output filename, which the / messes up..
(which I didn't realize because I just pushed these models to ollama, though at home mine is named mbn and didn't fully think through the implications.)
unfortunately I'm out of office, so I can try to push a fix for this, but you might be faster to just rename the model or adjust the output filename so it just uses the relevant part.
thanks for helping me to test this, sorry for the trouble!
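One way to "just use the relevant part" of the model name in the output filename is to drop everything up to the last slash and replace any remaining characters that are awkward in filenames. This is a sketch of that idea, not the fix that was later pushed; `model_to_filename_part` is a hypothetical helper.

```python
import re


def model_to_filename_part(model_name):
    """Turn an Ollama model name into a string that is safe inside a filename."""
    short = model_name.rsplit("/", 1)[-1]  # drop the "namespace/" prefix, if any
    # keep letters, digits, dot, underscore, hyphen; replace the rest (e.g. ':')
    return re.sub(r"[^A-Za-z0-9._-]", "_", short)
```

With this, cognitivetech/obook_summary becomes obook_summary, and a locally renamed model such as mbn is left unchanged.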
Ahhhh yes that could definitely be the problem. I was wondering.
I can wait until you’re back in office, there’s no hurry for me. Thanks again for sharing this project with the world!
Regards, Eric Blom
ok!!! that took longer than I anticipated, but ... I wanted to make it a lot less hacky and more like a real app.
now there is a config file and full instructions..
you won't have an issue with timeouts or that other nonsense, just pull the fresh code and change the model name like in the instructions on the readme, then keep reading from there.. it has 2 modes, one for automated chunking csv and one for manual chunking text file.
I will add a setup file soon so we don't have to deal with that manual model name changing but it's working well for now.
I'm going to mark this as closed, feel free to comment if you have any further trouble.
Sorry, I don't know what I'm doing wrong but it's probably just something obvious I'm missing.
I'm on Mac, FWIW. I installed Python, pip, and all the dependencies I saw in book2text.py, but it kept creating 0 byte output documents without generating an error. The .epub does have a table of contents, I confirmed it in Calibre. What else could be going wrong? I then tried using another epub to csv converter and sum.py didn't do anything with the output, but it's probably not in the correct CSV format.