Needs improved documentation

elimbroc commented 2 months ago

Sorry, I don't know what I'm doing wrong but it's probably just something obvious I'm missing.

I'm on Mac, FWIW. I installed Python, pip, and all the dependencies I saw in book2text.py, but it kept creating 0 byte output documents without generating an error. The .epub does have a table of contents, I confirmed it in Calibre. What else could be going wrong? I then tried using another epub to csv converter and sum.py didn't do anything with the output, but it's probably not in the correct CSV format.

cognitivetech commented 2 months ago

look in the out directory.. first it creates a csv with one chapter per line (filename.csv), then filename_processed.csv should contain the chunks...

so first check your out/filename.csv to see if text is extracted to begin with, then look at out/filename_processed.csv

if you make the csv manually, just be sure to follow the expected format

Title, Text, Length (length is ignored but it expects 3 columns)
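For anyone building the CSV by hand, here is a minimal sketch of the three-column layout described above, using Python's standard csv module. The function name and sample chapters are hypothetical, and whether sum.py expects a header row is not stated in this thread, so none is written here.

```python
import csv
import io


def write_chapters_csv(chapters, fileobj):
    """Write (title, text) pairs as rows of Title, Text, Length."""
    writer = csv.writer(fileobj)
    for title, text in chapters:
        # Length is reportedly ignored, but the third column must be present.
        writer.writerow([title, text, len(text)])


# Hypothetical sample data, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_chapters_csv(
    [
        ("Chapter 1", "It was a dark night."),
        ("Chapter 2", "Morning came, eventually."),
    ],
    buffer,
)
print(buffer.getvalue())
```

The csv module quotes any field containing a comma or newline, which matters for chapter text; joining fields by hand with commas would break the column count.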

of course you should have ollama installed and the models loaded... (the title model cognitivetech/obook_title:q3_k_m is hardcoded, so be sure to have that one, or update your local code with your own title model)

give me more details and I can help.

elimbroc commented 2 months ago

Okay, you're amazing and I appreciate your expertise and help! So, the problem was that python isn't available by default on Mac, so I installed it with pyenv with these instructions. Now book2text.py runs correctly. But I'm getting this error when I run sum.py:

```
Traceback (most recent call last):
  File "/Users/ericblom/Library/Mobile Documents/com~apple~CloudDocs/Downloads/eBooks/ollama-ebook-summary-main/sum.py", line 159, in <module>
    process_file(input_file, model)
  File "/Users/ericblom/Library/Mobile Documents/com~apple~CloudDocs/Downloads/eBooks/ollama-ebook-summary-main/sum.py", line 103, in process_file
    output = response.json()["response"].strip()
KeyError: 'response'
```

cognitivetech commented 2 months ago

I'm not sure... what is your python version? it should be 3.11.9... and what exactly is the command you are running for that step?

and you are installing the requirements with pip install -r requirements.txt?

weird though.. KeyError: 'response' seems like maybe something wrong with your ollama api
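As an illustration of why a KeyError can hide the real problem: when the API call fails, the JSON body usually has no "response" key at all. A minimal sketch of a more defensive lookup follows. The helper name is hypothetical and this is not the actual sum.py code; it assumes Ollama reports failures in an "error" field, which is how its API generally behaves as far as I know.

```python
def extract_response(payload):
    """Return the stripped "response" text, or raise with the server's error."""
    if "response" in payload:
        return payload["response"].strip()
    # Assumption: failed requests carry an "error" field instead of "response".
    detail = payload.get("error", "no 'response' or 'error' key in reply")
    raise RuntimeError(f"Ollama API returned no response: {detail}")


# Hypothetical payloads for illustration.
print(extract_response({"response": "  A short summary.\n"}))
try:
    extract_response({"error": "model not found"})
except RuntimeError as exc:
    print(exc)
```

With something like this in place of a bare `response.json()["response"]`, the traceback would show the server's own message instead of just the missing key.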

and really don't be sorry, this is still a little hacky. I need to improve the workflow, because I think right now it's suboptimal... but I just wanted to get this code live after hitting a bit of a speed bump with the web-app.

elimbroc commented 2 months ago

Okay, I reverted Python3 to 3.11.9 and re-ran 'pip install -r requirements.txt' and it says there's no matching distribution found for click compatible with Python 2. When I re-ran 'pip3 install -r requirements.txt' it failed to install the specified lxml version (I do have a more recent one installed). Then I ran 'python3 sum.py obook_summary willis_processed.csv' and it gave the same error. I installed ollama with the downloadable .pkg package on macOS. Maybe it's the lxml issue? Or on your computer, is python an alias for python3?

cognitivetech commented 2 months ago

ok, I updated the code with improved error handling, so next time you pull changes and then run the script we will have improved output.

I never tried this on Windows and have no idea if there is some slight difference there, or some reason your API uses a different port, who knows.

I also got these troubleshooting checks from Claude Sonnet 3.5. You can use these to verify your ollama installation.

Check Ollama's Default Port

By default, Ollama typically serves on port 11434. However, it's always good to verify this.

Verify Ollama is Running

Open Task Manager (Ctrl + Shift + Esc)
Go to the "Processes" tab
Look for "ollama.exe" in the list of running processes

Check Port Usage

To see which ports Ollama is using:

Open Command Prompt as Administrator
Run the following command:
```
netstat -ano | findstr :11434
```
This will show you if anything is listening on the default Ollama port.

Check Ollama's Configuration

Look for Ollama's configuration file. It's usually located in:
```
C:\Users\YourUsername\.ollama\config
```
Open this file with a text editor to check for any custom port settings.

Use PowerShell for Detailed Information

For more detailed information, you can use PowerShell:

Open PowerShell as Administrator

Run:
```
Get-NetTCPConnection | Where-Object { $_.State -eq 'Listen' } | Select-Object LocalAddress, LocalPort, OwningProcess | Sort-Object LocalPort
```

Look for entries related to Ollama's process ID

it's possible this is still a python issue, but you got the same error after switching to the specific version, python 3.11.9 (you did check the version in the terminal, yes? python3 --version)...
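Since python, python3, pip and pip3 can each point at a different installation, it may help to check from inside the interpreter that actually runs the script. A minimal sketch; the helper name is hypothetical:

```python
import sys


def version_matches(expected, actual=None):
    """Return True if the interpreter version starts with the expected parts."""
    actual = sys.version_info[:3] if actual is None else tuple(actual)
    return actual[: len(expected)] == tuple(expected)


# Shows which binary is running and whether it is a 3.11 interpreter.
print(sys.executable)
print(sys.version.split()[0], version_matches((3, 11)))
```

Running this once with python and once with python3 shows whether the two commands resolve to the same interpreter.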

so if you update your code and try running again, we will have more verbose output to work on

elimbroc commented 2 months ago

Okay, here's what I'm getting, over and over:

Error making request to API: 404 Client Error: Not Found for url: http://localhost:11434/api/generate
Error generating title: 404 Client Error: Not Found for url: http://localhost:11434/api/generate

So surely it's a problem with my installation. I used the downloadable ollama app, and when in a browser I navigate to http://localhost:11434 I get a page that says ollama is running. But when I try to navigate to http://localhost:11434/api or http://localhost:11434/api/generate it says error 404, page not found. Not sure what I'm missing.

Any chance you're on Mac, too, by the way? Or at least Linux, hopefully? Claude Sonnet 3.5 gave Windows instructions and they didn't work but I'm not on Windows.

cognitivetech commented 2 months ago

oh, yes, I use mac and ubuntu!

maybe the issue is title generation...

you need to pull the model (and apologies, this is still not explicitly in the instructions, I was just going over this with someone else too)

ollama pull cognitivetech/obook_title:q3_k_m
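To confirm the pull worked, one option is to ask the local server which models it has. The sketch below assumes Ollama's /api/tags endpoint, which returns a JSON object with a "models" list whose entries have a "name" field; the function names are hypothetical. The parsing is kept separate from the network call so it can be tried on a sample payload.

```python
import json
import urllib.request


def installed_models(tags_payload):
    """Extract model names from a parsed /api/tags reply."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def is_installed(model, tags_payload):
    """Check for a model; a name without a tag matches any tag of that model."""
    names = installed_models(tags_payload)
    if ":" in model:
        return model in names
    return any(name.split(":", 1)[0] == model for name in names)


def fetch_tags(base_url="http://localhost:11434"):
    """Fetch the installed-model list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


# Hypothetical sample payload for illustration (no server needed).
sample = {"models": [{"name": "cognitivetech/obook_title:q3_k_m"}]}
print(is_installed("cognitivetech/obook_title:q3_k_m", sample))
```

Against a live server this would be `is_installed("cognitivetech/obook_title:q3_k_m", fetch_tags())`. A 404 from /api/generate while the root page says ollama is running would be consistent with the requested model not being present.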

or you can use the prompt from that model (found on this readme) with your favorite local llm, other than obook_summary, which is specialized for summaries and not good at titles..

be sure to look in sum.py for location of where cognitivetech/obook_title:q3_k_m is marked...

...I suppose I should make a config file

cognitivetech commented 2 months ago

ok, actually talking with a friend, they are having this problem because ollama stores the model name like cognitivetech/obook_summary but then it tries to append that name to the output filename, which the / messes up..

(which I didn't realize because I had just pushed these models to ollama, while at home mine is named mbn, and I didn't fully think through the implications.)

unfortunately I'm out of office, so I can try to push a fix for this, but you might be faster to just rename the model or adjust the output filename so it only uses the relevant part.
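For the second workaround, a minimal sketch of deriving a filename-safe piece from the model name. The function name is hypothetical and this is not the fix that was eventually pushed; it just drops the namespace before the "/" and replaces other awkward characters such as ":".

```python
import re


def model_to_filename_part(model):
    """Turn an Ollama model name into a string safe to embed in a filename."""
    # Keep only the part after the last "/" (drops a namespace like "cognitivetech/").
    short = model.rsplit("/", 1)[-1]
    # Replace anything other than letters, digits, dot, dash, and underscore.
    return re.sub(r"[^A-Za-z0-9._-]", "_", short)


print(model_to_filename_part("cognitivetech/obook_summary"))
print(model_to_filename_part("cognitivetech/obook_title:q3_k_m"))
```

Without this, a name containing "/" is treated as a subdirectory when joined into the output path, which is why the write fails.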

thanks for helping me to test this, sorry for the trouble!

elimbroc commented 2 months ago

Ahhhh yes that could definitely be the problem. I was wondering.

I can wait until you’re back in office, there’s no hurry for me. Thanks again for sharing this project with the world!

Regards, Eric Blom

On Sep 20, 2024, at 5:20 PM, CognitiveTech @.***> wrote:


cognitivetech commented 1 month ago

ok!!! that took longer than I anticipated, but ... I wanted to make it a lot less hacky and more like a real app.

now there is a config file and full instructions..

you won't have an issue with timeouts or that other nonsense. just pull the fresh code and change the model name as in the instructions on the readme, then keep reading from there.. it has 2 modes: one for an automatically chunked csv and one for a manually chunked text file.

I will add a setup file soon so we don't have to deal with that manual model name changing, but it's working well for now.

cognitivetech commented 1 month ago

I'm going to mark this as closed, feel free to comment if you have any further trouble.

cognitivetech / ollama-ebook-summary