SciPhi-AI / R2R

The most advanced Retrieval-Augmented Generation (RAG) system, containerized and RESTful
https://r2r-docs.sciphi.ai/
MIT License

File ingestion gets stuck for a long time #620

Open viraptor opened 4 months ago

viraptor commented 4 months ago

Describe the bug

I'm using the following config:

{
    "app": {
        "max_file_size_in_mb": 100
    },
    "embedding": {
        "provider": "ollama",
        "base_model": "nomic-embed-text",
        "base_dimension": 768,
        "batch_size": 32
    },
    "completions": {
        "provider": "litellm",
        "model": "ollama/dolphin-llama3:8b-v2.9-q6_K"
    },
    "ingestion":{
        "excluded_parsers": [
            "gif", "jpeg", "jpg", "png", "svg", "mp3", "mp4"
        ]
    },
    "vector_database": {
        "provider": "pgvector",
        "user": "r2r",
        "password": "r2r",
        "host": "127.0.0.1",
        "db_name": "r2r",
        "port": 5432,
        "vecs_collection": "r2rnomic"
    }
}
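As a sanity check before ingesting, a config like the one above can be validated with a few lines of stdlib Python. This is a minimal sketch, not R2R's own validation logic; the field names are taken from the config in this report, and the checks themselves are assumptions about what a well-formed config should contain:

```python
import json

def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in an R2R-style config dict."""
    problems = []
    emb = cfg.get("embedding", {})
    # base_dimension must match the embedding model's output size (768 for nomic-embed-text).
    if not isinstance(emb.get("base_dimension"), int):
        problems.append("embedding.base_dimension must be an integer")
    if emb.get("provider") == "ollama" and not emb.get("base_model"):
        problems.append("embedding.base_model is required when provider is ollama")
    vdb = cfg.get("vector_database", {})
    if vdb.get("provider") == "pgvector" and not vdb.get("db_name"):
        problems.append("vector_database.db_name is required when provider is pgvector")
    return problems

if __name__ == "__main__":
    with open("r2r_config.json") as f:
        cfg = json.load(f)
    for p in check_config(cfg):
        print("config problem:", p)
```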

When I ran r2r ingest-files on the EC2 documentation, the app appeared stuck for a long time without doing any work (all CPUs idle, no Ollama requests visible in the logs). After more than 2 minutes of waiting, it processed the file in ~30 seconds, during which I saw a lot of Ollama embedding requests come through.

Using commit 2f6f18c66858b4cf15d29accd19d7ef8016e98d4

To Reproduce

r2r --config-path ... ingest-files ec2-ug.pdf



emrgnt-cmplxty commented 4 months ago

Hi viraptor,

Perhaps it took a while to perform OCR on your document. How much compute / memory is available to your Docker container?

The PDF you shared is rather large, so it is advisable to split it into smaller pieces to allow for more parallelization (e.g. so that OCR doesn't become a bottleneck).

We are working on building a more efficient / performant OCR pipeline, but that will take a few weeks to months.
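The splitting suggested above can be sketched as follows. The `page_ranges` helper is pure stdlib; the commented-out section shows how the ranges could be written out with a PDF library such as pypdf, which is an assumption on my part and not part of R2R:

```python
def page_ranges(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Split [0, total_pages) into consecutive half-open (start, end) ranges."""
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]

# With a PDF library such as pypdf (hypothetical here -- not part of R2R),
# each range could be written out as its own file and ingested separately,
# so OCR and embedding can run on smaller units:
#
#   from pypdf import PdfReader, PdfWriter
#   reader = PdfReader("ec2-ug.pdf")
#   for i, (start, end) in enumerate(page_ranges(len(reader.pages), 100)):
#       writer = PdfWriter()
#       for page in reader.pages[start:end]:
#           writer.add_page(page)
#       with open(f"ec2-ug-part{i}.pdf", "wb") as f:
#           writer.write(f)
```

Each resulting part can then be passed to r2r ingest-files on its own, so a stall in one chunk no longer blocks the whole document.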