botpress / v12

Botpress OSS – v12
https://v12.botpress.com
GNU Affero General Public License v3.0

Training hangs and almost never reaches 100% (reached once or twice) #1726

Open boyoma opened 1 year ago

boyoma commented 1 year ago

Describe the bug My installation looks fine (including web embedding on another domain), but every time I press "Train chatbot" the percentage rises slowly and eventually stops before reaching 100%, so I have to press the button again. It has completed maybe once in 100 attempts. More often than not it gets stuck at 0%.

The exact same bot was first built and trained on localhost, where it worked fine: slow, but it completed.

I'm using a machine with 4 GB memory / 80 GB disk / Ubuntu 20.04 (LTS) x64.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'a bot'
  2. Click on 'train chatbot'
  3. See error 'almost never reaches 100%'

Expected behavior Slow training is OK, but it should at least complete.

Environment (please complete the following information):

cccaballero commented 1 year ago

It seems I have the same problem. I installed Botpress on my local PC using the following docker-compose.yml to test:

version: '3'

services:
  botpress:
    image: botpress/server
    expose:
      - 3000
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://postgres:secretpw@postgres:5435/botpress_db
    depends_on:
      - postgres
    volumes:
      - ./build/botpress/data:/botpress/data

  postgres:
    image: postgres:11.2-alpine
    expose:
      - 5435
    environment:
      PGPORT: 5435
      POSTGRES_DB: botpress_db
      POSTGRES_PASSWORD: secretpw
      POSTGRES_USER: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

This is all I can see in the logs:

botpress_1 | 04/17/2023 00:17:21.386 [NLU] training-queue [test01/87443a324c26b933.b7f4a95061d75566.3265.en] Training Queued.
botpress_1 | 04/17/2023 00:17:21.692 [NLU] Engine:training Training worker successfully started on process with pid 180.

For testing, I am using a new bot called test01, created from the Small Talk template with no changes.

I have not been able to complete any training. The furthest I have been able to reach is 80%.

sebburon commented 1 year ago

How much memory is available to your Botpress containers? Usually, when training stops between 80% and 99%, it's because the OS killed the training process for using too much memory.

Make sure your Botpress node has access to at least 3 GB of RAM.

Thanks,
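One way to test the "killed by the OS" theory is to look for OOM-killer entries in the kernel log. The following is a minimal sketch, not something from the Botpress tooling: it assumes a Linux host whose kernel writes lines containing `Killed process <pid> (<name>)`, and the sample log text, the pid, and the process name `worker` are made up for illustration.

```python
import re

# Matches the part of a kernel OOM-killer line that names the victim,
# e.g. "Out of memory: Killed process 4242 (worker) total-vm:..."
OOM_LINE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for each OOM-kill line found."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Hypothetical sample, NOT taken from anyone's machine in this thread.
sample_log = """\
[12345.678] Out of memory: Killed process 4242 (worker) total-vm:4194304kB, anon-rss:3145728kB
[12346.000] eth0: link up
"""

print(find_oom_kills(sample_log))
```

On a real host the text could come from `dmesg` or `journalctl -k`. If a pid listed there matches the one in the "Training worker successfully started on process with pid ..." log line, memory is the likely cause; an empty result points elsewhere.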

cccaballero commented 1 year ago

@sebburon I don't think it's a memory problem. I don't have any limits defined for the Docker container, and I have plenty of RAM. This is what docker stats tells me:

MEM USAGE / LIMIT
573.6MiB / 38.88GiB
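To compare that reading with the 3 GB recommendation above, the column can be converted to bytes. This is a rough sketch under two assumptions: the helper names `parse_size` and `parse_mem_column` are hypothetical, and "3 GB" is interpreted as 3 GiB.

```python
import re

# Binary unit multipliers as printed in the MEM USAGE / LIMIT column.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text):
    """Convert a string such as '573.6MiB' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB|TiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_mem_column(column):
    """Split 'USAGE / LIMIT' and return both values in bytes."""
    usage_text, limit_text = column.split("/")
    return parse_size(usage_text), parse_size(limit_text)


usage, limit = parse_mem_column("573.6MiB / 38.88GiB")
recommended = 3 * 1024**3  # assumption: "3 GB" read as 3 GiB

print(f"usage: {usage / 1024**3:.2f} GiB")
print(f"limit covers recommendation: {limit >= recommended}")
```

The limit here is well above the recommendation. Note that docker stats reports usage at the moment it is sampled, so a single reading does not show the peak reached while training is running.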