BoredLabsHQ / Concord

Concord is an open-source AI plugin designed to connect community members to relevant conversations
GNU General Public License v3.0

[Back-End] Configure preprocess, BerTopic initialisation and verbosity #20

Open Septimus4 opened 2 weeks ago

Septimus4 commented 2 weeks ago

To make the BERTopic plugin configurable and easy to set up, consider the following steps:

1. Configuration File

A YAML configuration file can hold the pre-processing, model setup, and logging options in one place, so they can be tweaked without modifying the code. Here's a sample structure:

preprocessing:
  stopwords: true
  punctuation: true
  lowercase: true

bertopic:
  min_topic_size: 10
  embedding_model: "all-MiniLM-L6-v2"
  ...
  ...

logging:
  verbosity: "info"  # Options: debug, info, warning, error

You can read this configuration file in your Python script with a library such as PyYAML.
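A minimal sketch of loading such a file with PyYAML follows. The `parse_config` and `load_config` helpers, the `DEFAULTS` dictionary, and the `config.yaml` file name are illustrative assumptions, not part of Concord. `yaml.safe_load` is used rather than `yaml.load` so that the file cannot construct arbitrary Python objects.

```python
import copy
import yaml  # PyYAML

# Hypothetical fallback values, mirroring the sample structure above.
DEFAULTS = {
    "preprocessing": {"stopwords": True, "punctuation": True, "lowercase": True},
    "bertopic": {"min_topic_size": 10, "embedding_model": "all-MiniLM-L6-v2"},
    "logging": {"verbosity": "info"},
}

def parse_config(text):
    """Parse YAML text and fill in any missing section or key from DEFAULTS."""
    loaded = yaml.safe_load(text) or {}  # an empty file parses to None
    config = copy.deepcopy(DEFAULTS)
    for section, values in loaded.items():
        config.setdefault(section, {}).update(values or {})
    return config

def load_config(path="config.yaml"):
    """Read the YAML file at `path` (the default name is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return parse_config(handle.read())
```

Falling back to defaults keeps a partial config file valid, so a user only needs to list the keys they want to change.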

2. Script Argument Overrides

Allow command-line arguments to override config values. With argparse you can expose specific options, e.g., logging verbosity, pre-processing steps, or model settings. An argument should take precedence over the config file only when it is actually passed, so each option needs a default that marks it as "not set".

Example argument setup:

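import argparse

def parse_arguments():
    """Parse CLI options; unset options stay None so they do not override the config."""
    parser = argparse.ArgumentParser()
    # default=None distinguishes "flag not passed" from an explicit choice
    # (store_true would otherwise default to False and always override the config)
    parser.add_argument("--stopwords", action="store_true", default=None, help="Enable stopword removal")
    parser.add_argument("--punctuation", action="store_true", default=None, help="Enable punctuation removal")
    parser.add_argument("--verbosity", choices=["debug", "info", "warning", "error"], help="Set logging verbosity")
    return parser.parse_args()

args = parse_arguments()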
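
One way to merge the two sources is sketched below, assuming the config has already been loaded into a nested dictionary. The `build_parser` and `apply_overrides` names are hypothetical, and `argparse.BooleanOptionalAction` requires Python 3.9 or later. Every option defaults to `None`, so a value from the file is replaced only when the matching flag was actually passed.

```python
import argparse

def build_parser():
    """CLI options default to None, meaning 'not passed, keep the config value'."""
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction (Python 3.9+) generates --stopwords and --no-stopwords.
    parser.add_argument("--stopwords", action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--punctuation", action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--verbosity", choices=["debug", "info", "warning", "error"], default=None)
    return parser

def apply_overrides(config, args):
    """Return a copy of config with every explicitly passed argument applied."""
    merged = {section: dict(values) for section, values in config.items()}
    for key in ("stopwords", "punctuation"):
        value = getattr(args, key)
        if value is not None:
            merged.setdefault("preprocessing", {})[key] = value
    if args.verbosity is not None:
        merged.setdefault("logging", {})["verbosity"] = args.verbosity
    return merged

# Example: the file enables stopword removal; the command line turns it off.
config = {
    "preprocessing": {"stopwords": True, "punctuation": True, "lowercase": True},
    "logging": {"verbosity": "info"},
}
args = build_parser().parse_args(["--no-stopwords", "--verbosity", "debug"])
merged = apply_overrides(config, args)
```

Using paired `--flag` / `--no-flag` options lets the command line switch a step off as well as on, which a plain `store_true` flag cannot do when the config file already enables it.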

3. Configurable Logging

For logging, use Python's logging module, mapping the verbosity string from the config or command-line arguments to the corresponding logging level.

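import logging

# verbosity should come from the CLI argument if given, otherwise from the config file
verbosity = "info"  # placeholder value; replace with the resolved setting
logging.basicConfig(level=getattr(logging, verbosity.upper()))
logger = logging.getLogger(__name__)

logger.info("BERTopic model initialized.")
logger.debug("Debugging details here if verbosity is set to debug.")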
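
A small sketch of resolving the level before calling `basicConfig`, with the command-line value taking precedence over the config file. `resolve_log_level` and the `LEVELS` table are hypothetical helpers. The sketch assumes `cli_verbosity` is `None` when the flag was not passed, and it falls back to `INFO` for an unrecognised string instead of raising.

```python
import logging

# Allowed verbosity strings, matching the options listed in the config sample.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}

def resolve_log_level(config, cli_verbosity=None):
    """Pick the CLI value if given, else the config value, else 'info'."""
    name = cli_verbosity or config.get("logging", {}).get("verbosity", "info")
    return LEVELS.get(str(name).lower(), logging.INFO)

config = {"logging": {"verbosity": "warning"}}
level = resolve_log_level(config, cli_verbosity="debug")

# force=True (Python 3.8+) replaces handlers installed by an earlier basicConfig call.
logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s", force=True)
logger = logging.getLogger(__name__)
logger.debug("Visible because the command-line value took precedence.")
```

An explicit lookup table restricts the setting to the four documented options, whereas `getattr(logging, ...)` raises `AttributeError` on a misspelt level name.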