RichardHightower / notion_extractor

Extracts Notion's zip file export of markdown into normal markdown files.

Use a tool to turn this into webpages #4

Open RichardHightower opened 4 days ago

RichardHightower commented 4 days ago

Use a tool to turn the output markdown files into webpages. In the past I used https://gohugo.io/.


Hugo Site Generator Specification

Overview

A simple tool that takes markdown files from output/flat and creates a Hugo-ready website under websites/EXPORT_NAME. The tool focuses on proper Hugo organization and configuration rather than markdown conversion.

Directory Structure

website-generator/
├── templates/                    # Hugo site templates
│   ├── config.toml              # Base Hugo configuration
│   └── archetypes/              # Default front matter templates
├── websites/
│   └── EXPORT_NAME/             # Each export gets its own Hugo site
│       ├── config.toml          # Site-specific config
│       ├── content/             # Markdown files go here
│       │   └── _index.md        # Auto-generated home page
│       ├── static/              # Images and other assets
│       └── themes/              # Hugo theme (e.g., PaperMod)
└── hugo_site_gen.py             # Site generator script

Program Flow

1. Site Creation

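import shutil
import subprocess
from pathlib import Path


def create_hugo_site(export_name: str) -> Path:
    """Creates a new Hugo site"""
    site_path = Path("websites") / export_name
    # Run hugo new site; check=True stops here if the hugo CLI fails
    subprocess.run(["hugo", "new", "site", str(site_path)], check=True)
    # Copy base config
    shutil.copy("templates/config.toml", site_path / "config.toml")
    # Install theme (install_theme is not defined in this spec)
    install_theme(site_path)
    return site_path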
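create_hugo_site calls install_theme, which the spec never defines. A minimal sketch, assuming the theme is installed by shallow-cloning its Git repository into the site's themes/ directory (Hugo Modules and git submodules are common alternatives). The helper names are hypothetical; the PaperMod URL comes from the themes_config.yaml later in this thread.

```python
import subprocess
from pathlib import Path

# PaperMod repository, as listed in themes_config.yaml
PAPERMOD_REPO = "https://github.com/adityatelange/hugo-PaperMod.git"


def theme_clone_command(site_path: Path, name: str = "PaperMod",
                        repo: str = PAPERMOD_REPO) -> list[str]:
    """Build a shallow `git clone` command that targets themes/<name>."""
    target = Path(site_path) / "themes" / name
    return ["git", "clone", "--depth", "1", repo, str(target)]


def install_theme(site_path: Path, name: str = "PaperMod",
                  repo: str = PAPERMOD_REPO) -> None:
    """Hypothetical helper: clone the theme unless it is already present."""
    if (Path(site_path) / "themes" / name).exists():
        return  # already installed; keep the existing checkout
    subprocess.run(theme_clone_command(site_path, name, repo), check=True)
```

Keeping the command builder separate from the subprocess call makes it easy to log or dry-run the clone.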

2. Content Organization

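import shutil
from pathlib import Path

def organize_content(flat_dir: str, hugo_dir: str) -> None:
    """Organizes flat markdown files into Hugo content structure"""
    content_dir = Path(hugo_dir) / "content"
    content_dir.mkdir(parents=True, exist_ok=True)
    for md_file in sorted(Path(flat_dir).glob("*.md")):  # copy markdown files to content/
        shutil.copy(md_file, content_dir / md_file.name)
    # TODO: update front matter if needed; generate _index.md files for sections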

3. Configuration

# Base config.toml template
baseURL = "/"
languageCode = "en-us"
title = "{EXPORT_NAME}"
theme = "PaperMod"  # Or another simple theme

[params]
  description = "Documentation"
  ShowBreadCrumbs = true
  ShowPostNavLinks = true
  ShowCodeCopyButtons = true

[menu]
  [[menu.main]]
    name = "Home"
    url = "/"
    weight = 1
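The template's title line holds a {EXPORT_NAME} placeholder, so a plain file copy would leave it unfilled. A minimal sketch of filling it; render_config and write_site_config are hypothetical names, and escaping is limited to what a TOML basic string needs (backslashes and double quotes).

```python
from pathlib import Path


def render_config(template_text: str, export_name: str) -> str:
    """Fill the {EXPORT_NAME} placeholder in the base config.toml template."""
    # Escape backslashes first, then quotes, so the title stays valid TOML
    safe_name = export_name.replace("\\", "\\\\").replace('"', '\\"')
    return template_text.replace("{EXPORT_NAME}", safe_name)


def write_site_config(template_path: Path, site_path: Path, export_name: str) -> Path:
    """Hypothetical helper: render the template into <site>/config.toml."""
    target = Path(site_path) / "config.toml"
    text = Path(template_path).read_text(encoding="utf-8")
    target.write_text(render_config(text, export_name), encoding="utf-8")
    return target
```

str.replace is used instead of str.format so any other braces in the template are left alone.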

Core Features

1. Content Preparation

2. Front Matter Addition

---
title: "Page Title"  # Extracted from first H1 or filename
weight: 10          # Based on filename or sequence
description: ""     # First paragraph or empty
---
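The front matter rules above (title from the first H1 or the filename, description from the first paragraph) can be sketched as a small function. The name build_front_matter is hypothetical, the weight is assumed to come from the caller, and values are emitted with json.dumps because a JSON string is also a valid YAML double-quoted string.

```python
import json
import re
from pathlib import Path


def build_front_matter(path: Path, text: str, weight: int) -> str:
    """Build YAML front matter for one flat markdown file."""
    # Title: first "# " heading, else the filename with underscores as spaces
    h1 = re.search(r"^# +(.+?)\s*$", text, flags=re.MULTILINE)
    title = h1.group(1) if h1 else path.stem.replace("_", " ")
    # Description: first blank-line-separated block that is not a heading
    description = ""
    for block in re.split(r"\n\s*\n", text.strip()):
        if block and not block.lstrip().startswith("#"):
            description = " ".join(block.split())  # collapse line breaks
            break
    return (f"---\ntitle: {json.dumps(title)}\nweight: {weight}\n"
            f"description: {json.dumps(description)}\n---\n")
```

Files that already start with front matter would need to be detected and merged instead; this sketch does not handle that case.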

3. Theme Setup

Command Line Interface

# Basic usage
python hugo_site_gen.py create --name "Project Docs" --input output/flat

# Commands
  create        Create new Hugo site
  update        Update existing site with new content

# Options
  --name        Export name (required)
  --input       Input directory (default: output/flat)
  --theme       Hugo theme (default: PaperMod)
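A minimal argparse sketch of the commands and options listed above. build_parser is a hypothetical name, and dispatching the parsed arguments to HugoSiteGenerator is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the create/update commands and their shared options."""
    parser = argparse.ArgumentParser(prog="hugo_site_gen.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for command, help_text in [("create", "Create new Hugo site"),
                               ("update", "Update existing site with new content")]:
        cmd = sub.add_parser(command, help=help_text)
        cmd.add_argument("--name", required=True, help="Export name")
        cmd.add_argument("--input", default="output/flat", help="Input directory")
        cmd.add_argument("--theme", default="PaperMod", help="Hugo theme")
    return parser
```

For example, `build_parser().parse_args(["create", "--name", "Project Docs"])` yields the documented defaults for --input and --theme.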

Dependencies

Code Structure

class HugoSiteGenerator:
    def __init__(self, export_name: str, input_dir: str):
        self.export_name = export_name
        self.input_dir = input_dir
        self.site_path = f"websites/{export_name}"

    def generate(self):
        """Main generation process"""
        self.create_site()
        self.organize_content()
        self.configure_site()

    def create_site(self):
        """Creates new Hugo site"""

    def organize_content(self):
        """Organizes content files"""

    def configure_site(self):
        """Updates Hugo configuration"""

Usage Example

# Starting with files in output/flat:
# - Event_Bridge_API.md
# - Event_Bridge_Setup.md
# - Lambda_Functions.md

python hugo_site_gen.py create --name "AWS Documentation"

# Creates Hugo site at websites/AWS Documentation/
# With content structure:
content/
├── _index.md
├── event-bridge/
│   ├── _index.md      # Auto-generated section page
│   ├── api.md
│   └── setup.md
└── lambda-functions/
    └── _index.md
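The example implies a grouping rule without stating it: Event_Bridge_API.md and Event_Bridge_Setup.md share a prefix and become the event-bridge/ section, while Lambda_Functions.md has no siblings and becomes lambda-functions/_index.md. One possible reading of that rule, as a sketch (the rule and the function names are assumptions):

```python
from collections import defaultdict


def slug(text: str) -> str:
    return text.replace("_", "-").replace(" ", "-").lower()


def plan_content_layout(filenames: list[str]) -> list[str]:
    """Map flat markdown filenames to Hugo content paths.

    Assumed rule: files sharing everything but their last "_" token form a
    section; a file with no siblings becomes a section rendered by _index.md.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        stem = name.removesuffix(".md")
        prefix = stem.rpartition("_")[0]
        groups[prefix or stem].append(stem)

    paths = ["_index.md"]  # home page
    for prefix, stems in sorted(groups.items()):
        if len(stems) == 1:
            paths.append(f"{slug(stems[0])}/_index.md")
        else:
            section = slug(prefix)
            paths.append(f"{section}/_index.md")  # auto-generated section page
            for stem in sorted(stems):
                paths.append(f"{section}/{slug(stem.rpartition('_')[2])}.md")
    return paths
```

With the three example files this produces exactly the tree shown above.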

Key Differences from Previous Approach

  1. No markdown-to-HTML conversion (Hugo handles this)
  2. Simpler directory structure (Hugo standard)
  3. Leverages Hugo's built-in features
  4. Focus on content organization and basic configuration
  5. Minimal dependencies

Future Enhancements

  1. Multiple theme support
  2. Custom shortcode generation
  3. Taxonomy generation from content
  4. Multi-language support
  5. PDF export configuration

The main advantage of this approach is its simplicity: it focuses on organizing content for Hugo rather than handling conversion and templating itself. Hugo's built-in features do most of the complex work.

RichardHightower commented 3 days ago

Hugo Generator Feature Specifications

1. Multiple Theme Support

Overview

Allow users to switch between pre-configured Hugo themes while maintaining consistent content structure.

Configuration

# themes_config.yaml
themes:
  docsy:
    name: "docsy"
    repo: "https://github.com/google/docsy.git"
    config_template: "docsy_config.toml"
    dependencies:
      - postcss-cli
      - autoprefixer

  papermod:
    name: "PaperMod"
    repo: "https://github.com/adityatelange/hugo-PaperMod.git"
    config_template: "papermod_config.toml"

CLI Commands

# List available themes
hugo-site theme list

# Switch themes
hugo-site theme switch --name docsy

# Install new theme
hugo-site theme add --name mytheme --repo URL
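`hugo-site theme switch` could, at minimum, rewrite the top-level theme key in config.toml. A sketch under the assumption that the theme is already installed under themes/<name>; theme-specific [params] (the themes/configs/*.toml templates) are not touched, and both function names are hypothetical.

```python
import re
from pathlib import Path


def set_theme(config_text: str, theme_name: str) -> str:
    """Replace the first `theme = ...` line, or add one if none exists."""
    new_line = f'theme = "{theme_name}"'
    # A function replacement avoids re treating backslashes in the name as escapes
    updated, count = re.subn(r"^theme\s*=.*$", lambda _m: new_line,
                             config_text, count=1, flags=re.MULTILINE)
    if count == 0:
        # Top-level TOML keys must precede any [table], so prepend
        updated = new_line + "\n" + config_text
    return updated


def switch_theme(config_path: Path, theme_name: str) -> None:
    path = Path(config_path)
    path.write_text(set_theme(path.read_text(encoding="utf-8"), theme_name),
                    encoding="utf-8")
```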

Directory Structure

website-generator/
├── themes/
│   ├── configs/           # Theme-specific config templates
│   │   ├── docsy.toml
│   │   └── papermod.toml
│   └── patches/          # Theme-specific fixes/customizations
└── scripts/
    └── theme_manager.py

2. Custom Shortcode Generation

Overview

Tool to create and manage Hugo shortcodes for common documentation patterns.

Shortcode Types

# shortcodes_config.yaml
shortcodes:
  note:
    template: "note.html"
    params:
      - type: [info, warning, danger]
      - title

  api:
    template: "api.html"
    params:
      - method: [GET, POST, PUT, DELETE]
      - endpoint
      - response_type

Usage Example

{{< note type="warning" title="Important" >}}
This is a warning message
{{< /note >}}

{{< api method="GET" endpoint="/users" response_type="json" >}}
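For the `note` usage above to render, the site needs a matching shortcode template. A sketch of what the generator might write: the template reads the `type` and `title` parameters from shortcodes_config.yaml, the CSS class names are hypothetical, and the file goes to layouts/shortcodes/, where Hugo has traditionally looked up shortcode templates.

```python
from pathlib import Path

# Hugo template for {{< note type="..." title="..." >}}...{{< /note >}}
NOTE_TEMPLATE = """\
<div class="note note-{{ .Get "type" | default "info" }}">
  {{ with .Get "title" }}<p class="note-title">{{ . }}</p>{{ end }}
  {{ .Inner | markdownify }}
</div>
"""


def write_shortcode(site_path: Path, name: str, template: str) -> Path:
    """Write a shortcode template to layouts/shortcodes/<name>.html."""
    target = Path(site_path) / "layouts" / "shortcodes" / f"{name}.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(template, encoding="utf-8")
    return target
```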

Directory Structure

shortcodes/
├── templates/          # Shortcode HTML templates
├── examples/          # Usage examples
└── generator.py       # Shortcode generation script

3. Taxonomy Generation

Overview

Automatically generate Hugo taxonomies from content analysis.

Configuration

# taxonomy_config.yaml
analyzers:
  tech_stack:
    patterns:
      - "uses (.*?) for"
      - 'built with (\w+)'   # a trailing lazy (.*?) would always capture ""

  status:
    patterns:
      - 'Status: (\w+)'
    allowed_values:
      - "Draft"
      - "Review"
      - "Published"

Generated Output

# Generated taxonomy config
[taxonomies]
  tech_stack = "tech_stacks"
  status = "statuses"
  category = "categories"
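The analyzer's job can be sketched as applying each pattern to a page and keeping the captured terms. This assumes the patterns live in a dict mirroring taxonomy_config.yaml; a pattern ending in a lazy `(.*?)` always matches the empty string, so the trailing groups use `(\w+)`. Terms are keyed by the plural names from the generated [taxonomies] block, since Hugo front matter uses the plural form (as with `tags` and `categories`). All names here are hypothetical.

```python
import re

ANALYZERS = {
    "tech_stack": {"patterns": [r"uses (.*?) for", r"built with (\w+)"]},
    "status": {"patterns": [r"Status: (\w+)"],
               "allowed_values": ["Draft", "Review", "Published"]},
}
# Plural names as declared in the generated [taxonomies] block
PLURALS = {"tech_stack": "tech_stacks", "status": "statuses"}


def extract_terms(text: str, analyzers: dict = ANALYZERS) -> dict[str, list[str]]:
    """Return taxonomy terms found in a page, keyed by plural taxonomy name."""
    terms: dict[str, list[str]] = {}
    for taxonomy, spec in analyzers.items():
        allowed = spec.get("allowed_values")
        found: list[str] = []
        for pattern in spec["patterns"]:
            for value in (m.strip() for m in re.findall(pattern, text)):
                # Skip empty captures, duplicates, and values outside allowed_values
                if value and value not in found and (allowed is None or value in allowed):
                    found.append(value)
        if found:
            terms[PLURALS.get(taxonomy, taxonomy)] = found
    return terms
```

The returned dict can be merged directly into a page's front matter.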

Directory Structure

taxonomy/
├── patterns/         # Taxonomy detection patterns
├── blacklist/       # Terms to ignore
└── analyzer.py      # Content analysis script

4. Multi-language Support

Overview

Support for multiple language versions of documentation with automated translation workflows.

Configuration

# i18n_config.yaml
languages:
  es:
    name: "Spanish"
    weight: 1
    translator: "deepl"  # Translation service to use

  fr:
    name: "French"
    weight: 2
    translator: "google"
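Hugo itself only needs a [languages] table; the `translator` key drives the translation tool and should not reach the site config. A sketch that renders this YAML into Hugo's format, under stated assumptions: the source docs are English (Hugo's defaultContentLanguage already defaults to "en"), English gets weight 1 so it sorts first (configured weights are shifted by one), and each language keeps its pages in content/<code> via Hugo's contentDir setting. The function name is hypothetical.

```python
def hugo_languages_toml(languages: dict, default_code: str = "en",
                        default_name: str = "English") -> str:
    """Render i18n_config.yaml's `languages` map as Hugo's [languages] table."""
    entries = [(default_code, default_name, 1)]
    # Shift configured weights so the default language always sorts first
    entries += [(code, spec["name"], spec["weight"] + 1)
                for code, spec in languages.items()]
    lines = ["[languages]"]
    for code, name, weight in sorted(entries, key=lambda e: e[2]):
        lines += [f"  [languages.{code}]",
                  f'    languageName = "{name}"',
                  f"    weight = {weight}",
                  f'    contentDir = "content/{code}"']
    return "\n".join(lines) + "\n"
```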

Directory Structure

i18n/
├── languages/              # Language-specific configurations
├── translations/          # Translation memory/cache
├── glossary/             # Technical term translations
└── translator.py         # Translation management script

CLI Commands

# Add new language
hugo-site lang add --code es

# Update translations
hugo-site lang sync --code es

# Generate language switcher
hugo-site lang menu

5. PDF Export Configuration

Overview

Generate PDF versions of documentation with customizable layouts and styling.

Configuration

# pdf_config.yaml
pdf:
  layouts:
    default:
      page_size: "A4"
      margins: "2cm"
      fonts:
        heading: "Roboto"
        body: "OpenSans"

    print:
      page_size: "Letter"
      margins: "1inch"
      include_toc: true

  metadata:
    author: "Documentation Team"
    subject: "Technical Documentation"
    keywords: "docs, technical, api"
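Only page_size and margins in a layout map directly onto a CSS @page rule; fonts and include_toc belong to the stylesheet and HTML assembly steps. A sketch of that mapping (the function name is hypothetical). Note that CSS spells inches as `in`, so a value like `1inch` has to be rewritten.

```python
def layout_to_page_css(layout: dict) -> str:
    """Translate one pdf_config.yaml layout into a CSS @page rule."""
    margins = layout.get("margins", "2cm")
    # CSS has no "inch" unit; "1inch" must become "1in"
    if margins.endswith("inch"):
        margins = margins[: -len("inch")] + "in"
    return f"@page {{ size: {layout.get('page_size', 'A4')}; margin: {margins}; }}"
```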

Directory Structure

pdf/
├── templates/           # PDF layout templates
├── styles/             # PDF-specific CSS
├── assets/            # PDF-specific images/logos
└── generator.py       # PDF generation script

CLI Commands

# Generate PDF for all content
hugo-site pdf generate

# Generate PDF for specific section
hugo-site pdf generate --section api

# Use specific layout
hugo-site pdf generate --layout print

Common Implementation Patterns

1. Configuration Management

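from pathlib import Path
import yaml

class FeatureConfig:
    def __init__(self, config_path: Path):
        self.config = self._load_yaml(config_path)
        self.validate_config()

    def _load_yaml(self, config_path: Path) -> dict:
        with open(config_path, encoding="utf-8") as f:
            return yaml.safe_load(f) or {}  # an empty file loads as None

    def validate_config(self):
        """Validate feature-specific configuration"""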

2. CLI Integration

def add_feature_commands(subparsers):
    """Add feature-specific commands to CLI"""
    feature_parser = subparsers.add_parser('feature_name')
    feature_parser.add_argument('--option', help='Feature option')

3. Error Handling

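import functools
import logging


class FeatureError(Exception):
    """Base class for feature-specific errors"""


def handle_feature_error(func):
    """Decorator for feature error handling: log the error, then re-raise it"""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except FeatureError as e:
            logging.error("Feature error in %s: %s", func.__name__, e)
            raise
    return wrapper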

4. Logging

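import logging

def setup_feature_logging(feature_name: str) -> logging.Logger:
    """Setup feature-specific logging"""
    logger = logging.getLogger(feature_name)
    if not logger.handlers:  # calling twice would otherwise log every line twice
        logger.addHandler(logging.FileHandler(f"{feature_name}.log"))
    logger.setLevel(logging.INFO)  # the inherited WARNING level would drop info messages
    return logger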

RichardHightower commented 3 days ago

Hugo PDF Generator Specification

Overview

A tool that takes Hugo's generated HTML output and creates PDFs using WeasyPrint or alternative PDF engines. The tool will maintain Hugo's styling and structure while creating print-ready documentation.

Design Choices

Directory Structure

pdf-generator/
├── config/
│   ├── pdf_config.yaml          # PDF generation settings
│   └── print_styles.css         # Print-specific CSS
├── templates/
│   ├── cover.html              # PDF cover page template
│   └── footer.html             # PDF footer template
├── output/
│   └── pdfs/                   # Generated PDFs
└── scripts/
    ├── pdf_generator.py        # Main generation script
    └── toc_generator.py        # TOC processing script

Core Components

1. Configuration

# pdf_config.yaml
output:
  path: "output/pdfs"
  filename_template: "{title}-{date}"

pdf:
  page_size: "A4"
  margins:
    top: "25mm"
    right: "25mm"
    bottom: "25mm"
    left: "25mm"

  fonts:
    default: "DejaVu Sans"
    code: "DejaVu Sans Mono"

  headers:
    include: true
    height: "15mm"

  footers:
    include: true
    height: "15mm"
    page_numbers: true

sections:
  - name: "full"
    title: "Complete Documentation"
    content: "/**/*.html"

  - name: "api"
    title: "API Documentation"
    content: "/api/**/*.html"
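The filename_template value ("{title}-{date}") needs a title and a date filled in, and a raw title such as "API Documentation" would put spaces in the filename. A sketch assuming titles are slugified and dates use ISO format; pdf_output_name is a hypothetical name.

```python
import re
from datetime import date


def pdf_output_name(template: str, title: str, on: date) -> str:
    """Expand output.filename_template into a PDF filename."""
    # Collapse anything that is not a letter or digit into single hyphens
    safe_title = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return template.format(title=safe_title, date=on.isoformat()) + ".pdf"
```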

2. Print CSS

/* print_styles.css */
@media print {
  @page {
    size: A4;
    margin: 25mm;

    @top-center {
      content: string(doctitle);
    }

    @bottom-right {
      content: counter(page);
    }
  }

  /* string(doctitle) stays empty unless an element sets it */
  h1 {
    string-set: doctitle content();
  }

  /* Hide navigation and UI elements */
  nav, .sidebar, .breadcrumbs {
    display: none !important;
  }

  /* Ensure code blocks don't break across pages */
  pre {
    page-break-inside: avoid;
  }

  /* Print the URL after external links so readers can find the online version */
  a[href^="http"]::after {
    content: " (URL: " attr(href) ")";
  }
}

3. Implementation

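from pathlib import Path

import weasyprint
import yaml


class PDFGenerator:
    def __init__(self, hugo_public_dir: Path, config_path: Path):
        self.hugo_dir = Path(hugo_public_dir)
        self.config = self._load_config(config_path)
        self.output_dir = Path(self.config.get('output', {}).get('path', 'output/pdfs'))

    def _load_config(self, config_path: Path) -> dict:
        with open(config_path, encoding='utf-8') as f:
            return yaml.safe_load(f)

    def generate_pdfs(self):
        """Generate PDFs for all configured sections"""
        self.output_dir.mkdir(parents=True, exist_ok=True)
        for section in self.config['sections']:
            self._generate_section_pdf(section)

    def _generate_section_pdf(self, section):
        """Generate PDF for a specific section"""
        # 1. Collect HTML files
        html_files = self._collect_html_files(section['content'])

        # 2. Process HTML
        processed_html = self._process_html(html_files)

        # 3. Add cover page
        final_html = self._add_cover_page(processed_html, section)

        # 4. Generate PDF
        self._generate_pdf(final_html, section['name'])

    def _collect_html_files(self, pattern: str) -> list[Path]:
        """Find the section's HTML files under Hugo's public/ directory"""
        # Path.glob() rejects absolute patterns, so strip the leading "/"
        # from config values such as "/api/**/*.html"
        return sorted(self.hugo_dir.glob(pattern.lstrip('/')))

    def _process_html(self, html_files: list[Path]) -> str:
        """Process HTML files for PDF generation"""
        # Combine HTML files
        # Update internal links
        # Add page breaks
        # Process table of contents
        raise NotImplementedError

    def _add_cover_page(self, html: str, section: dict) -> str:
        """Prepend templates/cover.html filled with the section title"""
        raise NotImplementedError

    def _generate_pdf(self, html: str, name: str):
        """Render the final HTML with WeasyPrint"""
        # base_url (with a trailing slash) lets relative stylesheet/image URLs resolve
        base_url = self.hugo_dir.resolve().as_uri() + '/'
        weasyprint.HTML(string=html, base_url=base_url).write_pdf(
            str(self.output_dir / f"{name}.pdf"))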
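The combine, link-update and page-break steps of _process_html can be sketched on already-extracted page bodies. This is an illustration under assumptions: pages arrive as (url_path, body_html) pairs in reading order, internal links are root-relative like Hugo's "/api/users/", and regexes stand in for BeautifulSoup only to keep the sketch dependency-free. combine_pages is a hypothetical name.

```python
import re


def combine_pages(pages: list[tuple[str, str]]) -> str:
    """Merge (url_path, body_html) pairs into one HTML document for WeasyPrint."""
    def anchor(path: str) -> str:
        return "page-" + (path.strip("/").replace("/", "-") or "home")

    included = {path.strip("/") for path, _ in pages}

    def rewrite(match: re.Match) -> str:
        href = match.group(2)
        target = href.split("#")[0].strip("/")
        if href.startswith("/") and target in included:
            return f'{match.group(1)}"#{anchor(target)}"'
        return match.group(0)  # external link or page not in this PDF

    sections = []
    for path, body in pages:
        body = re.sub(r'(href=)"([^"]*)"', rewrite, body)
        # Every page after the first starts on a new PDF page
        style = ' style="break-before: page"' if sections else ""
        sections.append(f'<section id="{anchor(path)}"{style}>{body}</section>')
    return "<html><body>\n" + "\n".join(sections) + "\n</body></html>"
```

Links to pages outside the section are left pointing at the website, which pairs with the print CSS that appends URLs to external links.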

4. CLI Interface

# Generate PDFs for all sections
hugo-pdf generate

# Generate PDF for specific section
hugo-pdf generate --section api

# Use custom configuration
hugo-pdf generate --config my_config.yaml

# Override output directory
hugo-pdf generate --output ./my-pdfs

5. Error Handling

class PDFGenerationError(Exception):
    """Base class for PDF generation errors"""
    pass

class HTMLProcessingError(PDFGenerationError):
    """Error during HTML processing"""
    pass

class PDFRenderingError(PDFGenerationError):
    """Error during PDF rendering"""
    pass

Process Flow

  1. Pre-processing

    • Load configuration
    • Validate Hugo output exists
    • Create output directory
  2. HTML Collection

    • Find all HTML files for section
    • Sort in correct order
    • Validate HTML structure
  3. HTML Processing

    • Combine multiple HTML files
    • Update internal links
    • Apply print styles
    • Add cover page and TOC
  4. PDF Generation

    • Convert to PDF using WeasyPrint
    • Add headers and footers
    • Generate bookmarks
    • Save output

Usage Example

from pathlib import Path
from pdf_generator import PDFGenerator

# After Hugo build
hugo_public = Path("websites/my-docs/public")
config_path = Path("config/pdf_config.yaml")

generator = PDFGenerator(hugo_public, config_path)
generator.generate_pdfs()

Dependencies

weasyprint>=54.0
pyyaml>=6.0
beautifulsoup4>=4.9.3  # For HTML processing (imported as bs4)

Future Enhancements

  1. Multiple PDF Engines

    • Support for alternative engines (Prince, wkhtmltopdf)
    • Engine-specific optimizations
  2. Advanced Styling

    • Custom fonts
    • Watermarks
    • Page templates
  3. Optimization

    • Parallel processing
    • Image optimization
    • PDF compression
  4. Integration

    • CI/CD pipeline integration
    • Automatic version tagging
    • PDF metadata management
