LuteOrg / lute-v3

LUTE = Learning Using Texts: learn languages through reading.
https://luteorg.github.io/lute-manual/
MIT License

Add "bulk import book" ability #78

Open jzohrab opened 10 months ago

jzohrab commented 10 months ago

Requested from Mycheze and another Discord user.

It would be nice to be able to "bulk import" stuff from a given folder. For example, given a folder full of text files, all files could be imported, and the filenames become the book titles. Perhaps auto-associate audio files, provided the files to be imported follow some kind of naming convention (e.g. in the same folder as the associated .txt file or whatever).
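A minimal sketch of the folder scan described above, under assumed conventions that are not taken from Lute: books are `.txt` files, the file stem becomes the title, and an audio file is associated when it sits in the same folder and shares the stem. The name `find_books` and the extension list are hypothetical.

```python
from pathlib import Path

# Assumed audio extensions for illustration; the real list would come from Lute.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".ogg", ".wav"}


def find_books(folder):
    """Return (title, text_path, audio_path_or_None) tuples in file name order."""
    folder = Path(folder)
    # Index audio files by stem; if several share a stem, the last one wins.
    audio_by_stem = {
        p.stem: p
        for p in sorted(folder.iterdir())
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    }
    books = []
    for text_path in sorted(folder.glob("*.txt")):
        title = text_path.stem  # the file name without extension is the title
        books.append((title, text_path, audio_by_stem.get(title)))
    return books
```

With this convention, `Chapter 1.txt` next to `Chapter 1.mp3` would yield one book titled "Chapter 1" with its audio attached, and a text file with no matching audio gets `None`.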

This might be a long-running job and so would likely need to be a command-line job to not run into weird web server problems. Who knows?

jzohrab commented 5 months ago

My suggestion for this, per recent chat on Discord:

flask --app lute.app_factory cli import_books --from /path/to/books

jzohrab commented 5 months ago

Working sample code to load a directory

Attached is a file from rivabem in Discord.

directory_import.py.txt

Further Discord notes:

The main problem, which I think someone already brought to light in another thread, is using data directly from WTForms. For instance, I could not use the function book.service.get_file_content() because it receives a filefielddata object rather than a string or raw bytes. As I had only text files, a mere file.read() solved it, but the import should support the same file types as the front-end new book form, and the same goes for saving audio files. Another idea was to do a backup before import, but that also receives the request object and would require a little more processing than I was in the mood to do. The major inconvenience of working around this is redundant code, and the risk of outdated code doing something bad to the data. But it does not seem to be a huge refactoring (in terms of difficulty, not effort): mostly, have the routes bound to "preparation functions", and have those call "execution functions" using just the raw data extracted.
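The "preparation function" / "execution function" split could look roughly like the sketch below. Everything here is hypothetical and not Lute's actual code: `BookData`, `import_book`, and the two `prepare_*` functions are illustrative names, and a plain list stands in for the database. The web-side preparation function assumes the upload is a file-like object with `read()`; `io.BytesIO` plays that role here.

```python
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BookData:
    """Plain data for one book: no form or request objects."""
    title: str
    text: str


def import_book(data, store):
    """Execution function: works only on raw data.

    `store` is a hypothetical stand-in (a list) for the real persistence layer.
    """
    if not data.text.strip():
        raise ValueError(f"Book {data.title!r} has no text")
    store.append(data)
    return data


def prepare_from_upload(title, stream):
    """Preparation function for a web route: reads bytes from a file-like upload."""
    return BookData(title=title, text=stream.read().decode("utf-8"))


def prepare_from_path(path):
    """Preparation function for a CLI job: reads a text file from disk."""
    path = Path(path)
    return BookData(title=path.stem, text=path.read_text(encoding="utf-8"))


# Usage: both entry points end up in the same execution function.
store = []
upload = io.BytesIO("Hola mundo".encode("utf-8"))
import_book(prepare_from_upload("Saludo", upload), store)
```

The point of the split is that the route and the command-line job share `import_book`, so validation and saving live in one place and cannot drift apart.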

Stub out API layer

I asked ChatGPT to stub out an "import" library that could be re-used:

# file_handler.py
import os
import shutil

UPLOAD_DIR = "uploads"

def save_file(file_path: str, destination: str = UPLOAD_DIR) -> str:
    """
    Copy the given file into the specified destination directory.

    :param file_path: Path to the file to be saved.
    :param destination: Directory where the file will be saved.
    :return: Path of the saved copy.
    """
    # exist_ok avoids the check-then-create race of a separate exists() test.
    os.makedirs(destination, exist_ok=True)

    file_name = os.path.basename(file_path)
    dest_path = os.path.join(destination, file_name)

    # copyfile streams the content rather than reading the whole file into
    # memory, and raises SameFileError if source and destination are one file.
    shutil.copyfile(file_path, dest_path)

    return dest_path

def save_files_from_directory(source_dir: str, destination: str = UPLOAD_DIR):
    """
    Save all files from the given source directory to the destination directory.

    :param source_dir: Path to the source directory.
    :param destination: Directory where the files will be saved.
    """
    saved_files = []
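    # Note: the destination is flat, so files with the same name in different
    # subdirectories overwrite each other.
    for root, _, files in os.walk(source_dir):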
        for file in files:
            file_path = os.path.join(root, file)
            saved_path = save_file(file_path, destination)
            saved_files.append(saved_path)

    return saved_files

Use it in Flask app:

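import os

from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
from file_handler import save_file

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400

    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400

    # Sanitize the client-supplied name so it cannot escape /tmp (e.g. "../x").
    file_name = secure_filename(file.filename)
    if file_name == '':
        return jsonify({"error": "Invalid file name"}), 400

    file_path = os.path.join('/tmp', file_name)
    file.save(file_path)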

    saved_path = save_file(file_path)
    return jsonify({"saved_path": saved_path}), 200

if __name__ == '__main__':
    app.run(debug=True)

Use it in CLI:

import argparse
from file_handler import save_files_from_directory

def main(source_directory):
    saved_files = save_files_from_directory(source_directory)
    for file in saved_files:
        print(f"Saved file: {file}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Upload files from a directory.')
    parser.add_argument('source_directory', type=str, help='Path to the source directory')

    args = parser.parse_args()
    main(args.source_directory)

jzohrab commented 5 months ago

Next steps: