abinthomasonline / repo2txt

Web-based tool converts GitHub repository contents into a single formatted text file
https://repo2txt.simplebasedomain.com/
MIT License
867 stars, 83 forks

Proposal: Python bindings or standalone version for repo2txt? #1

Closed: shivvor2 closed this issue 1 month ago

shivvor2 commented 1 month ago

Hi there! 👋

I'm really interested in using repo2txt in a Python environment. I was wondering if you'd be open to either Python bindings for the existing JS functions in script.js, or perhaps a standalone Python version?

Here's a rough idea of how I envision using it:

import os
from repo2txt import repo2txt  # hypothetical import

repo_url = "https://github.com/abinthomasonline/repo2txt/"
github_token = os.getenv("GITHUB_ACCESS_TOKEN")

# Filters structured like .gitignore files
with open("filters.txt", "r") as f:
    filters = f.read().splitlines()  # one pattern per line, no trailing newlines

combined_text = repo2txt(repo_url, github_token, filters)

# combined_text would look something like this:
"""
Directory Structure:

└── ./
    ├── index.html
    └── script.js

---
File: /index.html
---

<!DOCTYPE html>
<html lang="en">
<head>
# ... rest of the content
"""

# Then we could use combined_text for further processing
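
For the .gitignore-style filters, the matching step might look roughly like the sketch below. The names `load_patterns` and `is_excluded` are hypothetical and not part of repo2txt, and `fnmatchcase` only approximates real .gitignore semantics: there is no `!` negation, and `*` also matches `/`.

```python
from fnmatch import fnmatchcase


def load_patterns(lines):
    """Strip whitespace and drop blank lines and '#' comments."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not s.startswith("#")]


def is_excluded(path, patterns):
    """Return True if `path` matches any pattern (simplified rules).

    - A pattern ending in '/' excludes anything below a directory
      with that single name, at any depth.
    - Any other pattern is matched against the full path and the
      file name with shell-style wildcards.
    """
    parts = path.strip("/").split("/")
    full_path = "/".join(parts)
    name = parts[-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.strip("/") in parts[:-1]:
                return True
        elif fnmatchcase(full_path, pattern) or fnmatchcase(name, pattern):
            return True
    return False


patterns = load_patterns(["# images", "", "*.png", "node_modules/"])
kept = [p for p in ["index.html", "assets/logo.png", "node_modules/x/index.js"]
        if not is_excluded(p, patterns)]
print(kept)
```

A full implementation would probably lean on an existing gitignore parser rather than hand-rolled rules like these.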

What do you think? Does this align with the project's vision? I'm happy to contribute if you're interested in this direction!

abinthomasonline commented 1 month ago

I'm all for Python bindings, please go ahead.

Also check out https://github.com/abinthomasonline/repopack-py. It's a similar tool, but it requires the code to be present locally. You can repurpose some of its output formatting code, though.
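
As a rough illustration of the output formatting step, here is a minimal sketch that reproduces the layout shown in the proposal above (a directory tree followed by one "File:" section per file). It is not taken from repopack-py or repo2txt; `build_tree`, `render_tree`, and `format_repo` are hypothetical names, and the input is assumed to be a plain dict mapping relative paths to file contents.

```python
def build_tree(paths):
    """Turn 'a/b/c.txt'-style paths into nested dicts (files are empty dicts)."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree


def render_tree(node, prefix=""):
    """Render nested dicts as indented tree lines with box-drawing characters."""
    lines = []
    names = sorted(node)
    for index, name in enumerate(names):
        is_last = index == len(names) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + name)
        child_prefix = prefix + ("    " if is_last else "│   ")
        lines.extend(render_tree(node[name], child_prefix))
    return lines


def format_repo(files):
    """Combine a {path: content} mapping into one text blob."""
    out = ["Directory Structure:", "", "└── ./"]
    out.extend(render_tree(build_tree(files), prefix="    "))
    for path in sorted(files):
        out.extend(["", "---", "File: /" + path.strip("/"), "---", "", files[path]])
    return "\n".join(out)


sample = {"index.html": "<!DOCTYPE html>", "script.js": "console.log(1);"}
print(format_repo(sample))
```

Keeping the formatter independent of where the files come from (local disk or the GitHub API) means the same code could serve both tools.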

shivvor2 commented 1 month ago

Hi there! 👋

I wanted to follow up on my previous suggestion about Python bindings for repo2txt. After exploring this idea, I've encountered some challenges that I think are worth sharing.

The current implementation of repo2txt is deeply integrated with browser-specific APIs and the DOM, which makes creating Python bindings without modifying the original JavaScript quite challenging.

Here's what I've tried:

  1. Using MiniRacer to run the script in Python: This approach ran into issues with browser-specific objects like document not being available. Here's a snippet of the error:

    py_mini_racer._types.JSEvalException: <anonymous>:1: ReferenceError: document is not defined
    document.getElementById('repoForm').addEventListener('submit', async function (e) {
    ^
    
    ReferenceError: document is not defined
       at <anonymous>:1:1

  2. Creating mock browser objects: While this could potentially work, it would require extensive mocking of browser APIs, which feels like it might be beyond the scope of simple bindings.

  3. Using Pyppeteer: This approach could work by running a headless Chromium instance, but it feels quite heavyweight for what should ideally be a lightweight library operation.
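
For comparison with the three binding attempts above, the "standalone Python version" mentioned in the original proposal would avoid the JavaScript engine entirely. The sketch below was not tried in this thread. It assumes the GitHub REST API's git trees endpoint, a default branch named "main", and hypothetical helper names (`parse_repo_url`, `tree_api_url`, `fetch_tree`). Only the URL handling runs here; `fetch_tree` needs network access and is left uncalled.

```python
import json
import urllib.request
from urllib.parse import urlparse

API_ROOT = "https://api.github.com"


def parse_repo_url(repo_url):
    """Extract (owner, repo) from a URL like https://github.com/owner/repo/."""
    parts = [p for p in urlparse(repo_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def tree_api_url(owner, repo, ref="main"):
    """Build the recursive tree listing URL. The 'main' default is an assumption."""
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def fetch_tree(repo_url, token=None, ref="main"):
    """Fetch the repository's file tree (requires network access)."""
    owner, repo = parse_repo_url(repo_url)
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(tree_api_url(owner, repo, ref), headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["tree"]


owner, repo = parse_repo_url("https://github.com/abinthomasonline/repo2txt/")
print(tree_api_url(owner, repo))
```

This would be a reimplementation rather than bindings, so it would have to be kept in step with script.js by hand.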

Given these challenges, I don't think Python bindings are practical without modifying the original JavaScript, so I think it would be better to close the issue.

Thank you for your openness to suggestions and for maintaining the project. It's been a great learning experience, and I appreciate the chance to contribute 🙇‍♂️