datasette / datasette-extract

Import unstructured data (text and images) into structured tables
Apache License 2.0

`datasette extract` CLI command #13

Status: Open. simonw opened this issue 3 months ago.

simonw commented 3 months ago

So you can run extract from the CLI as well as through the web UI.

simonw commented 3 months ago

I'm deleting the prototype code for this for the moment:


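import click
from datasette import hookimpl


@click.command()
@click.argument(
    "database",
    type=click.Path(file_okay=True, dir_okay=False, allow_dash=False),
    required=True,
)
@click.argument("table", required=True)
def extract(database, table):
    # Prototype only: report the target instead of running an extraction
    click.echo("Will extract to {} in {}".format(table, database))

@hookimpl
def register_commands(cli):
    # Attach the command to Datasette's CLI as `datasette extract`
    cli.add_command(extract, name="extract")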

And test_cli.py:

from datasette.cli import cli
from click.testing import CliRunner

def test_extract_command():
    runner = CliRunner()
    result = runner.invoke(cli, ["extract", "database", "table"])
    assert result.exit_code == 0
    assert result.output == "Will extract to table in database\n"
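
To try the prototype without installing it as a plugin, here is a minimal sketch. It assumes a plain `click.group()` named `cli` is an adequate stand-in for Datasette's real `datasette.cli.cli`, and it calls `register_commands` by hand instead of going through Datasette's plugin manager. The stand-in group and the sample arguments `data.db` / `questions` are hypothetical.

```python
import click
from click.testing import CliRunner


# Hypothetical stand-in for datasette.cli.cli, so this runs without Datasette.
@click.group()
def cli():
    "Stand-in for the datasette CLI group."


@click.command()
@click.argument(
    "database",
    type=click.Path(file_okay=True, dir_okay=False, allow_dash=False),
    required=True,
)
@click.argument("table", required=True)
def extract(database, table):
    # Same prototype behaviour: just report what would be extracted where.
    click.echo("Will extract to {} in {}".format(table, database))


def register_commands(cli):
    # Same body as the @hookimpl version, but invoked manually here.
    cli.add_command(extract, name="extract")


register_commands(cli)

if __name__ == "__main__":
    runner = CliRunner()
    result = runner.invoke(cli, ["extract", "data.db", "questions"])
    print(result.output, end="")
```

Because `click.Path` is used without `exists=True`, the database path does not need to exist for this prototype to run.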

hcarter333 commented 2 months ago

Once this enhancement exists, it will fix the problem I'm running into. I'm parsing the General Exam Ham Radio Question Pool.

ChatGPT keeps stopping for a variety of reasons. Each time, the cycle is:

  1. I delete the input it has already processed.
  2. I start it again.
  3. More questions are parsed correctly into the database until the next stop.

With a CLI command, I could turn steps 1 to 3 into a loop and crank along :)
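
The resume-and-retry loop described above could be sketched like this. It assumes a hypothetical `extract_batch(items)` callable that processes items in order, stops at some point, and returns how many items it completed. Nothing like this exists in datasette-extract today; `extract_batch` stands in for one run of the proposed CLI command, and `extract_until_done` and `max_runs` are illustrative names.

```python
from typing import Callable, Sequence


def extract_until_done(
    items: Sequence[str],
    extract_batch: Callable[[Sequence[str]], int],
    max_runs: int = 100,
) -> int:
    """Re-run a (hypothetical) extract_batch on the unprocessed tail of items.

    extract_batch must process items in order and return how many it
    completed before stopping. Returns the number of runs needed.
    """
    remaining = list(items)
    runs = 0
    while remaining:
        if runs >= max_runs:
            raise RuntimeError("gave up after {} runs".format(max_runs))
        # Steps 2 and 3: start again and parse until the next stop.
        done = extract_batch(remaining)
        runs += 1
        if done <= 0:
            # No progress: retrying the same input would loop forever.
            raise RuntimeError("extraction made no progress")
        # Step 1: drop the input that has already been processed.
        remaining = remaining[done:]
    return runs
```

Guarding against a run that makes no progress matters here: if the model stops before finishing even one question, a naive loop would resend the same input indefinitely.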