💻 : arcee cli - GitHub Issues

EricLiclair commented 1 year ago

I'm considering the implementation of a CLI (Command-Line Interface) for quick validation purposes. The envisioned CLI would be utilized as follows:

```shell
pip install arcee-py

# To generate content
arcee generate

# To retrieve content
arcee retrieve
```

The primary use case for this CLI is to interact with a generative language model. While I initially had some uncertainty regarding the input format, it appears that using a string as input should suffice.

I would greatly appreciate your thoughts and feedback on this proposal, @Jacobsolawetz. Please let me know if you have any suggestions or concerns.

Jacobsolawetz commented 1 year ago

@EricLiclair That sounds like a great idea - would you mind if we hold off until the API is a bit more solidified and then I will reach out to you here on this thread?

Ben-Epstein commented 1 year ago

@EricLiclair this is great, thank you!

Would you mind building this with typer instead of click? It should be (almost) as simple as :s/click/typer/g

Typer gives us a few extras without being much heavier that, since we're starting from scratch, I'd love to get out of the box (type hints for validation, auto cli docs, cli autocompletion).

EricLiclair commented 1 year ago

Hey, @Jacobsolawetz I don't mind at all and sure. 😊

Hi, @Ben-Epstein, Insightful inputs. I've added an exact replica of the previous implementation using typer. With a little better understanding of the api structure there's always room for a better implementation.

An example impl. for variables as command arguments, etc. Nonetheless, let me know if you'd need anything else

```python
@cli.command()
def retrieve(model_name: str = typer.Argument(..., help="Model name")):
    """Retrieve from API"""

    dalm = DALM(name=model_name)
    typer.echo(
        typer.style(
            f"You are now configured with Dalm - model {model_name}",
            fg=typer.colors.GREEN,
        )
    )
...
```

which would be used like:

Ben-Epstein commented 1 year ago

@EricLiclair fantastic! That's exactly what I was planning on doing next week :)

I think that we'll likely want to support passing a text file or jsonl, for commands like `arcee upload context new_context -f my data.jsonl`

But that doesn't need to be now, getting a solid design in place is great for now.

This is perfect.

Ben-Epstein commented 1 year ago

Hey @EricLiclair Thanks for updating after our merge yesterday. I think to get this CLI to a really useful spot, we need 3 more things.

1. We need an `upload contexts` command so that a user can upload many files at once (it can call [upload_docs](https://github.com/arcee-ai/arcee-python/blob/main/arcee/api.py#L35)). It can take a directory of files, load them all, and upload them.
2. An `arcee train` command, so we can close the loop.
3. A simple section in the readme showcasing an easy way to achieve an e2e train.

for (3), something like

## Using the CLI

```shell
# upload context data to arcee
arcee upload contexts context1 --file /path/to/documents
arcee train dalm1 --context context1
# ... wait for training to complete
arcee retrieve --name dalm1 --query "what is the capital of Washington State?"
arcee generate --name dalm1 --query "what is the capital of Washington State?"
```

(i think you also need to add typer to our pyproject.toml dependencies)

EricLiclair commented 1 year ago

> We need an upload contexts so that a user can upload many files at once (it can call upload_docs). It can take a directory to files, and load them all and upload

I've modified the upload command to take file or dir paths (even multiple) and call the api. If a single --file is used, it calls upload_doc. In all other cases, it calls upload_docs. Let me know if we explicitly need different upload commands.
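
A minimal sketch of the dispatch described above, not the code from the commit. It assumes the choice is made on the number of files found after expanding directories; `collect_files` and `choose_uploader` are hypothetical helper names, and the sketch only returns the name of the API function to call rather than calling the real `upload_doc` / `upload_docs`, whose signatures are not shown in this thread.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions mentioned in this thread as valid for upload.
VALID_SUFFIXES = {".txt", ".jsonl"}


def collect_files(paths: Iterable[Path]) -> List[Path]:
    """Expand files and directories into a sorted list of valid files (hypothetical helper)."""
    found = set()
    for path in paths:
        # A directory is searched recursively; a plain path is checked as-is.
        candidates = path.rglob("*") if path.is_dir() else [path]
        for candidate in candidates:
            if candidate.is_file() and candidate.suffix.lower() in VALID_SUFFIXES:
                found.add(candidate)
    return sorted(found)


def choose_uploader(files: List[Path]) -> str:
    """Return the name of the API call to use (assumption: decided by file count)."""
    if not files:
        raise ValueError("no .txt or .jsonl files to upload")
    return "upload_doc" if len(files) == 1 else "upload_docs"
```

With this shape, passing a single `--file` that points at one valid file selects `upload_doc`, while several files or a directory containing several valid files selects `upload_docs`.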

> A arcee train command, so we can close the loop

I have added the train command as you mentioned to close the loop.

> a simple section in the readme showcasing an easy way to achieve an e2e train

I followed the structure of the existing readme and added a section, "Using the Arcee CLI".

> (I think you also need to add typer to our pyproject.toml dependencies)

I left it as is since your commit had it commented. I have fixed it now.

What's in the #commit?

- Rebased.
- Modified the upload context command to call upload_doc or upload_docs based on the number of files passed as CLI options. ❗Valid file extensions: .txt or .jsonl
- Separated the upload functionality into cli_handler.py.
- Added a spinner to show progress for the upload command.
- Updated the readme.

A few concerns:

1. Loading multiple files at once might cause memory errors and kill the task. I tried this by creating a lot of large dummy files. Any thoughts on this? I thought of a buffer-based upload (iteratively hit the api with data from k files at a time, k < n), but does the service allow us to upload multiple docs to the same context and merge them?
2. File validations. What files are valid? Currently I validate that the extension is one of .txt or .jsonl.

Ben-Epstein commented 1 year ago

@EricLiclair

> iteratively hit the api with data from k files at a time, k < n but does the service allow us to upload multiple docs to the same context and merge them?

Yes this is perfect. And yes we support uploading multiple docs to the same context!

> File validations. What files are valid? currently validating if extension is one of .txt or .jsonl

That's perfect!

EricLiclair commented 1 year ago

Hey, @Ben-Epstein Apologies for the delayed commit. I have updated the upload logic to check file sizes and upload in chunks/buffers; the buffer size defaults to 512 MB.
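
One way such size-based chunking could look, as a sketch under assumptions rather than the committed implementation: files are grouped greedily so that each batch stays at or under a byte limit, and each batch would then be sent in its own upload call to the same context. `batch_by_size` and `DEFAULT_MAX_BYTES` are hypothetical names; only the 512 MB default comes from the comment above.

```python
from pathlib import Path
from typing import Iterable, Iterator, List

# 512 MB default, as mentioned in the comment above.
DEFAULT_MAX_BYTES = 512 * 1024 * 1024


def batch_by_size(
    files: Iterable[Path], max_bytes: int = DEFAULT_MAX_BYTES
) -> Iterator[List[Path]]:
    """Yield lists of files whose combined size stays within max_bytes (hypothetical helper)."""
    batch: List[Path] = []
    batch_bytes = 0
    for path in files:
        size = path.stat().st_size
        # Close the current batch if adding this file would exceed the limit.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(path)
        batch_bytes += size
    if batch:
        yield batch
```

A single file larger than the limit still ends up in a batch of its own, so this sketch bounds memory only as long as individual files fit; streaming a very large file would need separate handling.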

arcee-ai / arcee-python

💻 : arcee cli #2