Closed (EricLiclair closed this issue 1 year ago)
@EricLiclair That sounds like a great idea - would you mind if we hold off until the API is a bit more solidified and then I will reach out to you here on this thread?
@EricLiclair this is great, thank you!
Would you mind building this with typer instead of click? It should be (almost) as simple as :s/click/typer/g
Typer gives us a few extras without being much heavier, and since we're starting from scratch I'd love to get them out of the box: type hints for validation, auto CLI docs, and CLI autocompletion.
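For concreteness, a minimal sketch of what those extras look like in practice; the `search` command and its `top_k` option are made up purely for illustration:

```python
import typer

app = typer.Typer()


@app.command()
def search(model_name: str, top_k: int = 5):
    """Search with a trained model (hypothetical command for illustration)."""
    # The type hints drive parsing and validation (top_k must be an int),
    # and Typer generates the `--help` output from this signature and docstring.
    typer.echo(f"searching with {model_name} (top_k={top_k})")


if __name__ == "__main__":
    # Running the app also exposes --install-completion / --show-completion
    # for shell autocompletion.
    app()
```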
Hey, @Jacobsolawetz I don't mind at all and sure. 😊
Hi @Ben-Epstein, thanks for the insightful input. I've added an exact replica of the previous implementation using Typer. With a little better understanding of the API structure there's always room for a better implementation. Below is an example implementation with variables as command arguments, etc. Nonetheless, let me know if you need anything else:
```python
import typer

from arcee import DALM  # import path assumed; DALM comes from the arcee SDK

cli = typer.Typer()


@cli.command()
def retrieve(model_name: str = typer.Argument(..., help="Model name")):
    """Retrieve from API"""
    dalm = DALM(name=model_name)
    typer.echo(
        typer.style(
            f"You are now configured with Dalm - model {model_name}",
            fg=typer.colors.GREEN,
        )
    )
    ...
```
which would be used like:
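A sketch of the invocation, assuming the Typer app is exposed as an `arcee` console script (the model name is illustrative):

```shell
arcee retrieve dalm1
# You are now configured with Dalm - model dalm1
```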
@EricLiclair fantastic! That's exactly what I was planning on doing next week :)
I think we'll likely want to support passing a text file or jsonl, for commands like `arcee upload context new_context -f my_data.jsonl`.
But that doesn't need to happen now; getting a solid design in place is great for now.
This is perfect.
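A rough sketch of how that flag could look with Typer; the command layout, option names, and `context_name` argument are assumptions for illustration, not the settled API:

```python
from pathlib import Path

import typer

app = typer.Typer()


@app.command()
def upload(
    context_name: str,
    file: Path = typer.Option(..., "--file", "-f", exists=True, help="A .txt or .jsonl file"),
):
    """Upload a single document file into a context (hypothetical command)."""
    typer.echo(f"uploading {file} into context {context_name}")
```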
Hey @EricLiclair, thanks for updating after our merge yesterday. I think that to get this CLI to a really useful spot, we need 3 more things:

1. An `upload contexts` command, so that a user can upload many files at once (it can call [upload_docs](https://github.com/arcee-ai/arcee-python/blob/main/arcee/api.py#L35)). It can take a directory of files, and load them all and upload them.
2. An `arcee train` command, so we can close the loop.
3. A simple section in the readme showcasing an easy way to achieve an e2e train.

For (3), something like:
## Using the CLI
```shell
# upload context data to arcee
arcee upload contexts context1 --file /path/to/documents
arcee train dalm1 --context context1
# ... wait for training to complete
arcee retrieve --name dalm1 --query "what is the capital of Washington State?"
arcee generate --name dalm1 --query "what is the capital of Washington State?"
```
(I think you also need to add `typer` to our pyproject.toml dependencies.)
> We need an `upload contexts` command so that a user can upload many files at once (it can call `upload_docs`). It can take a directory of files, and load them all and upload them.
I've modified the upload command to take file or dir paths (even multiple) and call the API. If a single `--file` is used, it calls `upload_doc`. In all other cases, it calls `upload_docs`. Let me know if we explicitly need different upload commands.
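A sketch of that dispatch logic, assuming `upload_doc`/`upload_docs` behave roughly as the thread describes (the exact parameter names are assumptions):

```python
from pathlib import Path
from typing import List

import typer

from arcee import api  # assumed import; upload_doc/upload_docs live in arcee/api.py

app = typer.Typer()
VALID_SUFFIXES = {".txt", ".jsonl"}


@app.command()
def upload(context: str, paths: List[Path] = typer.Argument(..., exists=True)):
    """Upload one or more files (or directories of files) into a context."""
    files: List[Path] = []
    for p in paths:
        candidates = p.rglob("*") if p.is_dir() else [p]
        files.extend(f for f in candidates if f.suffix.lower() in VALID_SUFFIXES)
    docs = [{"doc_name": f.name, "doc_text": f.read_text()} for f in files]
    if len(docs) == 1:
        # a single --file style upload goes through upload_doc (signature assumed)
        api.upload_doc(context, doc_name=docs[0]["doc_name"], doc_text=docs[0]["doc_text"])
    else:
        # everything else goes through upload_docs in one call (signature assumed)
        api.upload_docs(context, docs=docs)
```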
> An `arcee train` command, so we can close the loop.
I have added the train command as you mentioned to close the loop.
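A sketch of the wiring, assuming the SDK exposes a `train_dalm` helper that takes a name and a context (call name and signature assumed):

```python
import typer

import arcee  # assumes arcee.train_dalm(name, context=...) as the SDK training call

app = typer.Typer()


@app.command()
def train(name: str, context: str = typer.Option(..., help="Context to train the DALM on")):
    """Kick off DALM training on an uploaded context (hypothetical wiring)."""
    arcee.train_dalm(name, context=context)  # SDK call name/signature assumed
    typer.echo(f"training started for {name} on context '{context}'")
```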
> A simple section in the readme showcasing an easy way to achieve an e2e train.
I followed the structure of the existing readme and added a "Using the Arcee CLI" section.
> (I think you also need to add `typer` to our pyproject.toml dependencies.)
I had left it as is since your commit had it commented out; I have fixed it now.
**What's in the commit?**

- The upload command calls `upload_doc` or `upload_docs` based on the number of files passed as the CLI options. ❗Valid file extensions: `.txt` or `.jsonl`
- Added `cli_handler.py`.

A few concerns -

- Iteratively hit the API with data from k files at a time, k < n; but does the service allow us to upload multiple docs to the same context and merge them?
- File validations: what files are valid? Currently validating that the extension is one of `.txt` or `.jsonl`.
@EricLiclair

> Iteratively hit the API with data from k files at a time, k < n; but does the service allow us to upload multiple docs to the same context and merge them?

Yes, this is perfect. And yes, we support uploading multiple docs to the same context!

> File validations: what files are valid? Currently validating that the extension is one of `.txt` or `.jsonl`.

That's perfect!
Hey @Ben-Epstein, apologies for the delayed commit. I've updated the upload logic to check file sizes and upload in chunks/buffers; the chunk size defaults to 512 MB.
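A rough sketch of that batching logic; only the 512 MB default comes from the comment above, and the helper name and the upload step are assumptions:

```python
from pathlib import Path
from typing import Iterator, List

DEFAULT_CHUNK_BYTES = 512 * 1024 * 1024  # 512 MB default, per the comment above


def batch_by_size(files: List[Path], max_bytes: int = DEFAULT_CHUNK_BYTES) -> Iterator[List[Path]]:
    """Yield groups of files whose combined size stays under max_bytes."""
    batch: List[Path] = []
    batch_bytes = 0
    for f in files:
        size = f.stat().st_size
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(f)
        batch_bytes += size
    if batch:
        yield batch


# Each batch would then go out as one upload_docs call against the same context,
# which was confirmed above to be supported.
```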
I'm considering the implementation of a CLI (Command-Line Interface) for quick validation purposes. The envisioned CLI would be utilized as follows:
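For example (a sketch; the model name and queries are illustrative):

```shell
arcee retrieve --name dalm1 --query "what is the capital of Washington State?"
arcee generate --name dalm1 --query "what is the capital of Washington State?"
```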
The primary use case for this CLI is to interact with a generative language model. While I initially had some uncertainty regarding the input format, it appears that using a string as input should suffice.
I would greatly appreciate your thoughts and feedback on this proposal, @Jacobsolawetz. Please let me know if you have any suggestions or concerns.