Better CLI API - GitHub Issues

barjin commented 1 month ago

The current state of the Apify CLI API is dubious. As a user, I’m never sure about the difference between apify run and apify call, whether apify create does something locally or remotely, why apify actor:get-input exists, etc.

I also like to think about things in hierarchies. For any system with more than 3 features, hierarchy is imo everything. Even Apify (Console) has it - there are separate tabs for Actors, Tasks, Proxy settings, Storages etc.

The current CLI API doesn’t really reflect this.

[image]

If you look at any other CLI tool that works with similar resources, they are miles ahead - see docker / podman, AWS CLI etc.

Some (incomplete) examples / ideas:

Actor-related commands
- apify actor info
  - reads the .actor file and logs info
- apify actor create --template=[] [name]
  - initializes an Actor template in the folder of the name
- apify actor init [name]
  - just like apify init now
- apify actor push
- apify actor pull
- apify actor run [--remote] [--input=INPUT.JSON] [actor id]
  - run and call combined (switch with the --remote flag or by setting the actor_id)
- apify actor ls
  - Lists all the available user’s Actors
- apify actor rm [actor id]
  - Removes the Actor from the platform
- apify actor build [actor id]
  - Rebuilds the actor on the platform
- ...and more
Task-related commands
- apify task ls
  - Lists all the available user’s Tasks
- apify task rm [task id]
  - Removes the Task from the platform
- apify task schedule [task id] [cron string]
  - Schedules the task?
- apify task create / add / …?
- ...and more
Run-related commands
- apify run ls [--active|finished|aborted|...] [--actor-id=id]
- apify run rm [run id]
- apify run attach [run id]
  - probably only attaches the stdout to the user's terminal, I don't think we can do stdin. Still, it would be cool!
- apify run resurrect [run id]
- apify run abort [run id]
  - wanna e.g. stop all running runs? how about Docker-style apify run abort $(apify run ls --active -q)?
Storage-related commands
- KVS
  - apify kvs create [name]
    - Creates a named KVS
  - apify kvs ls
    - lists available kvs’s
  - apify kvs ls [kvs id]
    - lists the contents of a given KVS
  - apify kvs rm [name]
  - apify kvs rename [name]
    - you guessed it
  - apify kvs set [--bucket-id=ID] --key=[KEY] value
    - basically like aws’s S3 put-object
  - apify kvs get [--bucket-id=ID] --key=[KEY]
    - basically like aws’s S3 get-object
    - e.g. `apify kvs get --key="INPUT.json"`
      - alternative for the current actor:get-input
  - ...and more
- Dataset
  - apify dataset create / ls / rm / rename
  - apify dataset get [--limit] [--offset] [--format=(json|csv|xml|...)] [dataset-id]
  - apify dataset push [--dataset-id=[id]] value
RQ? Proxies? Real-time stats for the current platform usage?
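
A minimal sketch of how the `apify <resource> <action>` hierarchy above could be wired up. It uses Python's argparse purely for illustration and says nothing about how the real CLI is implemented. Only a handful of the proposed commands are modelled, and the `--quiet` long form of `-q` as well as all `dest` names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical `apify <resource> <action>` parser (illustration only)."""
    parser = argparse.ArgumentParser(prog="apify")
    resources = parser.add_subparsers(dest="resource", required=True)

    # `apify actor ...` namespace
    actor = resources.add_parser("actor").add_subparsers(dest="action", required=True)
    actor.add_parser("ls")
    actor_run = actor.add_parser("run")
    actor_run.add_argument("--remote", action="store_true")
    actor_run.add_argument("--input")
    actor_run.add_argument("actor_id", nargs="?")

    # `apify run ...` namespace
    run = resources.add_parser("run").add_subparsers(dest="action", required=True)
    run_ls = run.add_parser("ls")
    run_ls.add_argument("--active", action="store_true")
    # `--quiet` is an assumed long name; the proposal only mentions `-q`.
    run_ls.add_argument("-q", "--quiet", action="store_true")
    run_abort = run.add_parser("abort")
    run_abort.add_argument("run_ids", nargs="+")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["run", "ls", "--active", "-q"])
    print(args.resource, args.action, args.active, args.quiet)  # prints: run ls True True
```

Since `abort` accepts one or more run IDs, a quiet `ls` that printed only IDs would make the Docker-style `apify run abort $(apify run ls --active -q)` work without extra plumbing.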

IMO this would send the CLI usability through the roof, inviting actual power users to use us through the command line. Also, most of the commands would just be straight API calls (and none of them clash with the current ones, so no breaking changes, only deprecating the old commands).

CC @B4nan @jancurn @vladfrangu what do you think?

jancurn commented 1 month ago

Indeed, CLI would need more love and has a lot of opportunities to improve the DX!

Two points from my side:

- All development needs to be consistent with the Actor whitepaper https://github.com/apify/actor-specs
- I asked @netmilk to take over the roadmap of CLI with a goal to improve the developer experience, so please align with him on this

netmilk commented 1 month ago

Thank you for a great proposal, @barjin! Looking at it, I have several thoughts to kick off the conversation with:

- Do you intend to remove the apify vis command? That is useful functionality I have benefited from. Maybe an apify validate namespace, and then an input-schema sub-command?
- Regarding the current CLI design, my biggest confusion is that there is no clear expectation setting about whether an action is going to be performed locally or remotely. Shouldn't that be addressed? What should all commands default to? I'd be very specific in every command description about whether it's local or remote. I'm almost inclined to say that almost everything should default to local, except the run and task namespaces, for obvious reasons. :)
- I'm thinking apify actor execute or apify run create instead of actor run. I'd suggest avoiding overloading the term Run. The Run (noun) is an Actor executed on the Apify Platform.
- What about the apify build namespace? I think an attach or ssh command would be super helpful here during development as well. When the build fails, I'd love to have a window where I could connect to it and debug it remotely (the docker run --rm -it --entrypoint bash <image:tag> analogy).
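
One way to make the local/remote expectation explicit in every command description is to store the execution target next to each command and render it in the help text. The sketch below is only an illustration of that idea: the `Command` and `Target` types and the `[local]` / `[remote]` tags are hypothetical, not something the CLI currently prints.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    REMOTE = "remote"


@dataclass(frozen=True)
class Command:
    name: str
    target: Target
    summary: str


# Two existing commands whose targets are stated in this thread.
COMMANDS = [
    Command("apify run", Target.LOCAL, "Runs the Actor locally."),
    Command("apify call", Target.REMOTE, "Runs the Actor on the platform."),
]


def help_line(cmd: Command) -> str:
    """Render one help row with the execution target shown up front."""
    return f"{cmd.name:<12} [{cmd.target.value}] {cmd.summary}"


if __name__ == "__main__":
    for cmd in COMMANDS:
        print(help_line(cmd))
```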

jancurn commented 1 month ago

Just fyi, in Actor Whitepaper and SDK/clients, we took care to use consistent naming for "runs", "call", "start" etc. It's important for CLI to be consistent with this too.

vladfrangu commented 1 month ago

> Do you intend to remove the apify vis command? That is useful functionality I have benefited from. Maybe an apify validate namespace, and then an input-schema sub-command?

No plans to remove any commands; at most we'd reorganize existing ones and add missing ones.

Not sure if it makes sense to have a validate scope... Ideally it'd also go into the actor scope imo

> Regarding the current CLI design, my biggest confusion is that there is no clear expectation setting about whether an action is going to be performed locally or remotely. Shouldn't that be addressed? What should all commands default to? I'd be very specific in every command description about whether it's local or remote. I'm almost inclined to say that almost everything should default to local, except the run and task namespaces, for obvious reasons. :)

When it comes to actors, everything is on platform except apify run (and some other unrelated commands). I can definitely see why Actor Run would cause confusion. I'll need to recheck if the whitepaper covers this

> I'm thinking apify actor execute or apify run create instead of actor run. I'd suggest avoiding overloading the term Run. The Run (noun) is an Actor executed on the Apify Platform.

Both of these still have the issue of "where does this run". apify call makes the actor run on the platform. apify run runs the actor locally as if it was on platform. apify actor execute sounds like platform to me, same with apify run create. Maybe apify local run would make more sense? cc @jancurn

> What about the apify build namespace? I think an attach or ssh command would be super helpful here during development as well. When the build fails, I'd love to have a window where I could connect to it and debug it remotely (the docker run --rm -it --entrypoint bash <image:tag> analogy).

+1 to the build namespace, especially if we want to cover the Actor versioning part of our API.

But attach/ssh are features that the platform (to my knowledge) doesn't have right now, and that I doubt will come... At most, maybe we should have a command that simulates an actor build locally? (so all the steps the platform would do to build an image), but that requires extra setup from users (Docker, etc).

barjin commented 4 weeks ago

> Thank you for a great proposal, @vladfrangu!

Touché, but I'll let it slip for now 😄

run is overloaded

That's true - my motivation behind apify actor run was all the CLI tools I've used in the past 5 years (Docker, Go compiler, Cargo) - they all have the xxx run command that... well, runs stuff. Imo it would be a shame if we had to go with something like execute - which, e.g. in Docker, has different semantics. (actor execute also sounds like something from the USSR's Great Purge period (: )

It also made me think - apify run currently runs the Actor locally (in line with all the other CLI tools above), so having all the apify run ls / apify run rm (which would list the "Run instances" on Platform) might get confusing for some.

It's a real pickle, but I still think that apify actor run is the cleanest way out.

whitepaper

We'll need to support actor call for calling Actors on the platform by name - all our other tools do have that. Maybe it's not that much of a problem, though:

- apify actor run could run the current Actor locally (without options) or remotely (with, let's say, --remote flag).
  - (!!!) If run remotely, we would need to check whether it needs to be rebuilt on the platform (and build it if needed). That way, the user wouldn't have to run apify actor push && apify actor call [actor name] with every local change, just apify actor run --remote and wait and watch. (we could always force this by apify actor run --remote --force-build)
- apify actor call --input ... [actor name] would just find the Actor by name on the platform and run it (useful if you just want the data scraped by a third party Actor).
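
The "rebuild only if needed" behaviour of `apify actor run --remote` could be sketched roughly as follows. This rests on a loud assumption: the CLI can compare a fingerprint of the local sources with one recorded at the last push/build. Nothing in this thread says the platform or CLI stores such a fingerprint, and the function names and step labels are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List, Optional


def source_fingerprint(root: Path) -> str:
    """Hash relative paths and contents under root so any local change alters the digest."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def plan_actor_run(
    remote: bool,
    force_build: bool,
    local_fp: str,
    last_built_fp: Optional[str],
) -> List[str]:
    """Return the ordered steps a hypothetical `apify actor run` would perform."""
    if not remote:
        return ["run locally"]
    steps: List[str] = []
    # Assumption: last_built_fp is None when the Actor was never pushed.
    if force_build or local_fp != last_built_fp:
        steps += ["push", "build"]
    steps.append("run on platform")
    return steps


if __name__ == "__main__":
    print(plan_actor_run(remote=True, force_build=False, local_fp="abc", last_built_fp="abc"))
    # prints: ['run on platform']
```

With unchanged sources the plan skips straight to the remote run, while `--force-build` or any local change would put the push and build steps first.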

build namespace

Sounds good to me, one small thing - if the build fails (even in Docker), you cannot really attach to anything, right? Having some sort of ssh to Actor would be sick, but probably not doable right now as @vladfrangu mentions... But we're still exposing that one http port... maybe Cloud console à la GCP? This would be hard to standardize, though. For starters, I would be happy with just the Actor's stdout being redirected to my (local) terminal.

jancurn commented 4 weeks ago

I suppose apify actor run and apify actor call are fine, and the latter is consistent with Actor.call in the Apify SDK. But we need to keep apify run and apify call for backwards compatibility anyway :)
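
A rough sketch of how the legacy top-level commands could keep working next to the proposed namespaces: rewrite the argument list before it reaches the parser. The function name and the rule for telling the legacy `apify run` apart from the proposed `apify run ls` / `rm` / ... namespace are assumptions made for illustration.

```python
from typing import List

# Actions of the proposed `apify run ...` namespace, as listed earlier in this thread.
RUN_NAMESPACE_ACTIONS = {"ls", "rm", "attach", "resurrect", "abort"}


def normalize(argv: List[str]) -> List[str]:
    """Rewrite legacy `run` / `call` invocations to the proposed `actor ...` form."""
    if not argv:
        return argv
    head, rest = argv[0], argv[1:]
    if head == "call":
        return ["actor", "call", *rest]
    # Treat `run` as legacy unless it is followed by a run-namespace action.
    if head == "run" and not (rest and rest[0] in RUN_NAMESPACE_ACTIONS):
        return ["actor", "run", *rest]
    return argv


if __name__ == "__main__":
    print(normalize(["call", "my-actor"]))      # prints: ['actor', 'call', 'my-actor']
    print(normalize(["run", "ls", "--active"]))  # prints: ['run', 'ls', '--active']
```

The `run` check is exactly the ambiguity raised above: the rule only holds as long as the legacy `apify run` never takes a first argument that collides with one of the namespace actions.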

jancurn commented 2 weeks ago

For reference, the working document is now at https://www.notion.so/apify/New-CLI-Design-a8751a53896e472a9c8f474669f6f5d5

apify / apify-cli

Better CLI API #554