curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/

Get path components by index / json array #291

Open elig0n opened 3 months ago

elig0n commented 3 months ago

Something like trurl -g {path:2} would get the 2nd slash-separated element of a URL path.

The slash-separated elements could also take the form of an array in the JSON output, making them easier to parse.
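
As a rough illustration of the requested behavior, the Python sketch below picks the Nth slash-separated element out of a URL path. The helper name `path_component`, the 1-based index, and the choice to skip empty elements are all assumptions made for this example. The `{path:2}` syntax is only a proposal in this issue, not an existing trurl option.

```python
from urllib.parse import urlsplit


def path_component(url: str, index: int) -> str:
    """Return the index-th (1-based) slash-separated element of the URL path.

    Hypothetical helper mirroring the proposed {path:N} idea; not part of trurl.
    """
    # Drop empty strings produced by the leading slash or by doubled slashes.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not 1 <= index <= len(segments):
        raise IndexError(f"path has {len(segments)} elements, asked for {index}")
    return segments[index - 1]


print(path_component("http://curl.se/one/two/three", 2))  # prints "two"
```

Whether an index should count from 0 or 1, and whether empty elements count, are exactly the kind of details a real implementation would have to settle.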

bagder commented 3 months ago

I presume that would make the JSON output look something like this?


$ trurl --json curl.se/one/two/three
[
  {
    "url": "http://curl.se/one/two/three",
    "parts": {
      "scheme": "http",
      "host": "curl.se",
      "path": "/one/two/three"
    },
    "path": [
      "one",
      "two",
      "three"
    ]
  }
]

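
For comparison, the proposed shape can be produced today by post-processing JSON of the form shown above. This is a minimal Python sketch: the sample input is hard-coded rather than read from trurl, `add_path_array` is a hypothetical helper name, and the top-level "path" array is the suggestion from this thread, not something trurl is known to emit.

```python
import json

# Sample shaped like the "url" and "parts" fields in the output above.
trurl_output = """
[
  {
    "url": "http://curl.se/one/two/three",
    "parts": {
      "scheme": "http",
      "host": "curl.se",
      "path": "/one/two/three"
    }
  }
]
"""


def add_path_array(entries: list) -> list:
    """Add a top-level "path" list to each entry (hypothetical helper)."""
    for entry in entries:
        path = entry["parts"].get("path", "")
        # Skip the empty string that precedes the leading slash.
        entry["path"] = [s for s in path.split("/") if s]
    return entries


enriched = add_path_array(json.loads(trurl_output))
print(json.dumps(enriched, indent=2))
```
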
elig0n commented 3 months ago

@bagder Sure why not? I'll leave the implementation specifics up to the developers.

This could also apply to the "host" part, i.e. have it split into subdomain, domain and TLD.

bagder commented 3 months ago

> This can also apply to the "host" part i.e. have it split into: domain, subdomain, tld

I suppose that would then be a "host" array since it can in theory contain a large number of parts. A reverse-sorted list perhaps so that it starts with the TLD?

(I just want to be clear that I'm not entirely convinced trurl needs these features, but I'm testing out the ideas and how they would work as part of making up my mind.)
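
The reversed "host" array floated here could look like the following Python sketch. The function name `host_labels_reversed` is hypothetical, and the sketch assumes a plain DNS name; an IP address literal would need separate handling.

```python
def host_labels_reversed(host: str) -> list[str]:
    """Split a host name on periods and return the labels, TLD first."""
    # Strip a trailing dot (fully qualified form) before splitting.
    labels = host.rstrip(".").split(".")
    return labels[::-1]


print(host_labels_reversed("www.curl.se"))  # prints ['se', 'curl', 'www']
```
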

dfandrich commented 3 months ago

If you do offer split URLs, supporting an additional form of splitting by PSL (Public Suffix List) might also be useful in some cases (such as the PSL suffix in one part and the rest in another). But libcurl doesn't give you that, so it would need to be done by trurl itself.
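
To make the idea of a PSL-based split concrete, here is a simplified Python sketch. `SAMPLE_SUFFIXES` is a tiny hard-coded stand-in chosen for this example, not the real Public Suffix List, and `split_by_suffix` is a hypothetical name. The real list is far larger and has wildcard and exception rules that this sketch ignores.

```python
# Tiny stand-in for the Public Suffix List (assumption for illustration only).
SAMPLE_SUFFIXES = {"se", "com", "uk", "co.uk"}


def split_by_suffix(host: str, suffixes: set[str] = SAMPLE_SUFFIXES) -> tuple[str, str]:
    """Return (rest, public_suffix) using the longest matching suffix."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate first, then drop one label from the left each time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return ".".join(labels[:i]), candidate
    # No match: fall back to treating the last label as the suffix.
    return ".".join(labels[:-1]), labels[-1]


print(split_by_suffix("www.example.co.uk"))  # prints ('www.example', 'co.uk')
print(split_by_suffix("curl.se"))            # prints ('curl', 'se')
```
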

elig0n commented 3 months ago

> I suppose that would then be a "host" array since it can in theory contain a large number of parts. A reverse-sorted list perhaps so that it starts with the TLD?

Maybe just another sub-object with key-value pairs would suffice.

bagder commented 3 months ago

Since a path is always separated by slashes and a host name is always separated by periods, I don't quite see the need for trurl to do that splitting. There is plenty of help in tools and languages for splitting a string by a given separator.

As @dfandrich mentions, getting a PSL out of the host name would be different - but that would require either an API change in libcurl or that trurl accesses libpsl itself. Not something I personally feel is worth it.

elig0n commented 3 months ago

Why would you defer the job of splitting paths in the JSON trurl generates to the user who runs e.g. jq? They should only care about extracting the data they need, not about parsing. A common Unix philosophy says: do one thing and do it well.

emanuele6 commented 3 months ago

> A common Unix philosophy says: do one thing and do it well.

The entire point of that is to not do overly task-specific things in your tools so that the tool can be more generic and it is possible to use external tools to easily integrate it in many different complex applications. I don't understand how else you are interpreting that saying.