Homebrew / homebrew-cask

🍻 A CLI workflow for the administration of macOS applications distributed as binaries
https://brew.sh
BSD 2-Clause "Simplified" License

Remove `tags` #15971

Closed vitorgalvao closed 8 years ago

vitorgalvao commented 8 years ago

tags is pretty damn worthless. The only thing it has is :vendor which is by itself already useless and seldom used. It was supposed to aid in searching, but we’re not a discoverability service, and that definitely treads the line.

My initial point against them still pretty much stands, without a need for an update. I believe tags to be bloat.

Would like some opinions on this, even if they are “I have no strong feelings one way or the other”.

jawshooah commented 8 years ago

Agreed. Trim the fat!

Amorymeltzer commented 8 years ago

I might define "discoverability" differently than other folks here, but as a negative "brew-cask isn't a discoverability service," I take it to mean no "Top 100 apps of the month/year/all-time," no "Productivity/Game/Finance" categories, and no "Collaborator Picks" stuff. We don't need or want that for all the many, many reasons listed over the years.

The other, positive aspect, which for the sake of this conversation I will dub "searchability," is that if someone knows the software they want it should be very easy to find and install it. Searchability is good and dare I say required for a properly-functioning project[1], especially as the caskroom grows.

To wit, the problem with the tag stanza is that it isn't actually implemented (i.e., in search). I agree with the above consensus about tags, but I actually think vendor is useful to keep, and should be used far more often[2] as it provides and expands searchability. As a quick example, take silverlight. Someone seeking to install the cask might reasonably[3] expect it to be named microsoft-silverlight[4], and cask search microsoft should help them know what to install.

I say toss tags (not a good/useful service, as per comments above) but keep and replace it with vendor, à la name (only one vendor allowed, though). Truthfully, and regardless of the outcome here, name should probably also get a closer look (I know you're pretty familiar with it @vitorgalvao). It's only useful if it can be searched for, like vendor. The two seem like natural allies.


0: @vitorgalvao I almost just posted that exact video to your final comment in #15799.

1: One place I think we get hung up on here is that there are two main types of brew-cask users/uses. The everyday "oh, I want this application cask install app yay I have it now" and the (also quite personal) case of bootstrapping new installs/machines/setups, etc. The latter is often discussed ("don't break everyone's scripts") but "searchability" is quite critical to the former use and doesn't come up enough.

2: Assuming it's integrated and implemented with search, anyway.

3: Users shouldn't have to be experts on tokenization to use brew-cask.

4: See also google-chrome, firefox, and a bunch of Facebook stuff.

adidalal commented 8 years ago

I like @Amorymeltzer's arguments - this also brings up the point of brew cask search being underpowered and thus brew-cask ends up underserving the "everyday" user.

A generic "tags" or categories label seems too far down the discoverability end of the spectrum, but name and vendor should reasonably be covered by search, in my opinion - if only to help users find out that it's firefox vs mozilla-firefox, etc. Removing everything risks throwing the baby out with the bathwater, so to speak, and may actually decrease user-friendliness, not increase it.

Revamping search will clearly require quite a bit of work, but as some of our other initiatives are also geared towards making homebrew-cask more user friendly in general, it seems like something that's reasonable and to be considered.

vitorgalvao commented 8 years ago

Yes, the idea with both name and tags is that eventually they’d be searchable. I thought the implementation of using name for search had already started even and been abandoned, but I can’t find the issue so perhaps I’m mistaken.

I really like your distinction between “discoverability” and “searchability” and agree the latter is important. I also agree with the point that, to keep it, we should get rid of :tags and just have vendor. You actually changed my mind on this.

However, I also think you touched on an important point when you mentioned name, because I think name would be a good place for this. Microsoft Silverlight is a perfectly valid and expected name for the silverlight cask, so why not leverage it for this? Are there any cases where a cask’s name wouldn’t make sense to have the vendor prepended as well, especially since we can have multiple name stanzas?

So (and this is not rhetorical) why not get rid of tags/vendor and in the documentation state instead

The name stanza accepts a UTF-8 string defining the full name of the software, and is used to help with searchability and disambiguation. It can be repeated multiple times if there are useful alternative names, but the first should follow the canonical branding as defined by the vendor.

This is a good place to include the software vendor’s name as well (e.g. pixlr).

And instead of pixlr having

name 'Pixlr'
vendor 'Autodesk'

it would have

name 'Pixlr'
name 'Autodesk Pixlr'

The logic behind this is that since name and tags/vendor have such similarities and even overlaps (think Google Chrome, Adobe Flash, places where the vendor is actually part of the name and is currently just repeated in tags), why not get rid of one of them and let the other carry all the weight?

adidalal commented 8 years ago

So, with the above proposal, search would have to do partial matches, correct? ie, searching for microsoft should bring up office, silverlight, etc...

If so, that would work out (and also simplify things - reducing the number of "unique" stanzas is a plus when it comes to maintainability).

Amorymeltzer commented 8 years ago

Yeah, I honestly only figured out while writing this up that name wasn't included in search.

To answer your question @vitorgalvao, I think it depends on how search is implemented. Basically, if search is partial (as it is right now), then why bother repeating anything? The current content of pixlr

name 'Pixlr'
vendor 'Autodesk'

would result in accurate search results for both "pixlr" and "autodesk." Searching for "autodesk-pixlr" or "autodesk pixlr" would presumably fail, but I think it would be sensible for search to include all concatenations of names with vendors.
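A minimal sketch of the "concatenations of names with vendors" idea, purely for illustration. `CaskInfo`, `search_keys`, and `matches?` are hypothetical names invented for this example and are not part of the actual brew-cask code; treating hyphens in the query as spaces is likewise an assumption.

```ruby
# Hypothetical stand-in for a cask's searchable metadata; not the real
# homebrew-cask classes.
CaskInfo = Struct.new(:token, :names, :vendor)

# Build every string a query may match: the token, each name, the vendor,
# and each "vendor name" concatenation.
def search_keys(cask)
  keys = [cask.token] + cask.names
  if cask.vendor
    keys << cask.vendor
    keys.concat(cask.names.map { |name| "#{cask.vendor} #{name}" })
  end
  keys.map(&:downcase)
end

# Case-insensitive partial match. Hyphens are treated as spaces (an
# assumption), so "autodesk-pixlr" and "autodesk pixlr" behave the same.
def matches?(cask, query)
  needle = query.downcase.tr("-", " ")
  search_keys(cask).any? { |key| key.tr("-", " ").include?(needle) }
end

pixlr = CaskInfo.new("pixlr", ["Pixlr"], "Autodesk")
puts matches?(pixlr, "autodesk")        # true
puts matches?(pixlr, "autodesk-pixlr")  # true
puts matches?(pixlr, "photoshop")       # false
```

With this approach the cask file itself stays at one name and one vendor; the combinations exist only inside search.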

The reason I'm in favor of this is because of casks that have more than one name. There are 17 with 3 and 106 with 2. Should every name stanza also be duplicated with the correct vendor name? I think it's visually a lot cleaner and a lot clearer (esp. for contributors) if casks have names and 0 or 1 vendor stanzas rather than having 0, 2, 4, or 6 names.[1] There would presumably be more stanzas overall, @adityadalal924, but I think it will be more obvious what should go in and would avoid truly excessive bloat.[2]

google-chrome is a good example[1] of why the search function matters. Its name stanza is Google Chrome (I would have expected Chrome), but yet it shows up with

cask search google
cask search chrome
cask search Google Chrome

so is there even a need for either name or vendor? Basically, I think a decision on how this information, whether name, vendor, or tag :vendor, will be used is important for going forward.


1: A lot of those will potentially be changed following this discussion and proper search implementation in the future. google-chrome is a good example, as per above.

2: Technically, the most straightforward reductio ad absurdum method would be to dump everybody into one single string: name 'name2 vendor vendor-name name2 vendor-name2'

vitorgalvao commented 8 years ago

@adityadalal924

So, with the above proposal, search would have to do partial matches, correct?

Yes.

@Amorymeltzer

Basically, if search is partial (as it is right now), then why bother repeating anything?

Well, indeed that is part of my point. I purposefully gave an example that doesn’t have the vendor in the name, for clarity in the documentation, but many canonically do, perhaps most. Google Chrome, Microsoft Silverlight (whose current cask has name wrong), Adobe Flash, VMware Fusion all have repetition, because the vendor’s name is part of the canonical name of the app.

Should every name stanza also be duplicated with the correct vendor name?

So my point is that most won’t be duplicated with the vendor name, because the vendor name is already there for many/most. In addition, some of the casks with repeated names already follow a variation of that, with multiple similar names where just an added word differs between them.

There’s no reliable way for us to get meaningful numbers on this without manually checking, since there are too many variations to take into account.

I think it's visually a lot cleaner and a lot clearer (esp. for contributors) if casks have names and 0 or 1 vendor stanzas rather than having 0, 2, 4, or 6 names.

I was also thinking about contributors when I wrote the previous post, as I believe having one less stanza is beneficial for them. My rewrite of the documentation of name left it roughly the same size and gets rid of tags at the same time.

so is there even a need for either name or vendor?

While reading your post, and before reading that part, I was reaching the same conclusion that perhaps a third stanza to replace the other two could be the solution.

Let’s think why name was created, then. name was also created precisely because cask tokens are limiting, and we wanted a way to include the canonical name of the app as well. I’m not entirely sure how successful that was. silverlight (name 'Silverlight') and thunderbird (name 'Mozilla Thunderbird') might actually be wrong currently (screenshots from their websites):

It seems that getting the canonical name of an app might not be as straightforward as initially thought.

I’m also leaning towards keeping name instead of an alternative because it suggests restraint. If we name it something else that immediately suggests searchability, we might start getting irrelevant information there (tags, categories, descriptions).

So my updated proposal would be to still get rid of tags/vendor, and keep name as

The name stanza accepts a UTF-8 string defining the full name of the software, and is used to help with searchability and disambiguation. It can be repeated multiple times if there are useful alternative names.

The software vendor’s name should be included as well (e.g. in pixlr).

And pixlr would link to the specific line (only name stanza) saying name 'Autodesk Pixlr'.

adidalal commented 8 years ago

Checklist for progress (maintainers feel free to update as proposal evolves)

Amorymeltzer commented 8 years ago

Tokenization will definitely need a serious change, then. As it is, using the .app for the token is easy to follow, and involves few judgement calls. (Retracted, see below)

I suppose I'm trying to consider the cases of name where something actually goes by multiple names. ynab is an example, but the many foreign-language casks are probably better. To clarify, under your rewrite, would cave-story be read as such?

name 'Pixel Cave Story'
name 'Pixel Doukutsu'
name 'Pixel 洞窟物語'

adidalal commented 8 years ago

@Amorymeltzer That looks quite ugly, and pushes me back towards just having a :name and a :vendor stanza and seeing where things go from there - that's less of a major change and more of "let's remove this generic "tag" thing and put in something with an actual purpose and see how things go"

The actual change (Cask wise) would also not be as drastic - should be a simple substitution for the Cask files, and the core code for :vendor can be copied wholesale from :name, save for the fact that the "only one" vendor requirement should be enforced.

Thoughts?
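To make the "copy :name, but enforce only one" idea concrete, here is a rough sketch. `SketchDSL` and `DuplicateStanzaError` are hypothetical names made up for this example; this is not the actual homebrew-cask DSL code, only an illustration of a repeatable stanza next to a single-use one.

```ruby
# Hypothetical, simplified stand-in for the cask DSL; the real homebrew-cask
# DSL class is not reproduced here.
class SketchDSL
  class DuplicateStanzaError < StandardError; end

  def initialize
    @names = []
    @vendor = nil
  end

  # `name` may be repeated; called with no arguments it returns what has
  # been collected so far.
  def name(*args)
    @names.concat(args.flatten)
    @names
  end

  # `vendor` follows the same getter/setter pattern but may only be set once.
  def vendor(value = nil)
    return @vendor if value.nil?
    raise DuplicateStanzaError, "'vendor' stanza may only appear once" if @vendor

    @vendor = value
  end
end

cask = SketchDSL.new
cask.instance_eval do
  name "Pixlr"
  vendor "Autodesk"
end
puts cask.name.inspect  # ["Pixlr"]
puts cask.vendor        # Autodesk
```

A second `vendor` line in the same cask would raise `DuplicateStanzaError`, which is the only behavioural difference from `name` in this sketch.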

vitorgalvao commented 8 years ago

Tokenization will definitely need a serious change, then.

Why?

As it is, using the .app for the token is easy to follow, and involves few judgement calls.

And it’ll stay that way. name is and always was freeform. To be clear, name exists because tokenisation has necessary restraints. Nothing changes there.

Should’ve been more specific regarding your example, though. From what I was thinking, the vendor’s name would need to appear no more than once, and simply when it makes sense. Does cave-story need the vendor, even? If anyone wants Cave Story, having “Pixel” there (whether in name or vendor) isn’t that helpful anyway, it’s not like you’re going to find it (or look for it) through its vendor. Same with scrolls. In their website, you can’t even quickly guess they’re a Mojang game, neither should you care. To find it, you should search for “Scrolls”. If you search for “Mojang” in the hopes of “that mojang game”, or “show me all mojang games”, then you are going for discoverability (“show all casks that share this”).

Your point on searchability is the most important thing, here. If a name helps a cask be more easily findable when you know what you want, it should be included. Otherwise, it should be left out. Searches should be “I want that app and am using the vendor in my search because it helps identify it with certainty”, never “I want to see which apps that company released”.

Casks with foreign names are a minority, and though we need to accommodate them, they should not dictate the rule. By having only name and making the rule that “it should be as verbose as possible by including the vendor’s name”, the large majority of casks will end up with a single name stanza, and I see that as a big win. There should be no need for more than one most of the time.


To summarise, what we need to be clear on is name exists for searchability and disambiguation. That’s it. Any connected string that can be considered a name that helps to identify the app, shall be used.

So yes, we can have

name 'Pixel Cave Story'
name 'Doukutsu'
name '洞窟物語'

or

name 'Cave Story'
name 'Doukutsu'
name '洞窟物語'

Either is fine. As long as we point to a good example, I don’t think there’d be much confusion (not more than with other stanzas).

Amorymeltzer commented 8 years ago

Regarding my comment on tokens, I realize now that I misunderstood your earlier point, interpreting "canonical name" to mean "token." Sorry 'bout that, my bad.

Thank you for clarifying, @vitorgalvao, I suppose I quite agree with you and take your point. I suppose the need for name to include "the canonical branding as defined by the vendor" will inevitably leave differences we cannot control ("Google Chrome" V "Thunderbird"), and your proposal allows that most easily.

As long as we're redefining name, though, in my utopia it would be entirely freeform, and not bound by the provider's marketing decisions; that's why I think the setup of search is important. It should be case-insensitive, allow partial matches, and leverage token and names, spitting out what it finds. I imagine our (ever-growing) list of examples as such:

| Token | vendor and/or name stanzas |
| --- | --- |
| google-chrome | Unneeded |
| firefox | vendor 'Mozilla', name 'FF'[1] |
| thunderbird | vendor 'Mozilla' |
| silverlight | vendor 'Microsoft' |
| pixlr | vendor 'Autodesk' |
| cave-story | vendor 'Pixel', name 'Doukutsu', name '洞窟物語' |

This way someone searching for "mozilla firefox," "mozilla thunderbird," "microsoft silverlight," "pixlr autodesk," "pixel cave story," and "pixel doukutsu," which are all valid brandings, would be easily satisfied with a minimal number of stanzas.
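A small sketch of how such a search could behave, assuming each word of the query is matched independently against the token, vendor, and names. The `CASKS` hash and the `search` method are hypothetical and only mirror the examples listed above; this is not the existing `cask search` implementation.

```ruby
# Hypothetical metadata mirroring the examples listed above; not the real
# cask files.
CASKS = {
  "google-chrome" => { vendor: nil,         names: [] },
  "firefox"       => { vendor: "Mozilla",   names: ["FF"] },
  "thunderbird"   => { vendor: "Mozilla",   names: [] },
  "silverlight"   => { vendor: "Microsoft", names: [] },
  "pixlr"         => { vendor: "Autodesk",  names: [] },
  "cave-story"    => { vendor: "Pixel",     names: ["Doukutsu", "洞窟物語"] },
}.freeze

# Return the tokens of casks for which every word of the query partially
# matches the token, vendor, or names (case-insensitive).
def search(query)
  words = query.downcase.split
  return [] if words.empty?

  CASKS.select do |token, meta|
    haystack = [token.tr("-", " "), meta[:vendor], *meta[:names]]
               .compact.join(" ").downcase
    words.all? { |word| haystack.include?(word) }
  end.keys
end

p search("mozilla")           # ["firefox", "thunderbird"]
p search("pixlr autodesk")    # ["pixlr"]
p search("Pixel Cave Story")  # ["cave-story"]
p search("pixel doukutsu")    # ["cave-story"]
```

Matching word by word means the order of the query ("pixlr autodesk" vs "autodesk pixlr") does not matter, so no concatenated names need to be stored anywhere.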


1: Probably not necessary, but it serves the example and isn't out of the question

vitorgalvao commented 8 years ago

name, though, in my utopia it would be entirely freeform, and not bound by the provider's marketing decisions (…) [search] should be case-insensitive, allow partial matches, and leverage token and names

We are in complete agreement.

As for the Unneeded case, I’d still like to have name as mandatory. I remember that was discussed at the start of name and the agreement was “saying a cask doesn’t need name because its token is exact takes more work than just having name be a repeat of token”. Having name is cohesive, it states “this cask is identified”.

Taking your table, this is what I envision:

| token | name stanzas |
| --- | --- |
| google-chrome | Google Chrome |
| firefox | Mozilla Firefox |
| thunderbird | Mozilla Thunderbird |
| silverlight | Microsoft Silverlight |
| pixlr | Autodesk Pixlr |
| cave-story | Pixel Cave Story, Doukutsu, 洞窟物語 |

adidalal commented 8 years ago

:+1: to @vitorgalvao's table above - that looks like what I was envisioning

Amorymeltzer commented 8 years ago

As for the Unneeded case, I’d still like to have name as mandatory.

Ah well, that is where we differ! I consider many names to be pointless. But I concede the point; if name is to be added to the required stanza list, then your proposal is both sound and proper.

One final nitpick since vendor is to be tossed: which name gets the vendor? cave-story is an example, as are pycharm-ce and pycharm-edu. Whichever one duplicates the token?

vitorgalvao commented 8 years ago

One final nitpick since vendor is to be tossed: which name gets the vendor? cave-story is an example, as are pycharm-ce and pycharm-edu. Whichever one duplicates the token?

On the technical side it doesn’t really matter since the best course of action here is likely to jumble up all names together and perform partial matches.

On the nitpick side (which I also subscribe to) I’d say it makes sense for us to have suggestions, not rules. The best place for both cases (duplicate token and include vendor) is the first position.

With these new proposals, it no longer makes sense to have so much duplication, so I can see the suggestion as “the first position should be the most verbose name that still makes sense”. So we no longer have

name 'PyCharm'
name 'PyCharm Community Edition'
name 'PyCharm CE'
vendor 'JetBrains' # not actually in the cask, but included here to illustrate the point

but instead

name 'JetBrains PyCharm Community Edition'
name 'PyCharm CE'

JetBrains PyCharm Community Edition makes sense, even though it is likely never referred to as such anywhere, but JetBrains PyCharm Community Edition CE doesn’t make sense, hence why there is a second line for CE.
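As a rough illustration of the "most verbose name first" suggestion, the sketch below folds a vendor into a list of names. `fold_vendor` is a hypothetical helper written for this example, and its rules (prepend the vendor to the longest name, drop names already contained in it) are a simplifying assumption; real casks would still need a human judgement call.

```ruby
# Hypothetical helper: fold a vendor into a list of names so that the first
# entry is the most verbose one, without repeating a vendor that is already
# part of a name.
def fold_vendor(names, vendor)
  return names if vendor.nil? || names.empty?
  # Vendor already present (e.g. "Google Chrome"): nothing to add.
  return names if names.any? { |n| n.downcase.include?(vendor.downcase) }

  longest = names.max_by(&:length)
  # Drop names that are fully contained in the longest one; they add nothing
  # to a partial-match search.
  rest = names.reject { |n| longest.downcase.include?(n.downcase) }
  ["#{vendor} #{longest}"] + rest
end

p fold_vendor(["PyCharm", "PyCharm Community Edition", "PyCharm CE"], "JetBrains")
# ["JetBrains PyCharm Community Edition", "PyCharm CE"]
p fold_vendor(["Google Chrome"], "Google")
# ["Google Chrome"]
p fold_vendor(["Pixlr"], "Autodesk")
# ["Autodesk Pixlr"]
```

"PyCharm CE" survives because it is not a substring of the longest name, which matches the reasoning for keeping a second line for CE.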

vitorgalvao commented 8 years ago

If we’re all in agreement, I’ll change the label for the issue accordingly. Please tell me if we’re not, so we can revert it back as a discussion.

adidalal commented 8 years ago

Yep, all good here. https://github.com/caskroom/homebrew-cask/issues/10997 is also related to this.

vitorgalvao commented 8 years ago

First draft for new name rules.

Amorymeltzer commented 8 years ago

:+1: