Behat / Borg

behat.org 2.0. STALLED BECAUSE OF LACK OF TIME
9 stars 1 forks source link

Domain inconsistencies #7

Closed stof closed 9 years ago

stof commented 9 years ago

While reviewing the code, I found a few inconsistencies in the domain.

  1. The case sensitivity of package names is inconsistent:
    • the PHP objects representing it are treating case sensitive
    • the path storage case sensitivity depends of the filesystem =>if you run on a case insensitive filesystem, you will have conflicts between multiple packages for different cases
    • the update of the doc from Github depends of the Github rules => case insensitive in many places (at one point, they were case sensitive when doing a git clone, but we are not using git so we are not impacted)
    • the shortcut redirection in the DocumentationController are using the behat/docs package, but the Github name is Behat/docs
  2. The handling of versions is weird: release can be any semver version, but doc versions are only about minor versions. So if the 3.0.x branch is updated, and then 3.0.15 is tagged on an older commit (because you pushed another commit in the branch too, for instance bumping the version to 3.0.16-dev) just after, the older commit will be rendered. This gets even worse if someone reports you a much older version (someone sending you a webhook payload for a 3.0.0 tag all the time, overwriting the 3.0 doc with the older doc). Having multiple versions mapping to the same Documentation creates this pain (this is especially true for extensions which will have the doc in the same repo than the code, because there will be tags in these repos, while a doc-only repo might not be tagged)
  3. Forcing to have versions looking like v\d++\.\d++ for the documentation prevents building the doc for the master branch of extensions. This is an issue for extensions where the doc and the code are in the same repo, as using master is much easier than v3.0 for the development branch (avoiding to break all existing PR when you add a new feature in the extension making you bump to 3.1)
  4. Package names only allow letters, numbers, underscores and dashes. However, Github allows more than than. For instance, dots are allowed for sure on Github.
  5. The current design forbids people hosting their extension elsewhere than on Gtihub (on Bitbucket for instance) to register it on the website. Even though Github is the most used, there are approx 15% of Packagist packages coming from other sources (mainly Bitbucket) IIRC. And if we allow packages coming from elsewhere, we need to handle package name conflicts between Github and Bitbucket (creating a Behat organization on Github does not reserve the same name on Bitbucket). Scrutinizer handles this with a g/ or b/ prefix in the package name for instance.
everzet commented 9 years ago

Do you have ideas on a solution?

everzet commented 9 years ago

Very good points BTW

stof commented 9 years ago

For the first point, we could decide to make package names case insensitive on our side too (accepting them in any case, and lowercasing them when building filesystem paths or comparing them). However, the question would be about the way to know the canonical case of the version (to build links to other pages and the canonical meta tag with the right case), which should rather match the Github case.

For the version regex (third point), this would work fine for the Documentation object which only cares about a version string (it is a matter of updating the route requirements), but it would be more complex for the Release part, which accepts only semver versions currently (although the code only cares about the minor version to build a ReleaseDocumentationId while other getters are unused, so we might refactor this to avoid requiring semver strictly).

For the second point, the way to solve it will depend on the way we refactor our version handling (and on the way we build the UI to trigger new releases, which is not done yet and so can take this point into account)

for the 4th point (which I just added), it is a matter of changing the package regex to allow dots (and maybe checking what else is allowed on github)

For the 5th point, a solution could be to identify extensions by their Composer package name rather than their github repo name (this also simplifies the case handling because Composer package names are always lowercase, and it might simplify the package naming too because the Packagist regex for package names is public while the Github one is not AFAIK). Even though using github may not be a requirement for extension authors, I think an extension which is not available through Composer should not be accepted. The composer package name would then become the unique identifier of the extension. This would make the package update a bit more complex though, because we would need to ask Packagist for the download URL of the package, or we should keep some metadata about each known package (which is probably a good idea though to be able to link to the extension repo in the website for instance).

everzet commented 9 years ago

Here's what I think on some points:

  1. Lets make package names case sensitive, but use exactly the same casing as people use on the GitHub. Meaning if your repository is actually called Behat/docs, it will be /docs/Behat/docs. This means if people use docs/behat/docs, they'll get 404. This is a bit inconsistent with GitHub, but much more predictable IMO.
  2. My thinking process here would be this: we need to include 3 different types of versions, resulting in 3 different types of documentation urls:

    • v2.0-style branches will result in /docs/v2.0.x url and v2.0.x documentation version. Every new commit to this branch will cause update of v2.0.x docs
    • v2.0.0-style tags will result in /docs/v2.0 url and v2.0 documentation version. Every new tag in v2.0.* family will cause update of v2.0 docs
    • master and develop branches will result in /docs/master and /docs/develop documentation urls. Every new commit to these branches will cause an update of master or develop docs

    We'll show all available doc versions, but the /current (as soon as it is introduced) will link to either of (in priority order):

    1. Latest v2.0-style doc
    2. Latest v2.0.x-style doc
    3. Latest master doc
    4. Latest develop doc
  3. Previous point if we decide to roll with it should fix this point too (if I understood it correctly).
  4. We'll update this thing as we go. I apply same rules here - it's better to be too limiting than too liberating at the very beginning. It's easier to fix a limitation than a security issue :)
  5. This one is interesting and something that I didn't though about much. It seems we'll need to take this one into serious account, not sure how though...

What do you think?

everzet commented 9 years ago

On the 5, I think we can postpone the decision for now as I'm not planning to add Bitbucket support for couple of months at least. We can deal with it when we'll have an issue. I know it's risky, but I believe it's still much better than trying to take every possible edgecase ever into account from the get go. Not sure though...

stof commented 9 years ago

The issue is that if we use URLs based on github usernames, we have to provide BC for them once we introduce Bitbucket support, which will be a pain. So IMO, it is better to think now about what is the unique identifier of an extension (even though we don't add Bitbucket support yet)

everzet commented 9 years ago

@stof true. What's about other points?

everzet commented 9 years ago

Ok, I think I have a neat solution for points 1 & 5 :)

All three problems go away if we simply require extensions to have composer.json (which I think is reasonable). This way we can simply use package name defined in composer.json as a documentation project name. If release doesn't have composer.json - we simply ignore it. This not only solves the problems of naming and introducing Bitbucket repos later, it introduces consistency between borg and packagist, which is a big plus.

Now, this obviously wouldn't work for cases when documentation is hosted in separate repository. Because surely it wouldn't have composer.json (Behat/docs is a good example of that). For such cases we can introduce additional support for a borg.json config file in documentation repos in which you will need to specify the composer package name this documentation is for. Borg will then use this name to generate docs with proper package URL AND link the docs with the particular extension package, which is a plus.

I think it's quite elegant and clear solution for a very tricky problem. What do you think?

ciaranmcnulty commented 9 years ago

I don't think borg.json is required, there's no reason an extension author can't add a composer.json. They don't have to submit it to packagist,

Alternatively, perhaps the composer.json for the main repo can point to the documentation? Perhaps something in the extra key in composer.json?

everzet commented 9 years ago

@ciaranmcnulty asking people to create composer.json in their pure-doc repository having exactly the same package name as their original extension composer.json will confuse the hell out of them.

ciaranmcnulty commented 9 years ago

Hm, ok. How about adding something to composer.json?

{
    "extra" : {
        "borg-repo" : "http://github.com/ciaranmcnulty/myextension-docs"
    }
}
everzet commented 9 years ago

We need package name information inside the documentation repository. Otherwise we introduce strong dependency from the documentation generation/parsing to the extension parsing. With link from extension to documentation we'll somehow need to ensure that extension releases are issued before documentation releases or documentation generation will fail. This is a bad idea, IMO.

To clarify: the way Borg currently works is that you could notify it about documentation release before extension release and it will work just fine. I want to keep it this way. Adding link to documentation into the extension will force us to find and parse extension in order to be able to generate documentation. I don't want this.

ciaranmcnulty commented 9 years ago

In that case, I suggest forgetting about composer.json as it won't be present in all repositories. Your idea for borg.json could be applied everywhere and would simplify things a lot as you'd be in control of the format.

everzet commented 9 years ago

There's no need to ask people for a custom borg.json if documentation is hosted inside the package repository (most of the cases), because composer.json will have all the information we need. This would not be true in very limited set of exceptions and I don't want to force 90% of developers to adapt to 10%.

everzet commented 9 years ago

@stof thoughts on point 2 in https://github.com/everzet/Borg/issues/7#issuecomment-68560817 and the whole https://github.com/everzet/Borg/issues/7#issuecomment-68584368 ?

everzet commented 9 years ago

@ciaranmcnulty what do you think about https://github.com/everzet/Borg/issues/7#issuecomment-68560817 ?

ciaranmcnulty commented 9 years ago

It seems like a good solution if you can manage the overhead of parsing composer.json as well as a borg.json; I'd suggest that borg.json should be very small and just point to the 'real' package.

Using Composer's normalised package names would solve a lot of problems.

everzet commented 9 years ago

yup :)

everzet commented 9 years ago

Working on this one

everzet commented 9 years ago

documentation package name issue is supposedly resolved by #19

stof commented 9 years ago

I agree about your idea of using the composer.json with a fallback to borg.json