eirikb / nipster

Search tool for npm
http://nipstr.com/
126 stars 20 forks source link

Normalize repository URLs #19

Closed kenany closed 10 years ago

kenany commented 10 years ago

npm uses normalize-package-data to allow for defining a package's repository.url in a variety of ways:

{
  "repository": {
    "type": "git",
    "url": "git://github.com/npm/npm.git"
  }
}
{
  "repository": "git://github.com/npm/npm.git"
}
{
  "repository": "npm/npm"
}

These all result in repository.url being "git://github.com/npm/npm.git". Looks nipster doesn't do such normalization, so many packages (i.e. mine :grin:) have a null repository in that npm-datatables.json of your's. I would've sent a PR to integrate normalize-package-data but looks like 1) backend is closed-source for the time-being and 2) it's written in C#, so one would have to port normalize-package-data.

eirikb commented 10 years ago

If I were to throw the whole lot up into a branch, would you make the PR?

kenany commented 10 years ago

Actually looks like the logic is in the gist you posted in #18.

Please correct me if I'm wrong, as my C# is rusty, but it looks to me that the repository URL is the first of the following that GitHubUrl does not return null for:

var url = GitHubUrl(() => v.repository.url)
          ?? GitHubUrl(() => v.repository[0].git)
          ?? GitHubUrl(() => v.repository[0].url)
          ?? GitHubUrl(() => v.repository.git)
          ?? GitHubUrl(() => v.repository)
          ?? GitHubUrl(() => v.homepage)
          ?? GitHubUrl(() => v.bugs.url);

Since you are checking repository.url and repository, all that's missing is to support repository being in GitHub shorthand format:

{
  "repository": "npm/npm"
}

For that, porting github-url-from-username-repo was quite easy:

private static string GitHubUrlFromUsernameRepo(string r)
{
    return Regex.IsMatch(r, @"^[\w-]+\/[\w\.-]+$", RegexOptions.IgnoreCase)
      ? "https://github.com/" + r
      : null;
}

private static string GitHubUrl(Func<dynamic> func)
{
    var u = "" + NoThrow.String(func);
    if (GitHubUrlFromUsernameRepo(u)) {
        u = GitHubUrlFromUsernameRepo(u);
    }
    return Regex.IsMatch(u, "github.com", RegexOptions.IgnoreCase) ? u : null;
}

What do you think?

eirikb commented 10 years ago

Perhaps if (GitHubUrlFromUsernameRepo(u) != null)?

kenany commented 10 years ago

Oh right, too used to that working in JavaScript (evaluating to falsey).

eirikb commented 10 years ago

Does it look better now? I'm not sure what repo you were talking about.
I haven't applied your change though, the original code should work just fine, as it just ends up with "KenanY/node-resteemo" which will be used by the GitHub queue-worker.

However, what did seem to cause the problem was this line from my gist, can't remember why it exists, perhaps some fail-safe in case a repo does not exists, anyway replaced it with a null-check. I probably ought to publish my server-side part to a branch, what do you think?

kenany commented 10 years ago

Yeah everything's perfect now :smiley:. Publish if you'd like.