Closed paulmelnikow closed 5 years ago
re: point 1, this is my rough assessment of where you are right now:
Update: List moved to https://docs.google.com/spreadsheets/d/1cHIUSVaiKrIFw3KIu0yt-EMNlMkIfU5alE7YKZ4PeOE/edit#gid=0
Note that I've said an integration 'has tests' if there is a file in /service-tests
with a corresponding name, but there is probably some further nuance to consider here because some integrations may not have good coverage of all paths or sub-endpoints. For example if we look at wordpress, you've got "happy path" tests but no coverage of failure or error conditions.
I'm happy to continue picking through services in a sort of ad-hoc way and submitting PRs in the style of #1315 and #1354 which add tests and fix any problems found while adding the tests. However, having made this table and seen the scale of the issue, I have some thoughts on managing this:
There are a lot of services without tests. Obviously it would be nice to add tests for all of them, but its probably worth doing some kind of triage. Do you have any stats we can use to identify services with high levels of usage but no test suite? It would probably be sensible to focus there first. You could pick maybe 10 or 20 services which are high-priority to add tests for and then raise issues for them as a starting point. That would also allow people interested in working on tests for a service to post in the relevant issue so we don't accidentally duplicate work.
Thanks for the awesome list!
… some integrations may not have good coverage of all paths or sub-endpoints.
Right, this is a good point. It's worth noting is that the new service handlers will use async
/await
and promises, which compared to continuation-passing, delivers more reliable erroring. Since we can enforce 100% code coverage on the new services, we’ll also be able to refactor at will across the services. That should let us extract and generalize the error handling code, and test it once, instead of on each service. A good example of this is the "inaccessible" case when request
returns an error object.
One path we should always test is “not found”. Though, few of the badges have implemented a good “not found” path, so those will have to be added along with the tests. Changes like those are welcome, unless we decide to freeze the badge implementations at some point.
Of course it’s important to test the happy paths. Lots of badges have more than one of those. However we may not need to exhaustively test every combination – some of the tests go farther than necessary here. This is especially true when we can factor out and unit-test any tricky logic. And all this will be faster when the service is in its own file!
Do you have any stats we can use to identify services with high levels of usage but no test suite?
That’s a great idea! Unfortunately we don’t have that data, though I agree that anything with a lot of hits is worth doing first, and it would be useful to collect it. Probably some of them we could guess, but there will be some surprises. Let’s track in #966. This shouldn’t be terribly hard.
As an even cheaper + faster alternative, we could do a qualitative triage. This would also pick up badges which are important, even though they might not have a lot of hits, for example dub. It’s the package repository for D, which is a programming language that’s not in the top 10, so the badge is important even if it’s not getting a lot of hits.
I went through the unchecked list and identified 16 that seem critical (or 18, depending how you count). On the checked list, 23 are critical. That puts us at 56%, which is impressive, since the first was merged less than eight months ago! These have been written by many different contributors, which is a testament to the community's dedication! 💛
Perhaps someone else would like to take a pass through as well. We could combine our lists and end up with a few more. I could go through and update your post with my picks, though I wonder, would it be better to maintain this in a Google Sheet that will count for us?
Ooh, or better yet, could it be automated? 😁
Yeah a google doc might be the right place to track this given the volume of services. It will be easier for a group of contributors to collaborate on the list there. People can also put their username next to a service to indicate they are working on it rather than having to raise an issue for each one.
Given you don't have the stats to quantify which badges get the highest usage, I'm sure if a few contributors weigh in we can carve out a sensible block to tackle first. Seems like a good first step is to put that table in a google doc and add your thoughts on which are important services to it.
Yea, agreed the doc is a nice way to keep track of who is working on what.
Though, I think it'd be fun to automate this. Why don't I see if I can do that real quick, and if that doesn't pan out, then let's do the google doc?
I opened a PR #1386 with my subjective list of critical tests. I'm having a lot of fun making a line chart that shows our test coverage over time, so I'm going to keep at that.
If anyone would like to add their own subjective list, please do!
For reference, here's the list of critical untested services pulled from @chris48s's chart above:
service | has tests |
---|---|
cdnjs | |
chocolatey | |
circleci | |
clojars | |
cocoapods | |
codeship | |
david | |
docker | |
dub | |
gem | |
hackage | |
homebrew | |
itunes | |
jira | |
myget | |
nuget | |
✔️ #1394 | |
uptimerobot |
gradually making progress through the critical services..
service | has tests |
---|---|
cdnjs | :heavy_check_mark: |
chocolatey | :heavy_check_mark: |
circleci | :heavy_check_mark: |
clojars | :heavy_check_mark: |
cocoapods | :heavy_check_mark: |
codeship | :heavy_check_mark: |
david | :heavy_check_mark: |
docker | :heavy_check_mark: |
dub | :heavy_check_mark: |
gem | :heavy_check_mark: |
hackage | :heavy_check_mark: |
homebrew | :heavy_check_mark: |
itunes | :heavy_check_mark: |
jira | :heavy_check_mark: |
myget | :heavy_check_mark: |
nuget | :heavy_check_mark: |
:heavy_check_mark: | |
uptimerobot | :heavy_check_mark: |
You're making amazing progress on these @chris48s!
Wow, are we really down to four? That is awesome! 😄
I've just merged the service refactor! #1543
There's some housekeeping and testing todos which carry over:
services/appveyor/appveyor.js
(exports one or more services)services/appveyor/appveyor.tester.js
(exports a ServiceTester)services/appveyor/appveyor.spec.js
(describe()
's and it()
's for the helper functions, if any)check-node-version
flags #1564This coinciding with tests of the the second to last critical service!
And we can start rewriting the services! My suggestion would be to approach these gradually, taking one or two at a time, adding helpers and refactoring as we go. We could start with the critical list, I suppose. Once we have a few examples we're happy with, we could pick up speed.
It'd be great to enforce 100% code coverage on the new files. I wonder if there would be an easy way to instrument that!
I'd also like to move slowly so we can get this deployed before we're too far down this path, just in case there are any performance-related issues with the new code.
I've just merged the service refactor!
:champagne: :100: :boom: :rocket: :tada:
My suggestion would be to approach these gradually, taking one or two at a time, adding helpers and refactoring as we go I'd also like to move slowly so we can get this deployed before we're too far down this path, just in case there are any performance-related issues with the new code.
:+1:
I'm genuinely excited to start work on refactoring :sparkles: , but it is worth thinking a bit about how to fit some different types of code into the proposed structure first. One of the advantages of having a large body of existing legacy code is that we already have an example of pretty much every possible edge case we will need to accommodate. :cloud: Silver linings :)
Obviously in a case where we have one badge for one service, that is very clear cut.
The other pattern we see everywhere is where there are lots of badges for one service, and within that two patterns:
A single 'service' should be able to group multiple badges (so pypi.js
can hold pypi version, pypi license, pypi downloads). My instinct here is that as we refactor, we mostly should aim to break down single functions that generate many badges (e.g: version, license, downloads) into one function per badge (although minor variations like dt
, dw
, dm
in one function is sensible). If there is repetition to reduce, it is probably better to declare a shared helper function than to have one service object which does lots of things.
There are probably 3 categories of 'helper' or 'library' functions we need to think about:
prettyBytes()
, metric()
or starRating()
.sortDjangoVersions()
or getVscodeStatistic()
.getPhpReleases()
(used by travis PHP version and php-eye) is an example of this. There are probably a fairly small number of these.Based on your example, I assume the pattern to follow here is that helpers specific to one service should move into a module with the relevant badge (so something like getVscodeStatistic()
moves into services/vscode/vscode.js
and the tests for it move to services/vscode/vscode.spec.js
) and then everything else stays in /lib
. However, it seems that in order to follow that pattern, vscode.js
would need to export the helpers so that vscode.spec.js
can test them. Would it be better for the helpers to also live in a seperate file (e.g: vscode.helpers.js
) so that vscode.js
only exports service objects?
Another pattern we have in a few places is we have badges where multiple service badges share code because they query the same API (I'm thinking of Chocolatey/Powershell Gallery and MyGet/NuGet here because I've worked on them recently but maybe there are others). In terms of mapping that to the new structure, I'm not sure whether you would say:
services/nugetv2/nugetv2.js
is a service which exports service objects for Chocolatey version, Chocolatey downloads, Powershell Gallery version, Powershell Gallery downloads (and so on) or services/chocolatey/chocolatey.js
is a service which exports service objects for version, downloads, etc and then independently services/powershellgallery/powershellgallery.js
is also a service which exports service objects for version, downloads, etcThis case is relatively uncommon, but probably worth thinking about.
Another thing that is probably worth thinking about at an early stage is error handling. I know we have https://github.com/badges/shields/blob/master/lib/error-helper.js but I think there probably scope to implement that standard logic as a method either in BaseService
or the code that calls handle()
such that the behaviour can overridden for services where something custom is needed. I'll try and think of some examples of services with non-standard error conditions which might be good to test the flexibility of that approach.
A single 'service' should be able to group multiple badges (so
pypi.js
can hold pypi version, pypi license, pypi downloads). My instinct here is that as we refactor, we mostly should aim to break down single functions that generate many badges (e.g: version, license, downloads) into one function per badge (although minor variations likedt
,dw
,dm
in one function is sensible). If there is repetition to reduce, it is probably better to declare a shared helper function than to have one service object which does lots of things.
:100: :+1:
However, it seems that in order to follow that pattern,
vscode.js
would need to export the helpers so thatvscode.spec.js
can test them.
They could also be static methods on the service class. In case of simple services with one or two domain-specific helpers, I think that could be a great pattern to follow. In that case, they are available through the exported services.
Would it be better for the helpers to also live in a seperate file (e.g:
vscode.helpers.js
) so thatvscode.js
only exports service objects?
100% cool with that as well! Or perhaps instead of helpers, it could be something specifically scoped like vscode-api.js
. (In general I think descriptive module names beat "utils" or "helpers" modules… though there are exceptions to every rule.)
I'm not sure whether you would say:
services/nugetv2/nugetv2.js
is a service which exports service objects for Chocolatey version, Chocolatey downloads, Powershell Gallery version, Powershell Gallery downloads (and so on) orservices/chocolatey/chocolatey.js
is a service which exports service objects for version, downloads, etc and then independentlyservices/powershellgallery/powershellgallery.js
is also a service which exports service objects for version, downloads, etc
I'd tend to prefer the second. It may also be a good case for creating lib/nuget-api.js
which is then imported by the services which need it.
If that causes too much repetition, another pattern we could consider is the constructor factory. That would mean creating something like lib/nuget-service-factory.js
, and invoking it from each of the three service modules to create the individual service classes.
Another thing that is probably worth thinking about at an early stage is error handling. I know we have https://github.com/badges/shields/blob/master/lib/error-helper.js but I think there probably scope to implement that standard logic as a method either in
BaseService
or the code that callshandle()
such that the behaviour can overridden for services where something custom is needed. I'll try and think of some examples of services with non-standard error conditions which might be good to test the flexibility of that approach.
Yes! Totally agreed. This is going to be the biggest win from this rewrite.
In addition to the 404 errors and the kind of errors checkErrorResponse
is handling, we should think about response schema validation. The most important reason to do this is that it will let us completely distinguish between a programmer error in Shields (i.e. a 500 error) and an error caused by the service (i.e. a 503).
In https://github.com/badges/shields/pull/963#issuecomment-333336383 I proposed a Vendor
abstraction. Maybe ExternalAPI
would be a better name for it. I like the idea of keeping BaseService relatively light and concerns separated, and a wrapper around the external API would provide a nice place to put the logic you're talking about. This abstraction for external services could also help with amping up our caching which is something I'd like to tackle if we have a chance.
I was playing around with a WIP branch that refactored one service, focusing not so much on error throwing (i.e. new versions of error-helper), but error catching in BaseService. Specifically, it supports disabling handling of internal errors during development. This makes for informative stack traces, massively easier debugging, and much more delightful coding of services. Here's the branch where I started playing with it: https://github.com/badges/shields/compare/master...paulmelnikow:rewrite-npm-typedefs
@chris48s and I had some good discussion in #1735 #1743, worth reading for anyone else who is following this work.
I'm starting a new list here of the rewritten services, where we can claim what we're working on so we don't duplicate effort. The ones with :heavy_check_mark: need to be updated in light of #1735.
Update: List moved to https://docs.google.com/spreadsheets/d/1cHIUSVaiKrIFw3KIu0yt-EMNlMkIfU5alE7YKZ4PeOE/edit#gid=0
The ones with mag need to be updated in light of #1735
Just for others reference, the relevant issue here is #1743 not #1735
Can all the 🔍be changed to ✔️at this point?
Yep - now #1883 and #1901 are merged they are all up-to-date
Updated @chris48s's messages regarding which services had tests, and which didn't. We're making good progress on service coverage. 👍
There will be a few services which we've added or deprecated since I made that table, but there's a lot of green ticks in that table now :D
October 10:
Today:
😲
If anyone is interested in helping us to port a legacy service, services which need refactoring can be found with this search:
https://github.com/badges/shields/search?l=JavaScript&q=extends+LegacyService
Tutorial on porting legacy services is at
https://github.com/badges/shields/blob/master/doc/rewriting-services.md
I've recently ported a couple services, and I'm now wondering how I should mention/communicate when I start working on a service to avoid any potential duplication of efforts (however remote that possibility may be).
Should folks (that can't edit the comment with the main list) just add a shout on this thread (something along the lines of I'll port service xyz
)?
Some updated services since the last list
nsp
- deprecatedsourcegraph
- migratednexus
- migrated, pending in #2520 jira
- migrated, pending in #2541 imagelayers
- deprecatedlibscore
- deprecatedYeah good point. We've been a bit rubbish at keeping this up-to-date lately and I think there is too much on the list to raise an individual issue for every service that still needs migrating.
I've set up a spreadsheet here: https://docs.google.com/spreadsheets/d/1cHIUSVaiKrIFw3KIu0yt-EMNlMkIfU5alE7YKZ4PeOE that we can use to keep track of it. I've just made it world+dog-writable. If you're working on something, stick a note in there and then we can make sure we're not tripping over each other :) Feel free to update as stuff gets merged or whatever.
I think this level of granularity is fine, but we might need to break GitHub out as it is such a large service family.
Thanks for setting that up!
I was talking with @calebcartwright today and thought of some uses for the search badges:
It also occurs to me, just now, that perhaps they are slow on the linux repo…
It's amazing progress we're making through https://docs.google.com/spreadsheets/d/1cHIUSVaiKrIFw3KIu0yt-EMNlMkIfU5alE7YKZ4PeOE/edit#gid=0 !
I pinned this issue since it's some of the most accessible work on the roadmap and it would be great to lighten the load with some extra hands!
I wonder if it would be helpful to reopen this issue in a way that is more welcoming to new contributors, and referencing our tips for rewriting services.
Also in the spreadsheet, I wonder if we should break out the GitHub services to encourage everyone to continue working on them.
I wonder if it would be helpful to reopen this issue in a way that is more welcoming to new contributors
:+1: Lets open a new issue summarising in the top post where we are with this job right now and pin that. A year's worth of GitHub archaeology is a reminder of how far we've come but it doesn't explain the task well for newcomers
Would either of you like to do that? @chris48s @calebcartwright
Yeah I'll take a crack at it later tonight soon
Also in the spreadsheet, I wonder if we should break out the GitHub services to encourage everyone to continue working on them.
Forgot to respond to that part, but agreed on that for GitHub and I think it's probably a good idea for any other integrations that have multiple services in need of refactoring. If someone wants to take on the refactor for all the classes for a service/integration, then they can ✋ for each, but folks should also be able to tackle a single one if they so choose and that'll be easier with them broken out IMO
Wonderful post!! Only thing I’d add is: where should people post if they need help? I usually recommend “the thread in which the request occurred” or discord.
So we could either invite questions in the refactor thread, or welcome them to open a new “Refactor x service” issues, explaining that we’re not pre-opening those issues to avoid a huge pile of empty issues.
and you can reach out to us on this thread and/or Discord with any questions!
So ^ was the last line in there. I think Discord is always a valid option (and folks definitely use it!), the question would be where in GH we'd recommend folks ask questions.
One thing that made me pause with my original line was that the refactor thread could end up getting pretty long again, even if there's only 1 question and 1 reply for the remaining 75 services. Having them open up a new issue is an intriguing idea
We could create an issue template for service refactor questions (or perhaps just use the existing question issue template), and link to it directly from the post
I've pinned #2863
Forgot to respond to that part, but agreed on that for GitHub and I think it's probably a good idea for any other integrations that have multiple services in need of refactoring. If someone wants to take on the refactor for all the classes for a service/integration, then they can hand for each, but folks should also be able to tackle a single one if they so choose and that'll be easier with them broken out IMO
Hmm.. so I think there's 2 cases to consider here:
class MyService extends LegacyService
but inside that, the code for registerLegacyRouteHandler()
contains a big ol' case statement and can generate a version badge and a licence badge and a downloads badge, or something.I think in the first case, it does make sense to tackle them in bits. If you want to split them out in the spreadsheet, that would be helpful :)
Where the second pattern exists, in most cases its probably not terribly useful to try to extract just the version badge code from the legacy service but keep the legacy downloads and licence code working. I think in those cases, I'd prefer to consider refactoring that as a single unit of work, even though it is multiple badges.
Does that distinction seem reasonable?
Does that distinction seem reasonable?
Yup. I was thinking about the first case, but agreed that the second case makes sense to keep together. I'll go through the remaining to-be-refactored services in the spreadsheet and split the ones in the first case (split into multiple files/classes) into separate line items
and you can reach out to us on this thread and/or Discord with any questions!
So ^ was the last line in there.
Aha! Sorry I missed it. I edited slightly to give it more prominence.
We could create an issue template for service refactor questions (or perhaps just use the existing question issue template), and link to it directly from the post
Either of those sound like great ideas. I don't think it's a bad idea to make an issue template for refactoring. This project is big enough that it's likely to have major projects we need to track, and I haven't given up yet on spreading the work widely. 😁I think getting a pattern in place for communication that reduces friction to getting started could be a good investment in the future.
Just an FYI that I've finished splitting out the handful of LegacyServices that were in their own respective files (basically was just GitHub plus a couple others) into their own line items in the spreadsheet.
Honestly though, I doubt outside the GitHub services that anyone will refactor any of those services in isolation to your point @chris48s, but they're out there now anyway!
Now that #963 is merged it’s time to start thinking about 🏆 The Great Service Rewrite! 🏆
The rewrite has three goals:
server.js
andlib/all-badge-examples.js
into separate modulesThe service rewrite necessitates upgrading to Node 8 on the servers. Today they're running Node 6. I’m not sure exactly how long it will take to do that. I don’t have access to upgrade Node, though I’ve reached out to @espadrine about it.
963 which is now merged to the node-8 branch included a new implementation of the AppVeyor badge. It’s about to fall out of date with the merge of #1321. I’m not concerned about AppVeyor. After all it's just one badge that we could re-port at any point.
However, we should decide on a short-term approach for how to handle badge changes.
Some ideas:
node-8
branch, using the new service code. Put a temporary freeze on new badges inmaster
.node-8
branch. Except for security fixes, freeze their implementations and tests inmaster
.master
and then merge intonode-8
.We could do 1 + 2, or 1 + 3, or 1 + 2 + 3, or 1 + 2 + 3 + 4. None of these are mutually exclusive.
I created a project to track this, though it doesn't have anything useful in it yet.
Thoughts?
Let's get the ball rolling! 🎳