Youtube badge fails intermittently

Jetpack-Cat commented 2 years ago

Are you experiencing an issue with...

shields.io

🐞 Description

My youtube badge suddenly says invalid, it appears youtube changed something. Making a new one still shows invalid. It may have something to do with the new youtube handles for channels. I believe it was working before I accepted the @handle youtube suggested

🔗 Link to the badge

You'll see it https://www.curseforge.com/minecraft/modpacks/jetpack-cat

💡 Possible Solution

I believe youtube changed something and made this stop working, because nothing has changed on that page for months and suddenly it's invalid

PaulaBarszcz commented 2 years ago

I cannot see the yt badges on the link that you provided (https://www.curseforge.com/minecraft/modpacks/jetpack-cat), you probably removed them if they stopped working :P

Which badge(s) were you using for youtube, that are not working now? https://shields.io/

DenverCoder1 commented 2 years ago

I cannot see the yt badges on the link that you provided

They have this badge still at that link

It was showing all grey and as "invalid" yesterday but it appears to be working now.

Same happened with the ones on my profile.

chris48s commented 1 year ago

I wasn't able to catch this and see what the API response from youtibe was when it wasn't working. I'm going to close this for now on the assumption it was a transient issue but if it happens again, re-open. Cheers.

DenverCoder1 commented 1 year ago

It seems like the issue is back now.

Jetpack-Cat commented 1 year ago

It happens intermittently.

DenverCoder1 commented 1 year ago

Yeah, seems to be working again now

chris48s commented 1 year ago

https://github.com/badges/shields/issues/8758

programmingwithalex commented 1 year ago

I am encountering the same issue showing invalid. I get the invalid response on https://shields.io/category/social and on my GitHub profile page.

kassbohm commented 1 year ago

Same issue here. Using https://shields.io/category/social leads to invalid for any youtube video I tested...

benderillo commented 1 year ago

Youtube subscribers badge does not work tried with a couple of channels, always "Invalid"

Screen Shot 2023-02-24 at 2 09 31 PM

Here is one for example https://img.shields.io/youtube/channel/subscribers/UCXXkKBqXrBN4AD2HBuGc9Rg?style=social

DenverCoder1 commented 1 year ago

It seems to work when running locally, but failing sometimes in production.

Could it possibly be a rate-limiting issue? Perhaps the quotas need to be increased?

calebcartwright commented 1 year ago

This is almost certainly a recurrence of the rate limiting issues we've had in the past with YT. I believe our current quota is still 50k daily requests based on the last time we went through this (https://github.com/badges/shields/issues/6276#issuecomment-812110113).

I suppose we could request another increase, though it was quite a pain last time and I'm not sure if we've the necessary bandwidth at the moment to handle a similar level of engagement with YT/Google. In the short term we might want to try increasing how long we cache these badges to hopefully alleviate some of the upstream api hits

adamlui commented 1 year ago

I think if logic were added to hide the balloon outright vs. showing 'invalid' when requests fail would make it look better in a cluster of shields (like the Twitter one)

adamlui commented 1 year ago

look better even alone*

adamlui commented 1 year ago

In @chris48s's attempt to alleviate this issue in https://github.com/badges/shields/commit/bf1bea8b4a106227cd81a9c5ac88e0266cf9c2c2, I proposed a great solution that @calebcartwright asked me to move any additional commentary elsewhere to a relevant issue, so I am replying here instead

The cache value seen/discussed here [in https://github.com/badges/shields/commit/bf1bea8b4a106227cd81a9c5ac88e0266cf9c2c2] is what we set on the response when we return the badge to the requestor, and that cache value is in turn used by the various downstream actors (e.g. your web browser, GitHub's Camo, CDN, etc.). When we increase that value, it results in fewer badge requests getting to our server, and accordingly fewer API requests our server makes to those upstream data sources.

This cache value simply being higher does not solve the problem of badges showing 'invalid' when 50k API requests from shields.io are sent upstream for the day.

My solution to this specific problem is, even if upstream response is invalid, just show '?' to make it less ugly, and mitigate hitting this daily upstream limit further by not just increasing downstream cache duration, but actually limiting (via hard code) a requestor's request frequency to once per 24h (for example, by adding a timestamp of last request, then comparing w/ Date.now(), then if delta is <24h, show the old cached response because numbers are still less ugly than '?' no matter how old`).

The downstream cache duration value simply being higher (while reducing upstream requests) does not control individual requestors' request frequency, which doesn't even need to be high to serve the purpose of this badge (once per 24h will still result in the same final count as 1000 times in the final 10 mins) so it's a waste to not control it if upstream requests can be limited further if done.

And combined with tracking total upstream requests (keep a count that resets at PST midnight so a condition can be added if (cnt > 50000) to show '?' or old number) and the 'invalid' issue will never occur again!

calebcartwright commented 1 year ago

@adamlui - what you are suggesting regarding individual limits is simply not possible for many reasons, most notably because the majority of the badges we provide are rendered on GitHub, and I would again encourage you to read the GitHub documentation I previously posted as it pretty clearly articulates why that's not possible - https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-anonymized-urls.

It's easy to prescribe solutions when you don't think about any contexts or constraints, but there's quite a bit of those here that you seem to be overlooking.

I appreciate your desire to have these badges working, but unfortunately your suggestion is not a "great" one. Technically we could change the message to ?, but that seems pretty pointless and a subjective preference. There's nothing inherent to ? that would indicate to users (and subsequently to us) that there's some sort of issue with the responses being received by the upstream provider.

If you feel that strongly about wanting a ? vs invalid on the badge message, then your best bet is to consider using our Custom Endpoint badge where you could easily implement such a transformation process on the message

adamlui commented 1 year ago

appreciate your desire to have these badges working, but unfortunately your suggestion is not a "great" one. Technically we could change the message to ?, but that seems pretty pointless and a subjective preference.

@calebcartwright Why is it pointless to make a counter shield not jarringly show a word in a cluster of shields that show numbers?

There's nothing inherent to ? that would indicate to users (and subsequently to us) that there's some sort of issue with the responses being received by the upstream provider.

So what? The issue is it's ugly, defeating the purpose of shields, to aesthetically brag. Why would the viewer need to know there's some sort of issue to this end? And why are you insistent on making sure the issue is known to be upstream? (The viewer doesn't care where the issue originates)

adamlui commented 1 year ago

It's easy to prescribe solutions when you don't think about any contexts or constraints, but there's quite a bit of those here that you seem to be overlooking.

@calebcartwright I didn't say it was hard, I merely provide it for anyone (not even for you who has made it clear you don't find it possible) who wants to make a pull request

calebcartwright commented 1 year ago

There's an adage that basically goes "first seek to understand, then to be understood". So far I unfortunately don't feel like you've engaged in any attempts to understand the relevant constraints and their applicability in the context of your proposal, and instead seem content to simply repeat what you think should be done.

As such I don't intend to continue engaging in such back and forth and will simply suggest that you open a PR given your confidence in the feasibility of your proposal.

@calebcartwright Why is it pointless to make a counter shield not jarringly show a word in a cluster of shields that show numbers?

This thread is not a question about one error message vs. another, it's about trying to get the badges actually working. As such the message of ? vs invalid vs foo vs bar vs ... is completely orthogonal.

adamlui commented 1 year ago

There's an adage that basically goes "first seek to understand, then to be understood". So far I unfortunately don't feel like you've engaged in any attempts to understand, and instead seem content to simply repeat what you think should be done.

@calebcartwright I understand you feel there are constraints that prevent my solution (that wasn't even directed at you, but open source contributors) from being implemented. You don't understand there are smarter people than you and I that work beyond constraints.

As such I don't intend to continue engaging in such back and forth and will simply suggest that you open a PR given your confidence in the feasibility of your proposal.

You tagged me first, I simply reply to your replies to me. Again, I repeat, my contribution is the idea for others (not me) to open a PR. You are lacking in attention to detail of the intent of my proposal ("I merely provide it for anyone who wants to make a pull request.") Also your engagement with me is not a requirement for anyone to open PR's, so please do disengage

This thread is not a question about one error message vs. another, it's about trying to get the badges actually working. As such the message of ? vs invalid vs foo vs bar vs ... is completely orthogonal.

Indeed the issue is the badge doesn't work, and my proposed solution would make it work more often, with the added benefit of fixing shields not working as intended (brag in an aesthetically pleasing manner) even when upstream response is invalid. You may not like it, but I remind you, my solution is for any others capable of working beyond constraints

calebcartwright commented 1 year ago

@adamlui - You're now starting to drift a bit too close to personal insults which is both off topic and not something we tolerate in our community. As such I'm going to hide these last couple comments in an attempt to reduce noise on the thread and keep it more focused on the issue at hand.

Thanks again for sharing your perspective, and again, if you feel confident that you're right and the project maintainers are wrong, then please feel free to articulate a concrete proposal or PR

adamlui commented 1 year ago

@calebcartwright

You're now starting to drift a bit too close to personal insults

What insult do you refer to? Also why are you allowed to continually ridicule the comprehension levels of contributors, but can't stand the heat when they point out you missed the detail of the intent of an idea?

As such I'm going to hide these last couple comments in an attempt to reduce noise on the thread and keep it more focused on the issue at hand

As long as you don't hide the idea then you are not hindering the project's development

Thanks again for sharing your perspective, and again, if you feel confident that you're right and the project maintainers are wrong, then please feel free to articulate a concrete proposal or PR

This is not an ego war, you don't have to be wrong for an idea to be implemented successfully despite contraints by people who are smarter than you (yes, they exist). And for the third time, the idea is for someone smarter than us to implement, not sure why you keep granting me permission to make PR

calebcartwright commented 1 year ago

Going to lock this for some period of time since it appears to be getting a tad charged and likely spamming those users who are subscribed to the issue in order to receive updates.

Will unlock it at some point down the road and as always we'll post any updates as and when they become available

chris48s commented 1 year ago

I've been AFK for a couple of days. Coming back and seeing my notifications, I think it would be fair to summarise and say: There has been some activity in this issue thread and in https://github.com/badges/shields/commit/bf1bea8b4a106227cd81a9c5ac88e0266cf9c2c2

I acknowledge I've seen all of the notifications. I will read all this backlog and write some sort of reply, but not immediately.

In the meantime, I will say this: Lets calm down. It is a badge in a README. Nobody died.

chris48s commented 1 year ago

First of all, everyone who works on shields.io is a volunteer doing this in their spare time. Nobody is getting paid for this, nobody is "on call". We've all got lives. Please respect that. Tbh, writing all this wasn't really what I wanted to spend my evening on. I had a bit of time free to work on the project tonight. My preference would have been to spend it on following up the latest review comments on https://github.com/badges/shields/pull/9014 (a long-awaited and high impact change for the project), but here we all are..

Reading over this, it seems like there are really 4 distinct topics to address.

Can we do more to fix the youtube badge?
Suggestions about keeping track of the number of requests we're making to upstream services or caching responses from upstream services
Suggestions about changing the way we present error messages
Has anyone violated our code of conduct?

Lets start off with

1. Can we do more to fix the youtube badge?

Yes. In the short term. I've just merged https://github.com/badges/shields/pull/9250 increasing the max age on the badges a bit more. I will deploy it shortly. Lets see if that gets us under 50k requests a day. Its not a permanent solution, but it is something I can do now that will hopefully improve things.

Next time Caleb and I have a voice call, next steps on this issue will be one of the things on the agenda.

2. Suggestions about keeping track of the number of requests we're making to upstream services or caching responses from upstream services

This is something that comes up from time to time.

Someone will suggest we do something like

keep a count that resets at PST midnight so a condition can be added if (cnt > 50000)

This seems simple on the surface, and it works just fine if your application runs on a single server/container.

Shields currently runs on a minimum of 14 containers (at the weekend) and maximum of 22 containers (during working hours for Europe/America on weekdays). So to implement something like that, you need some kind of shared state (a database or cache that all the the containers and read from/write to) to store that count.

Shields.io has to serve a lot of traffic on a very tight budget (donations) with only volunteer labour to maintain it. Currently, one of the ways we achieve this is we try to avoid dealing with any shared state between instances and rely entirely on downstream cache (CDN) of the SVG responses to reduce traffic to upstream APIs. There are several reasons for this choice, including:

Scalability: As soon as you have any sort of shared DB or cache, you've got a bottleneck that limts your ability to scale by just adding compute resources. At the moment we can basically just keep scaling out by adding containers to the cluster.
Cost: Compute resources are relatively cheap. Managed storage resources are much more expensive. For downstream cache, we can benefit from free CDN services from CloudFlare.
Simplicity: Our current setup is about as low maintenance as we can possibly make it.

So people sometimes suggest we cache API responses upstream, or track our rate limit usage, and it can seem simple to suggest if (cnt > 50000) but actually in order to implement something like that, we (the core team) would need to take on the financial cost, maintenance overhead, and scalability tradeoffs of shared state in order to do it. So those are our constraints (in some ways self-imposed). Simultaneously, it is a tradeoff which has allowed a small group of volunteers to keep a really widely used service running for ~10 years.

All of that said: I don't completely rule out ever adopting any more persistent state, but we do need to be careful, deliberate, and make sure it is something that is going to solve problems across many of our (~600) service integrations. It is not something we would lightly take on to solve one issue with one service (in this case, youtube).

3. Suggestions about changing the way we present error messages

The way we present errors is definitely something we think about. A good example where we recently considered this is https://github.com/badges/shields/pull/9159

In general:

We don't keep any logs, at all. This is partly because we don't track our users, and that extends to logging. It is also partly because at the scale we generate logs, the cost of having some upstream service ingest them is very high.
Unhandled errors are rare. We log the stacktraces to Sentry, but we strip the console and http context.
If we generate a handled error (by far the most common case), we don't really have a better place to expose information about what happened than on the badge itself. Even if the error is not necessarily actionable to the person who views it, at least someone can report it and (hopefully) tell us something that helps us work out what happened and fixed it.

So to work through an example, if a working badge says

or

our error badge might say

Would it be better to render ? Not really Would it be better to render ? Maybe. Arguably we're at least not exposing an error that might not be actionable, but at least is more likely to result in us getting an actionable bug report if an issue is intermittent or difficult to reproduce locally.

Maybe that is not to everyone's taste, but that's the constraints we're working with (again, some are self-imposed) and the choices we've made when it comes to error messages.

In the case of the youtube badge, we don't render a particularly useful error message. One of the issues here is that if the issue is related to rate limit, I'd expect us to be getting a 429 response status code, which would cause us to render , but we're getting back.. something else that we don't know what to do with and rendering the most generic error message. This is one of the reasons I didn't originally identify this as a rate limit issue.

4. Has anyone violated our code of conduct?

I'm not one of the CoC enforcement people, but in this situation, I think I'm probably the best mediator immediately available.

Having read over the conversation, there is nothing here that I would describe as "harassment" or "abuse". It is a conversation that descended into what I would describe as "unprofessional squabbling".

@adamlui You care about this issue. We get it, but alo repeat posting and @mentioning on the same topic isn't helping or increasing our appetite/ability to spend time on this.

It is entirely legitimate for Caleb (one of the project's core team) to jump in and reply to a thread.

I appreciate you aren't getting the answers you want to hear, but I think you would have been wise to follow Caleb's advice when he suggested "It's probably best to take a day or two to reflect and cool down" rather than escalating. One of the things I've consciously done is slow down the pace of this whole conversation. Cooler heads usually prevail.

@calebcartwright I think your first post https://github.com/badges/shields/commit/bf1bea8b4a106227cd81a9c5ac88e0266cf9c2c2#commitcomment-117071843 explains things pretty well, but I do also see why @adamlui feels like you were being dismissive and talking about "constraints" and "context" without really explaining it in later posts.

I've taken the time myself to really lay everything out above. Hopefully this helps. Also I can see why you didn't. It took a long time. The conversation was already heated by that point.

Next Steps

I've locked the conversation on commit https://github.com/badges/shields/commit/bf1bea8b4a106227cd81a9c5ac88e0266cf9c2c2 - As Caleb (correctly) says, its not an ideal place to carry on the conversation.

I've left one post visible containing the points I've followed up on in this post. I have also hidden most of the posts on this issue thread from both authors. There is a lot here that is not adding much value in the grand scheme of things IMO.

I'm going to unlock this issue thread.

Lets keep any further conversation on this topic constructive. We'll lock it again if we can't.

I think (hope) there was value in explaining things a bit more clearly above, but I don't ideally want to continue exchanging giant walls of text on this. As I've said, it took a long time to read all of this, think about it, and type all that out. It wasn't really what I would have ideally chosen to spend an evening on.

Maybe one thing this can lead into is better documentation on some of this stuff.

adamlui commented 1 year ago

@chris48s Your response was exactly as I expected, covering the abusive demeanor of a peer you voice call with (so clearly share personality traits to not find it alarming) as merely "dismissive" instead of abusive. You are not impartial. Your actions even betray your excusal of his abuse (hiding all his posts)

But I don't actually seek validation from you, I tagged maintainers out of curiosity if this partiality would be as strongly embedded among peers as i suspected it would be (so far so true)

But what's most revealing is your comment "It's just a badge for readme's." That + your comment about being a volunteer (as if justifying why violating Code of Conduct is OK despite all volunteers being equal). It truly shows you don't care about a project that's not just for readme's. People use shields for entire websites...

...while you treat it as some trivial readme feature. It really explains everything. Such a shame this project is being maintained by people who care so little, even creating hostile environments by insulting/censoring/threatening holders of respectfully different opinion against Code. You literally made me care less, too (indifference is infectious). I truly & fully understand why things are so broken now; maintainers are the problem, not a lack of ideas or effort from others who want to contribute but are put off. The careless attitudes & mindsets of two of you (so far) have exposed why.

adamlui commented 1 year ago

@chris48s You should really reflect on what you said: "It is entirely legitimate for Caleb (one of the project's core team) to jump in and reply to a thread." You conveniently ignore what your buddy jumped in to do: insult/censor/threaten holder of respectfully different opinion (especially heinous as "core team")

If you don't see how biased it makes you sound, or alarming it is to the health of a project's open development for "core team" to do this, then the problem is deeper than I imagined (and we can't have any meaningful discussion henceforth due to this blatant bias)

adamlui commented 1 year ago

@chris48s but I am still grateful for having met you because your avatar made me binge on Breaking Bad during meal breaks again 👍

adamlui commented 1 year ago

@chris48s also I am expert at reading people (I played poker professionally for decades) I am nearly certain you derived enjoyment (based on a specific comment you made, before your pc-mode essay) ;)

calebcartwright commented 1 year ago

Locking this again

PyvesB commented 1 year ago

We're indeed reaching our daily API quota: Errors

Said quota is currently set to 50000 requests per day (#6276), but we have been trying to render close to 100000 daily badges in recent weeks prior to the cache changes.

My past self helpfully left a link to the form (https://github.com/badges/shields/issues/5101#issuecomment-630330420), so I simply went ahead and asked for a new quota increase to 200000. This should give us some comfortable headroom.

I'll keep you posted when I hear back from the Youtube team.

chris48s commented 1 year ago

Wow. Thanks for appearing and taking that on! I had got as far as finding the form and realised I didn't know all of the information to complete it. Also my memory of this is that last time we requested an increased rate limit there were some issues raised in the application compliance audit. Hopefully it is smooth sailing this time :ship:

In the meantime, based on the grafana dashboards, I think the second PR to increase the maxAge has probably got us under 50k requests per day :crossed_fingers:

chris48s commented 1 year ago

I'm closing this issue off. Thanks to @PyvesB we have now negotiated a higher rate limit with Youtube and we're now running under it with some headroom, even having reduced the time we cache the badges for.

badges / shields