Open simoniz0r opened 6 years ago
@simoniz0r you're right, this might be an issue. I recently hit the rate limit myself a few times with another application.
I'd rather consider using web scraping to work around these restrictions, though. GitHub doesn't need to track update checks, IMO. Also, people shouldn't have to log in with GitHub just to check for updates...
Maybe we could use the developer's GitHub token? But keeping it safe is another big deal... Web scraping is a good idea, but it can add a lot of development overhead.
> But keeping it safe is another big deal
How would you keep it "safe"? And, more importantly, imagine if 2500 users use that token to perform 1 update check, the rate limit is triggered...
I've personally found web scraping to be much, much less reliable than using GitHub's API. Authentication is pretty easy to do and completely mitigates this issue; no average user is going to hit the rate limit at 5000 checks per hour.
> Maybe we could use the developer's GitHub token? But keeping it safe is another big deal... Web scraping is a good idea, but it can add a lot of development overhead.
That's not gonna work. The token will keep the rate limit with it, so all users who use AppImage update tools would only be able to do 5000 checks per hour total and not 5000 checks per hour individually as they would if they authenticated with their own credentials.
The first thing I'd do here is to make AppImageUpdate "aware" of the rate limit by evaluating API responses and checking whether the rate limit has been hit (showing an error with a message).
We could think about using some environment variable like GITHUB_TOKEN to allow users to pass the token to the tool. Then we can think about further measures.
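As a rough illustration of what being "aware" of the rate limit could look like: GitHub's REST API reports the remaining quota in the X-RateLimit-Remaining and X-RateLimit-Reset response headers, and answers with status 403 or 429 once the quota is used up. The helper name `rate_limit_status` and the layout of the returned dictionary are made up for this sketch, and it is written in Python rather than AppImageUpdate's C++ just to keep it short.

```python
from datetime import datetime, timezone

def rate_limit_status(status_code, headers):
    """Summarize GitHub rate-limit info from one API response (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    remaining = int(h.get("x-ratelimit-remaining", "-1"))  # -1: header missing
    reset_epoch = int(h.get("x-ratelimit-reset", "0"))     # UTC epoch seconds
    limited = status_code in (403, 429) and remaining == 0
    reset_at = (datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
                if reset_epoch else None)
    return {"limited": limited, "remaining": remaining, "reset_at": reset_at}
```

With `reset_at` at hand, the error message shown to the user could say when the next check will work again instead of just failing.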
> We could think about using some environment variable like GITHUB_TOKEN to allow users to pass the token to the tool. Then we can think about further measures.
This seriously makes the end user suffer a lot. I guess web scraping is the best choice here because GitHub does not update the site very often.
Must an end user create a GitHub account and get a token just to check for updates? I would rather download the new version directly from GitHub releases myself.
@antony-jr you got me wrong there, the requests are of course continued to be made anonymously first. The token will only be used once the rate limit is hit.
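A minimal sketch of that "anonymous first, token only after the limit is hit" behaviour, assuming the token comes from the GITHUB_TOKEN environment variable discussed above. The function name `build_headers` and its `rate_limited` flag are hypothetical; the `Authorization: token ...` header form is one that the GitHub API accepts.

```python
import os

def build_headers(rate_limited, env=os.environ):
    """Return request headers: anonymous by default, token only after a rate-limit hit."""
    headers = {}
    token = env.get("GITHUB_TOKEN")
    # The token is attached only when an anonymous request was already rejected,
    # so users who never hit the limit are never identified to GitHub.
    if rate_limited and token:
        headers["Authorization"] = "token " + token
    return headers
```

If no token is set, the result stays empty even after a rate-limit hit, and the caller can fall back to showing an error.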
Anything else would break with my privacy philosophies. I am not convinced that a token that does indeed solve the issue is a good idea privacy wise, but I would agree to optionally offer such a solution for now, to help users affected by the bug right now.
Of course it doesn't make sense to always require a token. And even if that was a necessity, the UX of setting an environment variable is really... well, you can imagine. As said, we will need to think of alternative ways of asking users to log into GitHub when making update checks.
I'm sure @simoniz0r didn't suggest requiring a GitHub token either. I just imagine @simoniz0r has a lot of AppImages, and after a while, the rate limit is being hit, making further update checks impossible. That might happen even more often in the future, since more and more AppImages are being published every day. Therefore, it's good to discuss the issue now, before the majority of users hit it.
@TheAssassin Got it :+1:
Note: Even jspm uses an environment variable to solve this issue.
> the UX of setting an environment variable is really... well, you can imagine.
Some Desktop Environments actually allow you to set environment variables for the whole session in an easy to use GUI. LXQt, for example, has a very easy to use option to do this in their session manager tool.
It's really not the greatest solution, but scraping the raw HTML has been much less reliable for me compared to using their API.
It's not like I was keen on writing some screen scraping in C++... But on the other hand, imagine a user had 60 AppImages of which >= 50% uses GitHub (not that unlikely, really), being a non-dev, I don't really want to ask them to register with a specific service for update checks.
A much nicer alternative would be for AppImageHub to keep track of AppImage versions so that the user doesn't ever have to.
> A much nicer alternative would be for AppImageHub to keep track of AppImage versions so that the user doesn't ever have to.
@simoniz0r Using version numbers is not a good idea, and not all developers want to register their AppImage in the AppImageHub and keep updating the version info.
> @simoniz0r Using version numbers is not a good idea, and not all developers want to register their AppImage in the AppImageHub and keep updating the version info.
Y'all keep saying that, yet the alternatives to AppImages (snaps and flatpaks) have no problem doing this and are much nicer experience to keep up to date.
@antony-jr I'd like to add that implementing every kind of versioning that you could imagine is a lot of work.
> Y'all keep saying that, yet the alternatives to AppImages (snaps and flatpaks) have no problem doing this and are much nicer experience to keep up to date.
That's not entirely correct, @simoniz0r. Last year, I was at an open-source conference where a snap core dev talked about these topics, and we had a long conversation about this. In the snap ecosystem, version numbers are merely considered "tags" of a file; they aren't used for anything internally.
Well, whatever mechanism they're using for tracking versions is honestly a much nicer experience than trying to keep AppImages up to date.
snaps are honestly my reference for how AppImages should work for the most part. Everything about creating and updating them (with the exception of snapd being required) is much nicer to use than AppImage tools. The main reason that I still prefer AppImages to snaps is that AppImages require no additional software to be used, but it would still be nice to see AppImage adopt some of the ways of handling things that have been working very well for snap.
@TheAssassin About keeping the GITHUB_TOKEN safe: why can't we use a UUID + timestamp to create a key for stronger encryption (AES-256 or Blowfish) at build time, so that we can encrypt the GITHUB_TOKEN for the user, or a developer-given one?
You mean, to be able to embed a key upstream? That is not an encryption. It's an obfuscation. Anybody can just read the key from the binary. Also, that'll allow tracking of every request AppImageUpdate will ever make. And it isn't scalable, the more users use the key, the faster the rate limit will be hit.
> Anybody can just read the key from the binary.
That's relative. Without embedded debugging information it would be very hard to decompile a binary, especially one built from a high-level language such as C++. But as developers we cannot rely on assumptions, so let's agree that this is a bad idea; we could use "public key cryptography", though. Anyway, let's just forget about this.
> And it isn't scalable, the more users use the key, the faster the rate limit will be hit.
I agree with that too.
So another idea hit me: instead of scraping GitHub, how about we commit the update information into the repo, so we can get the metadata (.zsync file) with just raw requests, without the help of the GitHub API or any user tokens? Just like a dumb web server.
Important: We should not commit the new binary to the repo. Instead we just need the .zsync file committed, with the file header pointing to the new binary in the releases. (We can get the file location after the upload by using the developer's GITHUB_TOKEN at build time, which does not run out.) This auto-commit process should take place after the upload of the binary.
So when AppImageUpdate hits a limit, it would just look for the metafile with a raw request, which could then let us run the zsync algorithm.
@antony-jr I don't need to decompile the binary, there's a ton of other ways, e.g., getting a memory dump, etc. This is not secure. Period. But we don't want to embed such a key at all upstream. (And public key cryptography doesn't solve anything here either, you always need a key to decrypt a secret).
Committing reproducible files is a really bad idea in general. Please check the internet, there's tons of articles rejecting the idea.
> Committing reproducible files is a really bad idea in general. Please check the internet, there's tons of articles rejecting the idea.
Can you give me pointers? Why is this a bad idea even if it does not raise any security issues or rate limiting? I think I have to reiterate: we just have to upload the .zsync (metafile) to the repo, so we can get the metafile directly from the repo, just like we retrieve a raw file from GitHub.
https://raw.githubusercontent.com/antony-jr/AppImageUpdater/master/.img/poster.png
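To make the "raw request" idea concrete, here is a small sketch that builds such a URL from its parts. The URL layout is taken from the example link above; the function name `raw_file_url` and the .zsync path in the usage line are hypothetical, since no project in this thread actually commits its .zsync file.

```python
def raw_file_url(user, repo, branch, path):
    """Build a raw.githubusercontent.com URL for a file committed to a repo."""
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        user, repo, branch, path.lstrip("/"))

# Hypothetical location of a committed metafile:
zsync_url = raw_file_url("someuser", "someapp", "master", "SomeApp-x86_64.AppImage.zsync")
```

Such a request does not go through the GitHub API, which is the whole point of the proposal.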
Are we discussing a theoretical problem here? I've never run into any rate limits as a user, as they are per-IP, right?
> Are we discussing a theoretical problem here? I've never run into any rate limits as a user, as they are per-IP, right?
@probonopd Yes, rate limiting is a real problem. It's not a big problem for now since there are few AppImages so far, but imagine a ton of software getting updated simultaneously: the rate limit will kick in and everything falls apart. And please note that there is a very good chance that a lot of AppImages use GitHub as their development platform.
If you want to reproduce this bug, try @simoniz0r's tool -> https://github.com/simoniz0r/spm or use AppImageUpdate with a large number of AppImages. Also, it's not rare that a user might want to update a ton of software (like we do now with a package manager).
@TheAssassin Another way to solve this is to use the dumb link to the zsync file (The first transport method from the specification).
I have run into rate limit problems many times personally.
@probonopd obviously @simoniz0r hit the rate limit, so, not as theoretical as you might imagine. With app stores like the NX Software center, those checks are performed a lot more frequently, too. We need a fix in the near future.
@TheAssassin Remember the dumb link to a zsync file. In my observation, the normal zsync update method and the GitHub update method are similar; the wildcard is the only difference. So with some luck we could transform GitHub update information into the dumb zsync update method. I think this has to get into the AppImage specification.
gh-releases-zsync|probono|AppImages|latest|Subsurface-*x86_64.AppImage.zsync
can be transformed into -> zsync|https://github.com/probono/AppImages/releases/download/latest/Subsurface-*x86_64.AppImage.zsync
But here we face the problem with the wildcard, and if so, then the only way is web scraping (to the best of my knowledge).
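The transformation shown above can be sketched as a plain string rewrite. This is only an illustration in Python (the function name is made up), and it deliberately does nothing about the wildcard: the resulting URL still contains the `*` and therefore cannot be fetched as-is, which is exactly the problem described.

```python
def gh_to_zsync_update_info(update_info):
    """Rewrite gh-releases-zsync update information as a plain zsync entry."""
    kind, user, repo, tag, filename = update_info.split("|")
    if kind != "gh-releases-zsync":
        raise ValueError("not GitHub releases update information: " + kind)
    url = "https://github.com/{}/{}/releases/download/{}/{}".format(
        user, repo, tag, filename)
    return "zsync|" + url

converted = gh_to_zsync_update_info(
    "gh-releases-zsync|probono|AppImages|latest|Subsurface-*x86_64.AppImage.zsync")
```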
@antony-jr check out what appimagetool does. I think it still uses this method. As you correctly recognized, this only works with files with static names, like appimagetool or linuxdeployqt.
@TheAssassin But what if we could do this for the GitHub update method as a fail-safe? Then this problem would never exist, if only we could get the name of the zsync file. Does anything come to mind on wildcards and URLs? If we could answer this question, then this problem could be solved without any major modification to the code.
@antony-jr you can't solve anything with "auto commits". Most authors don't want such stuff. And a dependency on git isn't very nice either.
> @antony-jr you can't solve anything with "auto commits". Most authors don't want such stuff. And a dependency on git isn't very nice either.
@TheAssassin Let's agree to disagree. This is the final solution I have...
github-releases-zsync as the first stage. This is what I can think of (under my belt).
check if the github-releases-zsync uses a static filename without a wild-card.
I'd rather tell people to use the zsync method then. That's a lot easier for everybody.
> I'd rather tell people to use the zsync method then. That's a lot easier for everybody.
Eh, not so easy for application devs. The argument/syntax for creating zsync info is not at all intuitive and not really even obvious that it exists at all. If y'all wanna go the zsync route for updates, you need to make every AppImage spit out one by default.
@simoniz0r how is it difficult? Would you like to elaborate? You basically copy-paste the URL of the .zsync file on the page of the "continuous" tag for example on GitHub, prepend zsync|, and you're done.
If you guys want a feature to check for wildcards, feel free to send a PR. But then please make sure to also check whether latest is used, as it requires an API request as well.
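The check asked for here could look roughly like this. It is a Python sketch with a made-up function name, not code from AppImageUpdate: update information only qualifies for a static URL if the file name has no wildcard and the tag is not the special value latest. The appimagetool entry in the usage lines is an illustrative example of a static name, not copied from a real AppImage.

```python
def can_convert_statically(update_info):
    """True if gh-releases-zsync info can become a fixed URL without the API."""
    parts = update_info.split("|")
    if len(parts) != 5 or parts[0] != "gh-releases-zsync":
        return False
    tag, filename = parts[3], parts[4]
    # A wildcard needs a file listing, and "latest" needs the API to resolve.
    return "*" not in filename and tag != "latest"

static_ok = can_convert_statically(
    "gh-releases-zsync|AppImage|AppImageKit|continuous|appimagetool-x86_64.AppImage.zsync")
wildcard = can_convert_statically(
    "gh-releases-zsync|probono|AppImages|latest|Subsurface-*x86_64.AppImage.zsync")
```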
> check if the github-releases-zsync uses a static filename without a wild-card.
> I'd rather tell people to use the zsync method then. That's a lot easier for everybody.
@TheAssassin This is a fail-safe for the github-releases-zsync update method. The upload tool might use the GitHub method, or the user might use this method without knowing about rate limits; it also helps backward compatibility (developers may still be using the GitHub method even if the filename is static). So this filter will solve that error if it occurs. (This is because transforming it to the normal zsync method is less expensive than scraping the information from GitHub.) As for scraping, from my experience, GitHub is very friendly to scrapers. On the other hand, it would be nice if we could integrate with GitHub (but that would be more work, so forget about that). If the user uses the normal version, then the checks will not take place in the first place. This whole fail-safe is just to make developers' lives easier and to make the update tool more robust, to avoid issues like this in the future.
> If y'all wanna go the zsync route for updates, you need to make every AppImage spit out one by default.
@simoniz0r zsync is by far the best solution here. Even the author (Colin Phipps) intended it to be used on image files, and AppImages are basically ISO9660 image files (which are mountable).
And also, @TheAssassin, if you are busy (I've got a lot of free time), I can implement the above solution as a patch to AppImageUpdate in standard C++11. Let's see if this fixes some issues temporarily and then decide on a permanent solution later.
@antony-jr first of all AppImages are not ISO9660 files. They're squashfs images. Type 1 ones were ISO9660 images with an ELF header.
I am not saying I have a better solution, but my goal is to keep the AppImageUpdate code base maintainable, not adding "features" that will work for a minority of use cases only (I mean, really, who's using static filenames without version numbers?). Please, show me numbers how often that workaround would save you from running into issues. How many of your AppImages with GitHub-based update information would be usable with this? I expect something < 10%.
I think a token makes more sense in the short term. Then, we can add a "better" solution later.
> I think a token makes more sense in the short term. Then, we can add a "better" solution later.
@TheAssassin Okay, that solves the problem for now. See ya! I hope this conversation was not a waste of time; I think we must pick our brains on this issue.
Have a good day everyone!
> how is it difficult? Would you like to elaborate? You basically copy-paste the URL of the .zsync file on the page of the "continuous" tag for example on GitHub, prepend zsync|, and you're done
Yeah... that's pretty cryptic lol
@simoniz0r I see your point. Please send a PR implementing an algorithm "recognizing" such update information, and generating static URLs that can be used to continue in the code.
@TheAssassin Just tested my new solution, can I send a PR?
The algorithm goes like this (at least the code goes like this)...
Do the github-release-zsync method.
If it failed due to rate limiting, then move on to the first fail-safe.
FIRST FAIL-SAFE: Check if it is possible to convert the update information to the normal zsync method. (This is just for backward compatibility, and I'm very sure that this will not hinder the code base.) If it is possible, then convert to the normal zsync method and continue the syncing.
If the first fail-safe failed, then move on to the second fail-safe.
SECOND FAIL-SAFE:
Convert the GitHub update information like this: gh-releases-zsync|(username)|(repo)|(tag)|(string with wild card) -> https://github.com/(username)/(repo)/releases/(tag)/ (purl).
A regex should be created from the (string with wild card): every wildcard is converted to the regex snippet .* and every static string to the regex snippet ([static string]). Thus a regex like this is formed (example) -> (Subsurface-).*(-x86_64\.AppImage\.zsync) (pregex)
Send a request to 'purl', get the raw response, then simply apply the processed regex 'pregex'.
Now take the result of the regex, which should be the filename of the zsync file (zsyncfn).
Now turn the processed data into information, like so...
ZSYNC_FILE_URL = purl + zsyncfn
Now with the ZSYNC_FILE_URL we can continue the normal syncing.
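The second fail-safe above can be sketched roughly as follows (Python for brevity, not AppImageUpdate's C++). The function names, the sample HTML and the example purl value are made up, and whether a URL of that form actually resolves on GitHub is not verified here. One deliberate deviation from the proposal: each wildcard becomes `[^/"\s<>]*` instead of `.*`, an assumption made here so that a match cannot run past the end of a file name.

```python
import re

def wildcard_to_regex(pattern):
    """Turn e.g. 'Subsurface-*-x86_64.AppImage.zsync' into a compiled regex."""
    # Static parts are escaped literally; each '*' may match any run of
    # characters that cannot end a file name inside an HTML attribute.
    static_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r'[^/"\s<>]*'.join(static_parts))

def find_zsync_url(purl, html, pattern):
    """Return purl + first matching file name, or None if nothing matches."""
    match = wildcard_to_regex(pattern).search(html)
    return purl + match.group(0) if match else None

# Made-up page fragment standing in for the response fetched from purl.
sample_html = ('<a href="/probono/AppImages/releases/download/continuous/'
               'Subsurface-4.6.4-x86_64.AppImage.zsync">zsync</a>')
purl = "https://github.com/probono/AppImages/releases/continuous/"
zsync_file_url = find_zsync_url(purl, sample_html, "Subsurface-*-x86_64.AppImage.zsync")
```

Note that the regex takes the first textual match anywhere in the page, so it cannot tell an asset link from a file name that merely appears in the release description.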
Please critique any mistakes in the above procedure (algorithm?).
I am still wondering why people might run into the rate limit. How often are you checking for updates? Shouldn't once per week and used(!) app be sufficient?
@probonopd Imagine 60 AppImages which use GitHub as their development platform (which is likely) with auto-update enabled. The rate limit will kick in if they just use the API (60 times), i.e., if they just check whether they have the new version. I think AppImageUpdate should be robust; also note that a lot of package managers (which are based on GitHub) are using GITHUB_TOKEN to solve this issue. Therefore I think AppImageUpdate should also have a fix for the rate limit. It's quite likely that a user might have more than 60 AppImages and will check for updates in batches (like the NX Software Centre).
I'd say choose a different update strategy that produces less traffic, e.g., check for updates while the application in question is actually running, or something like that. And do it only once per week. Or something along those lines. Checking everything for updates constantly is producing too much load and is not economic.
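One way to read the "once per week" suggestion is as a simple time gate in front of each check. The sketch below is only that reading: the function name, and the idea that the caller stores a last-check timestamp per AppImage, are assumptions rather than anything AppImageUpdate does.

```python
from datetime import datetime, timedelta

def update_check_due(last_check, now, interval=timedelta(days=7)):
    """True if an AppImage was never checked or its last check is too old."""
    return last_check is None or now - last_check >= interval
```

A caller would evaluate this when the application is actually started and skip the request to GitHub whenever it returns False.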
@probonopd ignoring issues is not a suitable strategy to deal with them. This issue is real, and we should find a solution. Why should we not provide users of libappimageupdate with tools to work around these issues? Do you really want to tell them "well, please code your own solution, we don't want to fix this issue"? I don't think so. In order to increase the adoption of this library, it must provide a good UX for the developers.
@antony-jr your solution doesn't actually solve the issue; it only mitigates its effects in some edge cases. However, if you put in the checks whether you can use the static mode (failsafe is the wrong term, failsafe implies this can never go wrong, which is not true), then you can remove these from the rate limit checks. Also, it's less complicated, as it doesn't require making unnecessary requests, and integrates better into the workflow of the function.
If we didn't have such complex API bound types like the GitHub releases type, things would be a lot easier. But we have it, and we need to support them properly.
By the way, have any of you even asked GitHub to increase their rate limit...? Seems like the easiest solution to me.
@antony-jr I don't understand your second proposal. Please add some links to documentation. I have no idea how this can work. HTTP doesn't have any "regex" based request facilities.
I'm just trying to be thoughtful about resources...
@probonopd oh come on, that argument is pretty hypocritical. Please try to think about other people's use cases. I personally would like to be able to perform an update check at any point in time, and wouldn't like a tool to tell me "Started update check -- expected time amount: 2 hours -- reason: includes 61 GitHub based AppImages"...
> I don't understand your second proposal. Please add some links to documentation. I have no idea how this can work. HTTP doesn't have any "regex" based request facilities.
@TheAssassin Yes, HTTP does not have any "regex". We need to apply the processed regex to the HTML source file located at 'purl', which is https://github.com/(username)/(repo)/releases/(tag)/. It's very simple web scraping, but the success rate is good because we are just searching for the zsync filename.
@TheAssassin Even when GitHub changes the HTML source files, the regex should still work well.
Not saying there may not be reason to check 61 AppImages for updates at once, but at least for me I never have the urge to... since I never use more than a handful on any particular day. This whole "let's check everything for updates, regardless of whether the user even still uses the app or not" thing is annoying me a lot. Worst offender: Android. I have a tablet that I ever only use to run exactly 1 app, ever. I use it every 14 days or so. Every time I switch it on, it informs me that it has updated (or wants to update - or whatever) 30+ apps, all of which came preinstalled on the tablet and I never use. Of course it has the urge to inform me about this using notifications. Which is just plain annoying. I don't like this mindset. But hey. Every user is different ;-)
@probonopd AppImageUpdate's update check is highly efficient, and doesn't waste any traffic. For example, it skips most of the .zsync file's contents. I believe that during the HTTPS connection setup more data is transferred than actual payload data. Due to curl caching connections however, this handshake is only performed once.
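To illustrate why most of a .zsync file can be skipped: the file begins with a short block of text headers (such as Filename, Length and SHA-1), ended by an empty line, and the bulk after that is binary block checksums. The sketch below parses only that header. It is an illustration with a made-up function name and hand-written sample data, and it assumes, without checking AppImageUpdate's source, that the header alone is enough for an update check.

```python
def parse_zsync_header(data):
    """Parse the text header of a .zsync file; ignores the binary checksums."""
    header, _, _ = data.partition(b"\n\n")  # header ends at the first empty line
    fields = {}
    for line in header.decode("utf-8", "replace").splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields

# Hand-written sample; real files carry more headers and much more binary data.
sample = (b"zsync: 0.6.2\nFilename: App-x86_64.AppImage\nBlocksize: 2048\n"
          b"Length: 4096\nURL: https://example.org/App-x86_64.AppImage\n"
          b"SHA-1: da39a3ee5e6b4b0d3255bfef95601890afd80709\n\n\x00\x01\x02\x03")
fields = parse_zsync_header(sample)
```

Comparing the SHA-1 field with the hash of the local file would then be enough to tell whether the remote file differs.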
@probonopd just because it doesn't fit your use case doesn't mean we may not implement a fix. This kind of attitude prevents innovation. "I don't need it so I decide that nobody else might need it". This issue is about a real bug, not about anybody's preferences (except for technical details like code style or workflow or alike). This is not Android. And there's a lot of people who like this. There are a lot of reasons to stay up to date. Especially for anything security related. Furthermore, nobody said these update checks have to be made automatically. This is not the only use case, please don't focus only on it. It's just the first thing that came to all our minds.
@antony-jr you can hardly call this "web scraping". If we decide to perform web scraping, we should implement it properly. Then, we can tell exactly when the HTML format changes, and tell users to report this to us. No debugging needed.
Regarding your statement on the error rate, I disagree. This might work right now for the single use case you have there, but your set of test data is too small. I can think of various cases in which it'd fail to recognize the correct entry. To do it properly, you'd have to also take into account the actual structure of the page; it is not even enough to just extend your regex to try to ensure there's an a tag around it. What if, for example, there's a link to a .zsync file in the description? Your approach would fail, and even with the a tag, you'd get a false result. You would have to check whether the link is in the right "area" in the HTML file, which is the "file list". If GitHub structured their page properly, you could just check the parent tag(s).
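A small sketch of parsing the page instead of running a regex over its raw text. Note what it does and does not do: it walks real a tags with Python's standard html.parser and keeps only links whose target lies under the repository's release downloads. That is a weaker check than the "area" check described above (a description linking to a genuine asset would still pass), and it is used here because it does not depend on GitHub's actual markup. The class name and the sample HTML are made up.

```python
from html.parser import HTMLParser

class AssetLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags that point into a repo's release downloads."""
    def __init__(self, user, repo):
        super().__init__()
        self.prefix = "/{}/{}/releases/download/".format(user, repo)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(self.prefix) and href.endswith(".zsync"):
            self.links.append(href)

# Made-up page: a description mentioning and linking other .zsync files,
# followed by the actual asset link.
sample_html = """
<div class="description">
  <a href="https://example.org/other/App-1.0.AppImage.zsync">mirror</a>
  Old file: App-0.9-x86_64.AppImage.zsync
</div>
<ul><li><a href="/user/repo/releases/download/v1.0/App-1.0-x86_64.AppImage.zsync">App</a></li></ul>
"""
collector = AssetLinkCollector("user", "repo")
collector.feed(sample_html)
```

Here both the external mirror link and the plain-text mention in the description are ignored, which a bare regex over the page text would not guarantee.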
@probonopd by the way, please beware that AppImageUpdate does perform an update check before every update, and makes at least 2 requests to the GitHub API. So, you could only update 31 AppImages in 2 hours. That's not a good rate at all.
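The arithmetic behind that figure, as a tiny sketch: anonymous GitHub API access is limited to 60 requests per hour, so at 2 requests per AppImage the 31st AppImage no longer fits into the first hour. The function name is made up, and the model simply counts how many one-hour quota windows are needed.

```python
import math

def hours_until_done(appimages, requests_per_check=2, hourly_limit=60):
    """Number of one-hour rate-limit windows needed to process all AppImages."""
    return math.ceil(appimages * requests_per_check / hourly_limit)

one_hour = hours_until_done(30)   # 60 requests fit into a single window
two_hours = hours_until_done(31)  # 62 requests spill into a second window
```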
Continued from https://github.com/AppImage/AppImageKit/issues/653#issuecomment-380662974
https://developer.github.com/v3/rate_limit/