FriendlyNeighborhoodShane / MinMicroG

Sources and scripts for MinMicroG installers. You shall find no prebuilt releases here.
GNU General Public License v3.0

Update script slow #23

Closed y0va closed 2 years ago

y0va commented 2 years ago

update.sh mirrors all defined packages locally. On consecutive calls a lot of unnecessary network traffic occurs. It would make sense for speed and traffic austerity to first check whether the remote version is different from the local version.

FriendlyNeighborhoodShane commented 2 years ago

I don't usually need to run it consecutively, but when I do, it's when I know I need some specific things. For that, the update script can take regex arguments.

# only fetch the 3 aurora apks
./update.sh aurora

# only fetch, well, the unlp apk and NLP backends
./update.sh nlpbackends unlp

# only fetch sync adapters and (swipe) library files
./update.sh googlesync \\.so

It filters the entire download list (specifically, their paths/filenames) with the given regexes and downloads the result.
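A minimal sketch of that kind of filtering, not the actual update.sh code: the function name `filter_list`, the use of extended regexes, and the example paths are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lines from stdin that match at least one of
# the regexes given as arguments. With no arguments, pass everything through.
filter_list() {
  if [ "$#" -eq 0 ]; then
    cat
    return
  fi
  pattern=""
  for regex in "$@"; do
    # Join the regexes into one alternation: (re1)|(re2)|...
    pattern="${pattern:+$pattern|}($regex)"
  done
  grep -E -- "$pattern"
}

# Illustrative download list (made-up paths, not the real MinMicroG layout)
printf '%s\n' \
  'aurora/store.apk' \
  'lib/libswipe.so' \
  'googlesync/calendar.apk' |
  filter_list aurora '\.so'
```

With the made-up list above, only the `aurora` and `.so` entries pass the filter; the sync adapter line is dropped.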

FriendlyNeighborhoodShane commented 2 years ago

Actually, to be honest, I have had the same idea before and worked a little on it. But there's a few problems.

Mainly, there's the fact that there's no reliable way to check whether a remote file is the same as a local file without first actually downloading it. F-Droid is about the only source we use which provides hashes (through their repository files), so this can be done for F-Droid repos. But the rest of them, nope. For some sources with dynamic URLs you can record and compare URLs but without some kind of hash you don't know if the same URL still points to the same file (or if the user modified the local copy). e.g. Github and Gitlab allow you to reupload release assets, and people do use it sometimes to replace accidentally broken releases and the like. There's most likely no way to do this for direct URLs at all.

Here's what I came up with then:

https://github.com/FriendlyNeighborhoodShane/MinMicroG_releases/blob/dd8d188bd2c78d0920e2d6ef702bbbb2adcbd4ff/contrib/custom_hooks.sh#L4-L96

You can call this in pre_update_actions() in conf/resdl-download.txt.

It uses the URL approach (using the update logs stored in the releases directory as persistent data), and suffers from the same aforementioned problems as well. It's actually pretty dumb too; it doesn't check for the existence of local files or compare hashes for F-Droid APKs when it can.

TBF, I never worked on this seriously enough to attempt to fix these hurdles. I just found it unnecessary because I can always selectively download stuff, so it was just a fun exercise for me.
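For the F-Droid case mentioned above, where a hash is published, the check could look roughly like the sketch below. It assumes the expected SHA-256 has already been extracted from the repository index by some other means; `needs_download` is a hypothetical name, and `sha256sum` is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a download is needed, i.e. when
# the local file is missing or its SHA-256 differs from the expected hash.
needs_download() {
  file=$1
  expected=$2
  # No local copy at all: always download
  [ -f "$file" ] || return 0
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" != "$expected" ]
}
```

Because the hash is computed from the local file itself, this also catches a local copy that the user has modified, which the URL comparison cannot.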

y0va commented 2 years ago

> It filters the entire download list (specifically, their paths/filenames) with the given regexes and downloads the result.

Ah, that's good to know. Is this documented somewhere?

I use the installer for creating a bigger app package. For updating this, regular calls every few days would make sense.

y0va commented 2 years ago

Interesting approach. So you compare the last URL from logs with the current, and if it's different, we have a delta?

I thought of using the no-clobber functionality of wget, or of mimicking it by comparing the date and size of the remote and local files.

FriendlyNeighborhoodShane commented 2 years ago

> > It filters the entire download list (specifically, their paths/filenames) with the given regexes and downloads the result.
>
> Ah, that's good to know. Is this documented somewhere?

Yeah, in the main readme. Admittedly not very well, though; it could use some elaboration. https://github.com/FriendlyNeighborhoodShane/MinMicroG/blob/4beff5812163bab440174daf94d73c4ce22d12fe/README.md?plain=1#L140-L143

> Interesting approach. So you compare the last URL from logs with the current, and if it's different, we have a delta?

Yep. It resolves them to their final, directly downloadable form (exactly as the update script itself does) and then compares them. Since e.g. F-Droid URLs contain a version code, and GitLab URLs contain a release ID, this makes it a pretty good indicator of when files have changed. Though not perfect, as I mentioned above.
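A minimal sketch of that URL comparison, not the linked hook itself. The log format (one `name url` pair per line) and the function name `url_changed` are assumptions for illustration; the real update logs may look different.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the resolved URL for NAME differs
# from the most recent one recorded in LOGFILE, or when nothing was recorded.
# Assumed log format: "name url", one pair per line, newest entries last.
url_changed() {
  name=$1
  url=$2
  log=$3
  # No log yet: treat everything as changed
  [ -f "$log" ] || return 0
  last=$(awk -v n="$name" '$1 == n { u = $2 } END { print u }' "$log")
  [ "$last" != "$url" ]
}
```

For example, if the log holds `gsmlocation https://f-droid.org/repo/org.fitchfamily.android.gsmlocation_73.apk` and the freshly resolved URL ends in a different version code, `url_changed` succeeds and the file would be fetched again. A reuploaded asset under an unchanged URL goes unnoticed, which is the limitation described above.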

> I thought of using the no-clobber functionality of wget, or of mimicking it by comparing the date and size of the remote and local files.

Nice idea. But of course, now we're relying on the server to serve us a correct date.

$ curlh() { curl -s --head -L "$1" | grep -i -e '^last-modified: ' -e '^content-length: '; }

2 July release of AuroraStore:

$ curlh "https://gitlab.com/AuroraOSS/AuroraStore/uploads/7b15e21f5b293a3db2c10e0ea7719f1a/AuroraStore_4.0.7.apk"
content-length: 5840440
last-modified: Fri, 02 Jul 2021 22:43:19 GMT

24 March 2016 release of the maps API:

$ curlh "https://github.com/microg/android_frameworks_mapsv1/releases/download/v0.1.0/mapsv1.flashable.zip"
content-length: 626
last-modified: Mon, 22 May 2017 00:01:50 GMT
content-length: 306951

5 November build of the localGSM NLP backend:

$ curlh "https://f-droid.org/repo/org.fitchfamily.android.gsmlocation_73.apk"
Last-Modified: Fri, 05 Nov 2021 12:27:36 GMT
Content-Length: 2271918

The latest revision of a sync adapter:

$ curlh "https://gitlab.opengapps.org/opengapps/all/raw/master/app/com.google.android.syncadapters.calendar/21/nodpi/2016267990.apk"

(no output here, because gitlab doesn't send those headers)

Looks promising. The date for the maps API is wildly incorrect (dunno what's up with that. GitHub server migration? Reupload by the developer?), and there is no data at all for the sync adapters (likely because the file is streamed from the git server, and the HTTP server has no notion of a last-modified date for it). So this data isn't guaranteed to be accurate, or even available, for any given file. I think it should still be useful as long as the reported date, when it is wrong, is later than the actual date. A couple of unneeded downloads are fine; updates that should have happened but didn't would be a problem.
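The policy described here (skip only when the local copy is provably current, otherwise download) can be sketched as a small decision function. The name `should_download` and its argument order are assumptions for illustration; sizes are in bytes and dates in epoch seconds, with an empty string standing for "unknown".

```shell
#!/bin/sh
# Hypothetical helper:
#   should_download LOCAL_SIZE LOCAL_MTIME REMOTE_SIZE REMOTE_MTIME
# Succeeds (exit 0) when the file should be fetched.
should_download() {
  local_size=$1
  local_mtime=$2
  remote_size=$3
  remote_mtime=$4
  # Any missing value (no local file, or a server that sends no headers):
  # we cannot show the copy is current, so err on the side of downloading.
  if [ -z "$local_size" ] || [ -z "$local_mtime" ] ||
     [ -z "$remote_size" ] || [ -z "$remote_mtime" ]; then
    return 0
  fi
  # A different size means a different file
  [ "$remote_size" -ne "$local_size" ] && return 0
  # The remote copy claims to be newer than the local one
  [ "$remote_mtime" -gt "$local_mtime" ] && return 0
  return 1
}
```

A server that reports a date later than the real one only causes an extra download with this logic. A server that reports a date earlier than the real one, for a file of identical size, would cause a missed update.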

y0va commented 2 years ago

See how the date check gives a good hint of possible binary changes? The next step would be to compare the number of bytes with a version from before 2017/05/22. Have you already contacted @mar-v-in about that?
By grepping the GitHub release page for the release date, the script could emit an automated warning on such irregularities. But that's another issue...

To save bandwidth, a comparison with the local size and date would be needed. Maybe like:

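# size of the local copy in bytes (GNU stat)
local_size=$(stat -c %s "$resdldir/$object")
# -L follows redirects; keep the last matching header and strip the trailing CR
remote_size=$(curl -sIL "$objecturl" | tr -d '\r' | grep -i '^content-length: ' | tail -n 1 | awk '{print $2}')

# mtime of the local copy in epoch seconds
local_date=$(stat -c %Y "$resdldir/$object")

dateFromServer=$(curl -sIL "$objecturl" | tr -d '\r' | grep -i '^last-modified: ' | tail -n 1 | sed -e 's/^[^:]*: //')
# convert the HTTP date to epoch seconds (GNU date)
remote_date=$(date +"%s" -d "$dateFromServer")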

And then pull that together. Idea for remote date snippet from: https://askubuntu.com/a/741318/710909
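One detail to watch when pulling this together: with redirects followed, curl prints one header block per response, as the two content-length values in the GitHub output earlier in the thread show. A minimal sketch of picking a header from the final response only, assuming `curl -sIL` output on stdin; `final_header` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers (as printed by curl -sIL)
# on stdin and print the value of header NAME from the last response only.
# Prints an empty line when the final response does not carry that header.
final_header() {
  tr -d '\r' | awk -v name="$1" '
    # A status line starts a new response: forget values from earlier ones
    /^HTTP\// { value = "" }
    {
      i = index($0, ": ")
      # Header names are case-insensitive, so compare in lower case
      if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name))
        value = substr($0, i + 2)
    }
    END { print value }
  '
}
```

The header text could be fetched once with `headers=$(curl -sIL "$objecturl")` and then queried with `printf '%s\n' "$headers" | final_header content-length`. The last-modified value can be passed to GNU `date +%s -d` to get epoch seconds for comparison with `stat -c %Y`.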

FriendlyNeighborhoodShane commented 2 years ago

Added remote mtime/size checking to the function. https://github.com/FriendlyNeighborhoodShane/MinMicroG_releases/commit/1f21e2f4c72b720fd0c6d040a8f6e2d43bca064b

FriendlyNeighborhoodShane commented 2 years ago

Closing because I think it's been addressed. Feel free to reopen if you have more to talk about.