ArtemKushnerov / az

Downloads apks from androzoo repository https://androzoo.uni.lu/
MIT License

Race conditions? #8

Open Sebastiaan-Alvarez-Rodriguez opened 5 years ago

Sebastiaan-Alvarez-Rodriguez commented 5 years ago

Hello, these two commands were run:

No environmental changes were made. The only difference is the number of threads. In both cases, many "apk with pkg <pkg_name> already exists" warnings were given.

Is your code thread safe? Can race conditions occur?

It would seem that multiple threads get assigned to the same download entry in the dataset.

Please fix this issue

ArtemKushnerov commented 5 years ago

Hi Sebastian! Sorry for the late response. Is the issue still relevant? This looks like a bug; I will take a look at it. As for the "apk already exists" warnings, those are caused by apks with the same package name in the repo, so don't worry about them.

Sebastiaan-Alvarez-Rodriguez commented 5 years ago

Hi! I just figured: Does az include conflicting selected items only once?

Maybe, during my downloads back then, there was a different number of name conflicts, resulting in a different number of apks.

ArtemKushnerov commented 5 years ago

Not exactly. After downloading an apk, az tries to save it under the name <package_name>.apk. If that file already exists in the directory, meaning an apk with that package name has already been downloaded, it is saved as <package_name+sha1>.apk instead. The results can vary if you don't use the same seed argument, but the number of downloaded apks should be the same. And it is weird that you asked for 1000 apks and got 1136.
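To make the naming rule concrete, here is a minimal sketch of it, not the actual az code. The helper names `choose_apk_path` and `save_apk`, the example package name, the short fake sha1 values, and the `_` separator between package name and sha1 are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def choose_apk_path(directory: Path, package_name: str, sha1: str) -> Path:
    """Pick the file name for a downloaded apk (hypothetical helper)."""
    plain = directory / f"{package_name}.apk"
    if not plain.exists():
        return plain
    # The plain name is taken, so fall back to a name that also carries the sha1.
    # The "_" separator is an assumption; the rule only says <package_name+sha1>.
    return directory / f"{package_name}_{sha1}.apk"


def save_apk(directory: Path, package_name: str, sha1: str, content: bytes) -> Path:
    """Write the apk bytes under the chosen name and return the path."""
    path = choose_apk_path(directory, package_name, sha1)
    # write_bytes overwrites an existing file with the same sha1-suffixed name.
    path.write_bytes(content)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        first = save_apk(out, "com.example.app", "aaa111", b"one")
        second = save_apk(out, "com.example.app", "bbb222", b"two")
        third = save_apk(out, "com.example.app", "bbb222", b"two")
        print(first.name, second.name, third.name)
        print(len(list(out.glob("*.apk"))))
```

In this sketch the third save reuses the second file's name, so the directory ends up with two apks. Note that the exists-then-write sequence here is not atomic: two threads saving the same package name at the same moment could both pick the plain name.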

Sebastiaan-Alvarez-Rodriguez commented 5 years ago

A bit off point but I am really interested: What happens if a double name collision occurs? Does it get saved as <package_name+sha1+sha1>.apk?

I found it a bit strange myself too. As far as I can recall, I downloaded all apks to a new, empty directory. I did not use a seed argument while downloading.

Sadly, I already cleaned everything up, so there are no logs to study further. Maybe it is better to close this issue for now. Perhaps someone else will experience this behaviour as well and open an issue.

ArtemKushnerov commented 5 years ago

> A bit off point but I am really interested: What happens if a double name collision occurs? Does it get saved as <package_name+sha1+sha1>.apk?

This shouldn't be the case, as a sha1 is unique within the repository, as far as I know. If it is not, the existing file will be overwritten, which is still OK, because the contents would obviously be the same. Such an approach is a bit inefficient, though.

> And it is weird that you asked for 1000 apks and got 1136.

Now I see you asked for 10000, not 1000. Then it's OK to get fewer, because there may be only so many apks matching the criteria.

Also keep in mind that in addition to the apks you will get the metadata.csv and log.log files. Maybe this somehow played a role. By the way, did you wipe everything in the directory between experiments? I will try to reproduce your experiment a bit later.
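If it helps when comparing runs, here is a small sketch for counting only the apk files in an output directory, so that metadata.csv and log.log are left out of the total. The function name `count_apks` and the demo file names are made up for this example.

```python
import tempfile
from pathlib import Path


def count_apks(directory: str) -> int:
    """Count regular files ending in .apk, ignoring everything else."""
    return sum(
        1
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.suffix == ".apk"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two fake apks plus the two extra files mentioned above.
        for name in ("a.apk", "b.apk", "metadata.csv", "log.log"):
            (Path(tmp) / name).write_bytes(b"")
        print(count_apks(tmp))
```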

Sebastiaan-Alvarez-Rodriguez commented 5 years ago

As far as I recall, I did indeed wipe everything in the directory between experiments.

arezoo3456 commented 2 years ago

Hello, please help me: how can I create the .az file on Windows 10?