chfoo opened this issue 10 years ago
soultcer: You could also just scan them randomly until you have a statistically high enough chance of hitting each code at least once, or try some sort of sequence.
soultcer: As I have said earlier, I actually like the "random" approach because then you get some codes twice. If some evil terroroftinytown-client returns wrong results, you will find out about it sooner or later ;-)
soultcer: chfoo: You don't have to give out every job twice. You just need to check one or two codes from each job and see if they are correct. So for, say, each 10 jobs, you hand out 1 verification job that includes 20 shortcodes.
soultcer: But so far I have never detected any problems. The only times I had different results were when the owner of a URL shortener decided that he needed a specific shortcode and just deleted the old URL (is.gd), or with TinyURL, which displays advertisements when a URL returns a 404 and sometimes has encoding issues.
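The spot-check scheme described above (for every ten completed jobs, one verification job that re-fetches a couple of shortcodes from each) could be sketched like this. All names here are illustrative, not the actual terroroftinytown API:

```python
import random

def make_verification_job(completed_jobs, sample_size=2):
    """Bundle a few shortcodes from each completed job into one
    re-check job (hypothetical helper; field names are made up)."""
    codes = []
    for job in completed_jobs:
        codes.extend(random.sample(job["codes"],
                                   min(sample_size, len(job["codes"]))))
    return {"type": "verify", "codes": codes}

# For every 10 completed jobs of ~50 codes each, issue one
# verification job containing 2 codes from each (20 total):
jobs = [{"codes": [f"c{i}-{j}" for j in range(50)]} for i in range(10)]
vjob = make_verification_job(jobs, sample_size=2)
```

If a client's earlier answers disagree with the re-fetched results, that client's work is suspect; with random sampling a cheater cannot predict which of its codes will be double-checked.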
I didn't see this until now:
soultcer: What you are probably looking for is a maximum length sequence: https://en.wikipedia.org/wiki/Maximum_length_sequence
soultcer: Just FYI, for the old database I used something similar but a bit simpler, basically just taking the MD5 hash of each previous shortcode: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/generators.py#L43-77
soultcer: This obviously means that the same code will sometimes be hit twice, but this was intended. If a bad person somehow tried to return wrong results, sooner or later another user would hit the same shortcode at random and we would discover the wrong data.
aaaaaaaaa: The problem with hashes is that it isn't guaranteed to hit all of them and may get stuck in a loop.
aaaaaaaaa: But I'm not familiar with the MLS you suggested; hopefully the devs are.
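For reference, a maximum length sequence can be generated with a linear-feedback shift register: the 16-bit Fibonacci LFSR below (taps 16, 14, 13, 11 — a known maximal polynomial) visits every nonzero 16-bit state exactly once per cycle, so it covers a keyspace with no repeats. The hash-chain generator next to it is only loosely modeled on the linked tinyback code and illustrates the loop concern aaaaaaaaa raises:

```python
import hashlib

def mls_states(seed=0xACE1):
    """16-bit Fibonacci LFSR, taps 16/14/13/11 (maximal polynomial):
    cycles through all 2**16 - 1 nonzero states before repeating."""
    state = seed & 0xFFFF
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def md5_chain(code):
    """Hash-chain generator, loosely modeled on tinyback's idea:
    each code is derived from the MD5 of the previous one. Unlike an
    MLS, nothing guarantees full coverage -- the chain can fall into
    a short cycle that never touches most of the keyspace."""
    while True:
        code = hashlib.md5(code.encode("ascii")).hexdigest()[:6]
        yield code
```

A wider keyspace just needs a longer register with another maximal tap set; tables of maximal polynomials exist for every practical width.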
But to update:
Right now, we're just doing random codes for bitly and is.gd. It's the easiest approach to implement, and since some URLs are stuck behind dead servers, we'll never get 100% of any major shortener regardless.
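A minimal sketch of that random approach, with the alphabet and code length as assumptions (real shorteners vary). Drawing codes uniformly means full coverage of N codes takes roughly N·ln N scans in expectation (coupon collector), which is also why the natural overlap gives free verification:

```python
import math
import random
import string

# Assumed base-62 alphabet; actual shortener alphabets differ.
ALPHABET = string.ascii_letters + string.digits

def random_codes(n, length=6, rng=None):
    """Draw n shortcodes uniformly at random. Duplicates are
    possible, which is what makes accidental double-checking work."""
    rng = rng or random.Random()
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(n)]

def expected_scans(n_codes):
    """Coupon-collector estimate of scans needed to hit every one
    of n_codes at least once."""
    return n_codes * math.log(n_codes)
```

For a 6-character base-62 space that estimate is astronomically larger than the space itself, which supports the point above: random scanning is fine precisely because 100% coverage was never on the table.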