ckrause / loda

LODA is an assembly language, a computational model and a tool for mining integer sequence programs.
Apache License 2.0
20 stars 5 forks source link

Miner produces an incorrect program: Intended to compute A276153, but computes A341356 instead, because upper limit too low. #72

Closed karttu closed 3 years ago

karttu commented 3 years ago

Here is an instance of somewhat fundamental fault I saw already two weeks ago, but I haven't had time to elaborate it until now. Following is the miner-generated code in: https://github.com/ckrause/loda/blob/master/programs/oeis/276/A276153.asm

; A276153: The most significant digit when n is written in primorial base (A049345). ; 0,1,1,1,2,2,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1

mov $2,2 mov $3,3 lpb $0 mov $1,$0 div $0,$2 mov $2,$3 add $3,2 lpe

My immediate first thought when seeing the above code was that, "wow, what a concise way to do it!", and then the second thought about a second later: "but that cannot be right!".

See, instead of really generating primes as 2, 3, 5, 7, 11, 13, 17, 19, ..., the miner cuts some corners, and instead spews out "the slacker's primes": 2, 3, 5, 7, 9, 11, 13, 15, 17, ..., that is, 2 followed by all odd numbers >= 3.

Now, we can think it still computes the most significant digit of some base, and that could be called "A097801-base", for the sequence A097801 [a(n) = (2n)!/(n!2^(n-1))] begins as 2, 2, 6, 30, 210, 1890, 20790, 270270, 4054050, 68918850, ..., i.e., products 2, 23, 235, 2357, 23579, etc. (if it had a(0)=1 instead of 2, the name would be even better fit, because we still want the least significant digit for 1's).

I have submitted this as: https://oeis.org/A341356 ("The most significant digit in A097801-base.") which is precisely the sequence the miner-code computes in its version of A276153. The problem is that their first difference doesn't come until at n=1890.

And similarly, in https://oeis.org/A276150 vs. https://oeis.org/A341513 (sum of digits) https://oeis.org/A276084 vs. https://oeis.org/A341514 (number of trailing zeros) also in these cases the first difference doesn't come until at that point.

So, I think here is a good reason to raise the upper limit up to what n the miner will check its results. Also, if the miner detects that there are several b-files in OEIS that do not diverge until at certain points (even after its own current upper limit), it would be still a good idea to check that code on all the diverging points, to deduce which ones of those OEIS-sequences it at least do NOT compute. For example, for these three sequences: A099564, A276153, A341356 the diverging points are at n=210 and n=1890, so checking the miner's program just at those points would immediately tell which two of the sequences it certainly is not.

But... there is a more fundamental issue here: even if we rise the limit, the miner's next innovation might be to compute "primes" as 2, 3, 5, 7, 11, 13, 15, 17, 19, 21, 23, 27, 29, 31, 33, ..., that is, just skip 9, 25, 49, 81, 121, ..., and other odd squares from the list of odd numbers. I would bet this is still easier to find by random mining than to find an algorithm that generates real primes. And in this case the first difference to the true primorial base wouldn't appear until at 2357111315 = 450450. And there's no way we can stop it inventing even better "slacker's approximations of primes" for ever.

Now, one solution to these kinds of problems would be to "pre-empt" such miner-attempts by hand-coding important sequences that it is likely to get wrong, and effectively protect those A-numbers from mining attempts. And that is what I did when I coded manually https://github.com/ckrause/loda/blob/master/programs/oeis/276/A276150.asm for the primorial base sum, based on more general algorithm presented at: https://github.com/ckrause/loda/blob/master/programs/oeis/276/A276086.asm (from which it is easy to create other primorial-base related sequences).

Now, for the fun, you may also take a look what these do: https://github.com/ckrause/loda/blob/master/programs/oeis/341/A341356.asm https://github.com/ckrause/loda/blob/master/programs/oeis/341/A341513.asm https://github.com/ckrause/loda/blob/master/programs/oeis/341/A341514.asm (they just call the "primorial base"-versions, so accidentally, A341356 gets the correct result, because by this logic wrong x wrong = right).

Best regards,

Antti

PS. See also the https://github.com/ckrause/loda/issues/74 for a related issue.

ckrause commented 3 years ago

We will raise the limit of terms to be checked. If we still get tricky sequences wrong, your idea of manually coding them and protecting them makes sense.

ckrause commented 3 years ago

The maximum number of terms to be checked has been raised to 2000 already a while ago. The incorrect program for A276153 was automatically removed by the maintenance job.