ckrause / loda

LODA is an assembly language, a computational model and a tool for mining integer sequence programs.
Apache License 2.0
20 stars 5 forks source link

Miner produces incorrect program for A235918, A236454, etc, by the use of dubious heuristics. #74

Closed karttu closed 3 years ago

karttu commented 3 years ago

Please consider miner's code in: https://github.com/ckrause/loda/blob/master/programs/oeis/235/A235918.asm

; A235918: Largest m such that 1, 2, ..., m divide n^2. ; 1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,7,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,4,1,2,1,2,1,6,1,2,1,2,1,4,1,2,1,2

cal $0,71222 ; Smallest k such that gcd(n,k) = gcd(n+1,k+1). mov $1,7 mov $2,$0 lpb $1 mov $1,$2 lpe

https://github.com/ckrause/loda/blob/master/programs/oeis/236/A236454.asm is otherwise same, but with add $1,1 at the end, which in itself is correct, because indeed A236454(n) = 1+A235918(n).

However, the program in both cases computes a wrong result (after a while), because it essentially computes a(n) = min(7, A071222(n)). [but note that A071222 is an offset-0 sequence, while A236454 is an offset-1 seq].

In the OEIS entry https://oeis.org/A235918 I have added the following comment:

Note that a(n) is equal to A071222(n-1) = A053669(n)-1 for the first 209 values of n. The first difference occurs at n=210, where a(210)=7, while A071222(209)=10.

Now, this diverging point 210 is still inside the range (0..250) of the current miner-settings, so one might think that the issue is not now that the checking-limit is too low. Except that it is, because the first term larger than 7 doesn't come until at n=420 in A235918:

$ gawk -f records.awk < b235918_upto10000_from_OEIS.txt 1 1 2 2 6 4 30 6 210 7 420 10 4620 12

Now, as what comes to the miner's implementation of A071222, it is interesting: https://github.com/ckrause/loda/blob/master/programs/oeis/071/A071222.asm as it calls A276084 (Number of trailing zeros in primorial base representation of n (A049345); largest k such that A002110(k) divides n). I think the logic is correct, even though the current implementation of A276084 is not: https://github.com/ckrause/loda/blob/master/programs/oeis/276/A276084.asm (it most likely computes the https://oeis.org/A341514 This is the topic of the previous issue, https://github.com/ckrause/loda/issues/72 ).

So the main point is: Could we explicitly control how the miner uses such "slacker's hacks" like min(constant, return_value_of_some_other_Axxxxx(...)), as in many cases this will produce incorrect programs? Also, would that be any easier to do if there were an explicit min-instruction? (Note how it how does this "min" with the help of loop-construct). Or is the best hope just to raise the limit where the program is checked against? Actually, I wonder why couldn't it check the whole b-file after the first 200-300 match, if the terms in b-file stay small?

On the other hand, a construct like min(1, Axxxxxx(n)) is perfectly valid for creating characteristic functions from sequences that give a count from 0 upward for some related phenomena.

Best regards,

Antti

ckrause commented 3 years ago

According to the list by Georg, the first differences occur at N=420:

pgm     idx      OEIS           idx LODA
A235918 420   10       |    420 7
A236454 420   11       |    420 8

When this PR is in, we will check at least 500 terms and therefore remove these incorrect programs: https://github.com/ckrause/loda/pull/75

ckrause commented 3 years ago

The mentioned programs have been deleted in the meantime. We've extended the number of terms to check and we introduced a denylist.