Closed RichBarton-Arm closed 2 years ago
The error is coming from this line and the backtrace is following:
chkcon() at semutil.c:116
semant1() at semant.c:5,138
_parser() at parser.c:300
parser() at parser.c:152
I checked that the macro in semant1 is executed quite often without failure:
SST_SYMP(LHS, chkcon(RHS(1), DT_INT4, TRUE));
The ID for the stack pointer is set to S_IDENT
instead of S_CONST
. This makes some sense to me as the line (I removed the value assignment x = 1234531234
part to next line to simplify the issue):
real(dp) :: x
is in fact a new identifier x
and it requires dp
to be a constant.
I can see there is a number of places where the stack pointer's ID is set to S_IDENT
, notably this line, which gets triggered a number of times.
As @RichBarton-Arm mentioned, there are a few ways to make this compilation pass, so I assume I can find the line which modifies the stack pointer's ID in the failing case, which does NOT trigger in the passing case.
Regarding the second error (0081) it comes from this line. The out_dtype
is not defined (equal to -1
), because the dtype
is REAL
and the kind type is... 1 (while the expected valid values are 4, 8 and 16). I checked and none of the scopes defines dp
to be 1 and the default precision for real in the flang compiler is 4.
When I removed the use of f_precisions
from the reproduce program I managed to get rid of the second error (F90-S-0081-Illegal selector
). f_precisions
defines a public f_double=8
, while later module_defs
also defines a private f_double
to be equal to 4. The value of the public f_double
is reassigned to dp
which in the end influences the kind parameter causing an error. I still didn't manage to find where the value of 1 comes from...
So these are two distinct errors (as I managed to reproduce one without the other), just happening to reproduce in the same code line.
EDIT: the above isn't true. Turns out I was looking at error 0087 in a different code line....
Both in the successful and failing run the ID for the relevant stack pointer is being set to S_IDENT
.
The difference I managed to find is that L644 gets trigerred in the successful run, which sets the ID back to S_CONST
. The line is not entered in the failing run because in L621 STYPEG(sptr) = ST_UNKNOWN
, while in the passing case it is correctly set to ST_PARAM
.
The reason for this is that the scope is not found in L125 when line 50 is being parsed. It does get found in the successful run of the compilation.
The hash calculated for name "dp" and len=2 in the first_hash
function is 204. This hash value yields different result in the successful and failing run. Clearly some scope is not being added in the failing case. That would match the fact that removing the private
keyword from f_harmonics
in line 23 of the reproducer is making a difference in how the program behaves. f_harmonics
adds a public dp
when private
keyword is removed.
@kiranchandramohan , @RichBarton-Arm , if this is ringing any bells for you or if you have ideas that I could check next (like: I couldn't find a place where those scopes would be added?), I'll be grateful for any hints.
After adding some debugging flags (namely "Symbol Table Debug enable" I got some useful output, showing how var_2
/dp
(I renamed the original values to reflect what's going on in the reproducer rather than what was going on in the original code) is not being recognized correctly (left side) and stays real unknown
. On the right side is a case where I modified the original line:
! Removing this use statement and defining dp in this module fixes our issue
use Poisson_Solver
to
! Removing this use statement and defining dp in this module fixes our issue
integer, parameter :: dp = 4
Here's the full diff output:
var_2 real unknown | var_2 real unknown
sptr: 649 hashlk: 651 nmptr: 5057 dtype: 9 scope: 643 lineno: 0 enclfunc: 643 | sptr: 649 hashlk: 632 nmptr: 5057 dtype: 9 scope: 0 lineno: 0 enclfunc: 642
dcld:0 ccsym:0 save:0 ref:0 dinit:0 vol:0 ptrv:0 cvlen:0 | dcld:0 ccsym:0 save:0 ref:0 dinit:0 vol:0 ptrv:0 cvlen:0
address: 0 sc:0(SC_NONE) symlk: 631 midnum: 0 socptr: 0 autobj: 0 | address: 0 sc:0(SC_NONE) symlk: 631 midnum: 0 socptr: 0 autobj: 0
addrtkn:0 eqv:0 hccsym:0 alloc:0 arg:0 seq:0 nml:0 assn:0 | addrtkn:0 eqv:0 hccsym:0 alloc:0 arg:0 seq:0 nml:0 assn:0
private:0 hidden ignore sdsc: 0 ptroff: 0 descr: 0 | private:0 hidden ignore sdsc: 0 ptroff: 0 descr: 0
altname:0 | altname:0
|
new_type integer alias | new_type integer alias
sptr: 650 hashlk: 641 nmptr: 5127 dtype: 6 scope: 643 lineno: 0 enclfunc: 0 | sptr: 650 hashlk: 640 nmptr: 5113 dtype: 6 scope: 642 lineno: 0 enclfunc: 0
symlk: 641 private: 0 | symlk: 640 private: 0
|
var_2 real unknown | var_2 integer parameter
sptr: 651 hashlk: 632 nmptr: 5057 dtype: 9 scope: 643 lineno: 0 enclfunc: 0 | sptr: 651 hashlk: 649 nmptr: 5057 dtype: 6 scope: 647 lineno: 7 enclfunc: 647
dcld:0 ccsym:0 save:0 ref:0 dinit:0 vol:0 ptrv:0 cvlen:0 | conval1: 0x8 (8)
address: 0 sc:1(SC_LOCAL) symlk: 1 midnum: 0 socptr: 0 autobj: 0 | symlk:0 private:0 dcld end conval2: 15(ast)
addrtkn:0 eqv:0 hccsym:0 alloc:0 arg:0 seq:0 nml:0 assn:0 | ---------------------------------------------------------------------------------------------------
private:0 sdsc: 0 ptroff: 0 descr: 0 | ---------------------------------------------------------------------------------------------------
altname:0 | ---------------------------------------------------------------------------------------------------
This is unfortunately just another way of looking at the consequences. I still did not reach the root cause of the problem.
I found at least two ways of adding symbols to the symbol tab (an possibly introducing the type or not). In the non-erroneous case the first case doesn't get triggerred for sptr=651
(which is later failing in the constant check).
error:
insert_sym() at symtab.c:2,006 0x79155f
refsym() at semsym.c:987 0x73124b
semant2() at semant2.c:599 0x686885
_parser() at parser.c:302 0x5e3843
parser() at parser.c:152 0x5e3286
main() at main.c:208 0x5af83c
__libc_start_main() at libc-start.c:310 0x7ffff7448b97
_start() at 0x41aa0a
no_error:
installsym_ex() at symacc.c:253 0x78edcf
getsym() at symtab.c:477 0x79137d
getsymbol() at symtab.c:460 0x790d9d
get_next_sym() at symutl.c:51 0x7a0e7a
sym_get_arg_sec() at symutl.c:441 0x7a2f52
newargs_for_entry() at dpm_out.c:2,598 0x4a739e
transform_wrapup() at dpm_out.c:1,471 0x4a46dc
convert_output() at outconv.c:107 0x5ca0d2
main() at main.c:411 0x5b03f1
__libc_start_main() at libc-start.c:310 0x7ffff7448b97
_start() at 0x41aa0a
I investigated this a bit differently. I changed the code by removing "real(dp) :: x = 1234531234" and introducing "print *, dp" in place of it. This leads to a different error. F90-S-0038-Symbol, dp, has not been explicitly declared (rename-list.f90)
The above error shows a missing declaration. I then tried two different runs one with type(DFT_local_fields), intent(in) :: denspot and the other one which passes using an intrinsic type integer, intent(in) :: denspot
I get the following diffs when dumping the symbols using -Hq,0,1 -Hqq,parser,sym for both these runs. Investigating the reason for the difference might be a way forward.
dp integer
sptr:631 dtype:6 sc:0=NONE stype:16=parameter
enclfunc:626=module_defs hashlk:0 scope:626=module_defs symlk:0
conval1g:4 conval2g:13 paramval:0 slnk:0
dcld end param
...
...
dp real
sptr:635 dtype:9 sc:0=NONE stype:23=alias
enclfunc:627=f_harmonics hashlk:641=dp scope:627=f_harmonics symlk:634=f_double
gsame:0
dp integer
sptr:631 dtype:6 sc:1=LOCAL stype:16=parameter
enclfunc:626=module_defs hashlk:0 scope:626=module_defs symlk:0
conval1g:4 conval2g:13 paramval:0 slnk:0
dcld end param
...
...
dp real
sptr:635 dtype:9 sc:0=NONE stype:23=alias
enclfunc:627=f_harmonics hashlk:631=dp scope:627=f_harmonics symlk:634=f_double
gsame:0 private
Thanks a lot, @kiranchandramohan , this sure sheds some new light on the issue.
The error comes from semfin.c L3086. The CheckDecl()
function does not hit any of the success scenarios for the variable declaration. The first success case is that the variable is declared, period. All other scenarios address a situation where the variable would come from a "subprogram" and to my understanding this means a function or a subroutine. In our program the dp
variable should be coming from this line:
use module_defs ! defines dp=4
So the only chance we don't give an error here is that the dp
variable is declared.
The investigation has taken a track very similar to the previous one - the error comes from a declared flag not set, the reason is that when parsing the print *, dp
line, the dp
symbol is not found and a new one is inserted with a new sptr. But because we don't really have a declaration for it, we get an error, when checking the newly inserted symbol.
So the core question remains : why id dp
not being found inside the subroutine - regardless of whether is is being used to get printed out or to set a real number's kind.
I think I managed to find a solution for this bug, but I will put some more info here, not to pollute the PR with too much details which actually address the issue raised. @kiranchandramohan , your observation was helpful to me, as it produced a different symbol dump which helped me figure out what was going on.
The symptoms of this bug in the given reproduces were most visible in the sym_in_scope()
function, which failed to find the right dp
symbol from among a number of different symbols with this name available on the stack. The symbol which we should be finding (and which the error-free version of the reproducer was finding) was:
dp integer
sptr:630 dtype:6 sc:0=NONE stype:16=parameter
enclfunc:625=module_defs hashlk:0 scope:625=module_defs symlk:0
conval1g:4 conval2g:13 paramval:0 slnk:0
dcld end param
The sym_in_scope()
loops were actually considering this variable, but they were using a newly created dp
variable with sptr=650
instead. This variable is a temporary created in
dp real
sptr:650 dtype:9 sc:0=NONE stype:0=unknown
enclfunc:644=potential_from_charge_multipoles hashlk:659=dp scope:644=potential_from_charge_mu ltipoles symlk:633=f_double
address:0
autobj:0 cmblk:0 cvlen:0 adjstrlk:0 descr:0 midnum:0 newarg:0 devcopy:0 nmcnst:0 ptroff :0 sdsc:0 slnk:0
hidden ignore
This newer 650 dp
gets add by insert_sym()
called from add_use_rename()
in L317. Once I got here I noticed that even though we're adding a new alias, we're not setting its STYPE
. This, later in the course of events causes a different program flow in sym_in_scope()
, which causes the issue.
I hope this makes sense. I checked that the PR fixes both the original error and the error found in Kiran's version. I also ran make check-all
and got all passes. I hope the downstream tests also pass.
The change generally seems reasonable to me. I haven't found anything about alias
ed in the standard, but it seems intuitivley clear that a rename
effectively produces an alias
. I don't know how we got away without this fix for so long. Any comments are most welcome, either here or in the PR.
@michalpasztamobica is the output to the original problem by @RichBarton-Arm now correct with your proposed change?
The test for your pull request only tests whether the code compiles, so it's hard to know if it solves Rich's original problem.
Thanks for pointing this out, @mleair . The compilation error is gone but indeed, the output is different from expected. I checked that now the dp
variable points to f_double
from f_precisions
, which equals to 8. I will search further.
TL;DR
A hashtable with a hash calculated based on variable's name will only hold the symbol which got added last. We add the same "dp" symbol twice and the later addition will be the one that takes effect. This is quite a fundamental limitation and I haven't figured any easy way around at the moment.
Details:
On the contains
clause of module multipole
we are importing both module_defs
and f_harmonics
through use
statements:
module multipole
use module_defs ! defines dp=4
! Removing this use statement (and use of solid_harmonic) fixes the issue
use f_harmonics, only: solid_harmonic ! returns dp=8 defined in f_precisions
contains
apply_use()
-> import_module(module_defs)
-> import()
adds "dp"
from the imported module_defs.mod
file with a new sptr 630 (of type ST_PARAM
). This sptr comes from module_defs
scope.
Then a similar thing happens for f_harmonics
. The f_harmonics.mod
file also contains a "dp" symbol. It's there just so that expect_dp
can point to something. So the procedure will eventually figure out that "dp" has already been added, as all it checks for is the name of the symbol.
This is the exact backtrace of how to get to the moment where installsym_ex()
decides that sptr = 630 (previously used for module_defs
) is the one we can also use for f_harmonics
.
installsym_ex() at symacc.c:229 0x5555559b1be5
getsym() at symtab.c:477 0x5555559b4085
getsymbol() at symtab.c:460 0x5555559b405e
import_symbol() at interf.c:5,555 0x5555557292dc
import() at interf.c:2,762 0x55555571a93e
import_module() at interf.c:3,178 0x55555571c730
apply_use() at module.c:597 0x55555579ae3d
apply_use_stmts() at module.c:469 0x55555579a475
semant1() at semant.c:881 0x555555836ab5
_parser() at parser.c:300 0x5555557d2174
parser() at parser.c:88 0x5555557d1a37
main() at main.c:208 0x555555794b42
Then we check if the symbol sptr 630 "dp" is in current scope. sym_in_scope()
will correctly figure out that it is, thanks to module_defs
. We will then insert a new symbol. Unfortunately this will also imply that the 630
sptr in stb.hashtb
, which belongs to module_defs
will get replaced with the newly created 634
, which belongs to f_harmonics
scope.
Later on when we add more "dp" symbols or try to figure out where they belong based on the symbol's hash, the stb.hashtb
will return 634
"dp" from f_harmonics
.
I confirmed my suspicions by swapping use module_defs
with use f_harmonics
.
When module_defs
is use
d later the dp
is 4.
When f_harmonics
is use
d later the dp
is 8.
A hashtable with a hash calculated based on variable's name will only hold the symbol which got added last. We add the same "dp" symbol twice and the later addition will be the one that takes effect. This is quite a fundamental limitation and I haven't figured any easy way around at the moment.
You can walk each bucket in the symbol (hash) table for an entry using the first_hash() function and the HASHLK field. For example, to process every symbol that has the same name as symbol sptr:
for (sym = first_hash(sptr); sym > NOSYM; sym = HASHLKG(sym)) { if (NMPTRG(sptr) != NMPTRG(sym)) continue; etc. }
The NMPTRG field is a quick way to check whether sptr and sym have the same symbol name.
If you don't want to process sptr, then use something like the following:
if (sym == sptr || NMPTRG(sptr) != NMPTRG(sym)) continue;
-Mark
Thanks for the explanation, @mleair . You're absolutely right - one sptr gets overwritten, but it gets linked first, so it is accessible. It also gets found when sym_in_scope()
is checking if the dp
symbol is in the scope (and where exactly it belongs to). It just all happens in different order depending on which module is use
d first in the code.
One other difference I noticed if I swap the use
statements to use
f_harmonics
first, and then module_defs
like so:
module multipole
use f_harmonics, only: solid_harmonic ! returns dp=8 defined in f_precisions
use module_defs ! defines dp=4
contains
38 module multipole
39 ! Removing this use statement (and use of solid_harmonic) fixes the issue
40 use f_harmonics, only: solid_harmonic ! returns dp=8 defined in f_precisions
41 use module_defs ! defines dp=4
42 contains
43 subroutine potential_from_charge_multipoles(denspot)
44 ! Removing this use statement (and use of denspot) fixes the issue
45 use module_types, only: DFT_local_fields
46 ! Removing this use statement with rename fixes it
47 use Poisson_Solver, except_dp => dp ! dp=8 in Poisson_Solver so except_dp=8
is visible for line 47. An extra dp
(which later shows up during symbol lookup for dp
and matches better than the one from module_defs
) does not get added to the scope at all.
For the code from the reproducer this if clause is entered, adding a new dp
variable to the stack which is later used in the symbol look-up.
I must admit I couldn't figure out the meaning of this code:
case RENAME1:
add_use_stmt();
sptr = SST_SYMG(RHS(3));
sptr = add_use_rename((int)SST_SYMG(RHS(1)), sptr, 0);
SST_SYMP(RHS(3), sptr);
break;
@mleair , @kiranchandramohan , I'd be very grateful if you could help me figure out what this snippet is doing.
Depending on the arguments passed to add_use_rename()
function we enter different scenarios:
1) For module_defs
added first, the global
is an ST_PARAM named dp
with scope set to module_defs
and we will skip adding a new dp
symbol.
2) For f_harmonics
added first , the global
is an ST_ALIAS named dp
with scope set to f_harmonics
and we'll add a new dp
symbol which later messes up the symbol lookup when it comes to printing or setting real number's kind.
This is my understanding of this snippet:
sptr = SST_SYMG(RHS(3));
This will take the third element from the Semantic Stack (RHS(3)
) and will extract the symbol (I assume this is the sptr which this symbol corresponds to). I checked that it points to line 47, col 40, which is dp
symbol of type S_IDENT
.
sptr = add_use_rename((int)SST_SYMG(RHS(1)), sptr, 0);
This is a symbol found in col 27 - except_dp
, in both versions of the reproducer it is sptr = 645
.
SST_SYMP(RHS(3), sptr);
This line will replace the symbol that this Semantic Stack entry was pointing to with the symbol being a result of the add_use_rename()
function call.
In simple words - check what symbol is this semantic stack entry pointing to, then based on this check what it should really point to and then set it.
The stack pointers passed to add_use_rename
are different depending on the order in which the modules are added:
module_defs
added first:
global
: ST_ALIAS
in the f_harmonics
scope.local
: points to an sptr
that is unfilled to much extent.f_harmonics
added first:
global
: ST_PARAM
in the module_defs
scope.local
: points to an sptr
that is unfilled to much extent.So the way the sematic stack entries are filled is different. In fact, the dp
form the f_harmonics
should not be taken into consideration here, because it only added solid_harmonic
and dp
is private within that scope.
Before we get to add_use_rename()
we try to search for dp
in here.
The "module_defs first" program will hit sptr 634 from f_harmonics
and the "f_harmonics first" will hit 634 from module_defs
. The only flag being checked in the search algorithm is HIDDEN
. None of the symbols is hidden, though.
But in fact sptr 634 from f_harmonics
should not really be visible in line 47. It does have PRIVATE flag set, but we are not checking for it (and if we do, then a lot of tests fail).
Furthermore, I called sym_in_scope
and it returned sptr = 634 alright.
Finally I came up with this patch:
diff --git a/tools/flang1/flang1exe/semant.c b/tools/flang1/flang1exe/semant.c
index 21160d8c..aefb2d9c 100644
--- a/tools/flang1/flang1exe/semant.c
+++ b/tools/flang1/flang1exe/semant.c
@@ -10446,6 +10446,9 @@ procedure_stmt:
case RENAME1:
add_use_stmt();
sptr = SST_SYMG(RHS(3));
+ while (test_scope(sptr) == -1 && sptr != SPTR_NULL) {
+ sptr = HASHLKG(sptr);
+ }
sptr = add_use_rename((int)SST_SYMG(RHS(1)), sptr, 0);
SST_SYMP(RHS(3), sptr);
break;
And this fixes the issue (both with the previous ST_ALIAS setting and without it) and the program returns correct dp
value. Unfortunately this causes 22 tests to fail (oop075 to oop226
, to be exact), so it still needs some polishing.
I don't know if this will fix the 22 test failures...but you might be processing symbols that do not have the same name as sptr. That's why I have the NMPTR check in the example in my comment above:
for (sym = first_hash(sptr); sym > NOSYM; sym = HASHLKG(sym)) { if (sym != sptr || NMPTRG(sptr) != NMPTRG(sym)) continue; etc. }
You probably need to incorporate a similar check for "NMPTRG(sptr) != NMPTRG(sym)" in your while loop (where sym is the loop variable and sptr is the original sptr). Otherwise, you might assign a symbol with a different name to sptr.
Also, you probably want to start with the first symbol in the hash table bucket. So, you probably want to use a for loop similar to the one above. That is, start the search with first_hash(sptr). Otherwise, you might miss the correct symbol, no?
All the tests which fail deal with interfaces being renamed. I noticed that they have a scope set to SC_EXTERN
, so perhaps they are meant not to be in the scope at this stage. I'll add a check against this.
@mleair , you are right that I'd better start the search from the beginning and use a for loop, yet - if the previous algorithm found an in-scope symbol, I don't want to change that, so I'll wrap the whole loop with an if
, like so:
if (test_scope(sptr) == -1) {
// If symbol not in scope search for an in-scope symbol with same name.
}
Regarding the NMPTRG(sptr) != NMPTRG(sptr1)
- if the hash is calculated based on the name, then can there actually be two sptr
s with different names in it?
All the tests which fail deal with interfaces being renamed. I noticed that they have a scope set to
SC_EXTERN
, so perhaps they are meant not to be in the scope at this stage. I'll add a check against this.@mleair , you are right that I'd better start the search from the beginning and use a for loop, yet - if the previous algorithm found an in-scope symbol, I don't want to change that, so I'll wrap the whole loop with an
if
, like so:if (test_scope(sptr) == -1) { // If symbol not in scope search for an in-scope symbol with same name. }
Regarding the
NMPTRG(sptr) != NMPTRG(sptr1)
- if the hash is calculated based on the name, then can there actually be twosptr
s with different names in it?
Yes. See the definition of the HASH_ID macro in symacc.h. This macro is used to compute the element (and pointer to the bucket list) in the hash table.
Any reason why this is closed? IMO it should not be closed before #1149 is merged.
This issue covered the original bug and we have a new issue (#1146) for the regression. So I think keeping this closed is best.
The following reproducer is reduced from the BigDFT application. Flang is not able to compile this and gives an error on line 10:
In the reproducer, we have an integer type dp used as the kind to initialise a real x in the function _potential_from_chargemultipoles. dp is defined in _moduledefs and is also defined through a complicated sequence of module use statements from both _fharmonics and _PoissonSolver but neither of these definitions should be visible at the definition of x and the definition from _moduledefs should be used. This is the gfortran behaviour, so x is a sp real and solid_harmonic is a dp real.
The reproducer is very fragile and removing a number of things will cause it to build with flang and create a program that prints the expected output. I have tried to point all these out in comments