Closed karlmsmith closed 6 years ago
Debugging xeq_list with 49 variables, the my_cx array (copied from the is_cx array) appears correct
(4, 5, 6, ..., 50, 51, 52, 0, 0, 0, ...)
but the my_mr array (and the is_mr array it is copied from) appears correct only up through variable 48
(1, 2, 3, ..., 47, 48, 17, 0, 0, 0, ...)
This causes the listing of the last rows to show up again starting at 17. So it appears the problem is in GET_PROT_CMND_DATA, which assigns the is_mr array.
The place where we (possibly) ran into this before is described in #1757. There's a tar file with some scripts and data that produces this.
For this particular case, interp_stack line 336 calls FIND_MEM_VAR_INCL_C_CACHE
The call to INTERP_STACK is passing the name code of the particular variable, which is a string: `EX#1`, `EX#2`, ..., `EX#9`, `EX#:`, `EX#;`, ..., `EX#@`, `EX#A`, ..., `EX#Z`, `EX#[`, ..., then one with the grave accent, then `EX#a` at isp_base=49. This last name_code matches the uppercase name code `EX#A` at isp_base=17, so it thinks it has found a match.
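The collision can be reproduced in a few lines. This is a Python sketch, not Ferret's Fortran: `old_name_code` and `first_case_blind_match` are hypothetical names, and the rule "append the character with code 48 + n" is an assumption inferred from the sequence of name codes listed above, not taken from the source file.

```python
def old_name_code(n: int) -> str:
    """Hypothetical stand-in for the old scheme: 'EX#' plus one character.

    Assumes the character code is 48 + n, which reproduces the sequence
    EX#1 ... EX#9, EX#:, ..., EX#A (n=17), ..., EX#a (n=49).
    """
    return "EX#" + chr(48 + n)


def first_case_blind_match(n: int) -> int:
    """Return the lowest index whose name code equals that of n, ignoring case."""
    target = old_name_code(n).upper()
    for k in range(1, n + 1):
        if old_name_code(k).upper() == target:
            return k
    return n  # not reached: the loop always matches at k == n


print(old_name_code(17), old_name_code(49))  # EX#A EX#a
print(first_case_blind_match(48))            # 48
print(first_case_blind_match(49))            # 17
```

Under that assumption, variable 48 (the grave accent) is still found at its own position, while variable 49 is the first one whose case-blind match lands on an earlier entry.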
So I think the solution is either the name_code matching needs to always be case sensitive, or it needs to avoid lowercase letters. Need to investigate further what can be done, but this explains the behavior seen here and probably in numerous other areas with large lists of variables.
It is FIND_MEM_VAR at find_mem_var_incl_c_cache.F line 68 that thinks it has found the variable. The mv returned by mv_flink(mv) at line 185 (label 100) is a match (positive integer). However, setting this to zero, indicating no match, does not change the results (i.e., still incorrect). Setting mode stupid (which is supposed to disable finding matches in memory) also does not change the results.
I see other places where it is assumed the name code is uppercase or is compared case blind. So the lowercase letters definitely need to be eliminated from use. EXPR_NAME is the offending routine - far too simplistic. Need to check whether more than one character after the EX# will be recognized as different.
I agree that internally comparisons of names need to be case-insensitive. Case sensitivity is handled in just a couple of cases, on dataset initialization for file variables, and when user-defined variable names are stored, so these can be written out using the original upper- or lower-case spelling. The rest of the code should be able to depend on being case-insensitive.
Revised to use only digits, with more than one character allowed after the `EX#`. (The original method would blow up after about 207 variables.)
So now `EX#1`, `EX#2`, ..., `EX#9`, `EX#10`, `EX#11`, ..., `EX#99999`. (The name code is `CHARACTER*8`.)
STOP statements added if var number given is less than zero or more than 99999, which should never happen.
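A minimal sketch of the revised scheme, again in Python rather than the Fortran of EXPR_NAME. The function name `new_name_code` and the constants are hypothetical, and the `ValueError` stands in for the STOP statements. The range check mirrors the limits stated above, and the 8-character bound follows from the `CHARACTER*8` name code (three characters for `EX#` plus at most five digits).

```python
NAME_CODE_LEN = 8       # the name code is CHARACTER*8
MAX_VAR_NUMBER = 99999  # largest number that fits after the 3-character prefix


def new_name_code(n: int) -> str:
    """Build a digits-only name code such as EX#49 (illustrative only)."""
    if n < 0 or n > MAX_VAR_NUMBER:
        # Stand-in for the STOP statements; should never happen in practice.
        raise ValueError(f"variable number out of range: {n}")
    code = f"EX#{n}"
    assert len(code) <= NAME_CODE_LEN
    return code


print(new_name_code(49))  # EX#49

# Digits have no upper/lower case, so a case-blind comparison stays unique.
codes = {new_name_code(n).upper() for n in range(1, 1001)}
print(len(codes))         # 1000
```

Because the suffix contains only digits, case-insensitive comparisons elsewhere in the code cannot confuse two different name codes.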
No differences observed in benchmark tests. Will add benchmark test from this SOCAT example.
Benchmark test added and test results updated: https://github.com/NOAA-PMEL/Ferret/commit/87bd7fd7d008a9b12f375c47160c83d86c04a991
The case in ticket #1757 is not resolved with this change.
That was a highly convoluted script which defines thousands of variables. It'd be nice if it were solved, but there was a straightforward way in that case to reorganize it. We pulled out a small number of variable definitions into a second script and then called that multiple times from the main script.
This came up from the all-available-variables listing in SOCAT, which has almost 60 variables. But apparently this also comes up in other contexts when requesting many variables in a command.
The ferret command (55 variables):
gives the result (49-55 duplicated, overwriting what should be in 17-23):
If the number of variables is 48, then all is fine. If 49 are given, there is one duplicate/overwrite in column 17; if 50, there are two duplicate overwrites.
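The pattern of overwritten columns can be worked out with simple arithmetic. The Python sketch below is illustrative only: `overwritten_columns` is a hypothetical helper, and it assumes the old name codes used the single character with code 48 + n, so that a lowercase letter collides with the uppercase letter 32 positions earlier.

```python
CASE_OFFSET = ord("a") - ord("A")  # 32


def overwritten_columns(num_vars: int) -> dict[int, int]:
    """Map each duplicated variable number to the column it overwrites.

    Assumes old name codes of the form 'EX#' + chr(48 + n); only lowercase
    letters have an earlier case-blind twin.
    """
    result = {}
    for n in range(1, num_vars + 1):
        if "a" <= chr(48 + n) <= "z":
            result[n] = n - CASE_OFFSET
    return result


print(overwritten_columns(48))  # {}
print(overwritten_columns(49))  # {49: 17}
print(overwritten_columns(50))  # {49: 17, 50: 18}
```

For 55 variables this gives variables 49 through 55 overwriting columns 17 through 23, which matches the result reported above.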
When this has cropped up in other contexts, one can break up the operation so that no more than 48 variables are used at a time. But for these TSV or CSV files, we cannot use this technique.