NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilities for working with the WMO BUFR format.

Limit for sinv needs to be increased #579

Open SudhirNadiga-NOAA opened 3 months ago

SudhirNadiga-NOAA commented 3 months ago

When running sinv on a large tank, I get an error message.

[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$ ls -l /lfs/h1/ops/para/dcom/20240318/b021/xx054
-rw-rw-r-- 1 ops.para para 2073922960 Mar 19 01:16 /lfs/h1/ops/para/dcom/20240318/b021/xx054
[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$

ERROR MESSAGE BELOW
[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$ sinv /lfs/h1/ops/para/dcom/20240318/b021/xx054
+++++++++++++++++++++WARNING+++++++++++++++++++++++
BUFRLIB: UFBTAB - THE NO. OF DATA SUBSETS IN THE BUFR FILE IS .GT. LIMIT OF 16000000 IN THE 4TH ARG. (INPUT) - INCOMPLETE READ

UFBTAB STORED 15999291 REPORTS OUT OF ****<<<
+++++++++++++++++++++WARNING+++++++++++++++++++++++

+++++++++++++++++++++WARNING+++++++++++++++++++++++
BUFRLIB: UFBTAB - THE NO. OF DATA SUBSETS IN THE BUFR FILE IS .GT. LIMIT OF 16000000 IN THE 4TH ARG. (INPUT) - INCOMPLETE READ

UFBTAB STORED 15999291 REPORTS OUT OF ****<<<
+++++++++++++++++++++WARNING+++++++++++++++++++++++

209 NOAA 18 9109137 000
223 NOAA 19 6890154 000

                     15999291

[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$

How do we address this issue? Thanks.

jbathegit commented 3 months ago

This is a parameter setting in the sinv utility, based on the maximum number of data subsets that one would ever expect to read from a single BUFR file. We could certainly set it to a larger number, but we're already at 16 million, and that number is used to dimension two underlying real*8 arrays, so at some point we could conceivably reach a limit where the resulting compiled object is too big to load into RAM. So we may also need to modify the utility to redefine the underlying said and siid arrays as allocatable and dynamically allocate them at run time, rather than fixing their sizes at compile time.
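To put rough numbers on the memory concern above, here is a minimal sketch. It assumes each of the two real*8 arrays holds one 8-byte value per subset; that layout is an assumption about sinv, not something confirmed in this thread, and `array_bytes` is a hypothetical helper. The subset counts are the current limit and the binv totals reported later in this thread.

```python
# Rough memory estimate for two real*8 arrays dimensioned by the subset limit.
# ASSUMPTION: one 8-byte value per subset in each array (layout not confirmed).

BYTES_PER_REAL8 = 8   # a Fortran real*8 occupies 8 bytes
NUM_ARRAYS = 2        # the two arrays named in the comment above (said, siid)


def array_bytes(max_subsets: int) -> int:
    """Return the total bytes needed by both arrays for a given subset limit."""
    return NUM_ARRAYS * max_subsets * BYTES_PER_REAL8


# Subset counts quoted in this thread.
counts = {
    "current sinv limit": 16_000_000,
    "b021/xx054 (binv)": 104_542_051,
    "poes_sst NC012023 (binv)": 556_549_305,
}

for label, n in counts.items():
    size = array_bytes(n)
    print(f"{label}: {n:,} subsets -> {size / 1e9:.2f} GB")
```

Under that assumption the current limit costs about 0.26 GB, while sizing for the largest count quoted in this thread would need roughly 8.9 GB, which is the kind of jump that makes run-time allocation attractive over a fixed compile-time size.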

Either way, we'd need to set some practical limit in the utility. @SudhirNadiga-NOAA do you have any idea how much larger you'd need this setting to be? Note that you can get the count of subsets in any BUFR file by just calling ufbtab with a negative logical unit number. If you do that, you don't need to worry about how big your array is dimensioned, because it won't try to read and store any of the requested mnemonics in the last argument.

Also CCing @jack-woollen for his awareness :-)

jbathegit commented 3 months ago

FWIW, I just did this for the file you mentioned above, and it had just under 110 million subsets in it! Do you know if that's a typical daily count for these files?

SudhirNadiga-NOAA commented 3 months ago

Thanks for all your efforts. I will need to ask Iliana if she knows how big our tanks can get in the next year or so. My guess is that the biggest tanks we have are the b021/xx206 tanks. In the past, the poes-sst tanks used to be our biggest tanks, but these CrIS 431 tanks are huge, and the CrIS 2211 tanks (only made by development) are even bigger. I don't know if that fully answers your question, but I can ask Iliana if she has a better answer.

[clogin06 /lfs/h1/ops/prod/dcom/20240319]$ ls -lrt b021/*
-rw-rw-r-- 1 dfprod prod 263835224 Mar 20 00:05 b021/xx042
-rw-rw-r-- 1 dfprod prod 349471088 Mar 20 00:12 b021/xx213
-rw-rw-r-- 1 ops.prod prod 413967832 Mar 20 00:21 b021/xx246
-rw-rw-r-- 1 dfprod prod 1621221504 Mar 20 00:32 b021/xx039
-rw-rw-r-- 1 ops.prod prod 603775608 Mar 20 00:35 b021/xx046
-rw-rw-r-- 1 ops.prod prod 1951418592 Mar 20 00:36 b021/xx045
-rw-rw-r-- 1 dfprod prod 55290464 Mar 20 00:37 b021/xx044
-rw-rw-r-- 1 ops.prod prod 1698900976 Mar 20 01:04 b021/xx248
-rw-rw-r-- 1 dfprod prod 1346407584 Mar 20 01:13 b021/xx239
-rw-rw-r-- 1 dfprod prod 235666528 Mar 20 01:13 b021/xx036
-rw-rw-r-- 1 dfprod prod 66225472 Mar 20 01:13 b021/xx033
-rw-rw-r-- 1 ops.prod prod 947068136 Mar 20 01:16 b021/xx053
-rw-rw-r-- 1 ops.prod prod 1976323824 Mar 20 01:16 b021/xx054
-rw-rw-r-- 1 ops.prod prod 62270376 Mar 20 01:17 b021/xx028
-rw-rw-r-- 1 dfprod prod 140132320 Mar 20 01:19 b021/xx035
-rw-rw-r-- 1 ops.prod prod 5941454184 Mar 20 01:34 b021/xx206
-rw-rw-r-- 1 ops.prod prod 828036152 Mar 20 01:51 b021/xx051
-rw-rw-r-- 1 ops.prod prod 1971780304 Mar 20 01:52 b021/xx052
-rw-rw-r-- 1 ops.prod prod 3153255264 Mar 20 01:53 b021/xx241
-rw-rw-r-- 1 ops.prod prod 350048312 Mar 20 01:55 b021/xx201
-rw-rw-r-- 1 ops.prod prod 56303488 Mar 20 01:59 b021/xx023
-rw-rw-r-- 1 ops.prod prod 56355784 Mar 20 01:59 b021/xx123
-rw-rw-r-- 1 ops.prod prod 185883912 Mar 20 02:00 b021/xx027
-rw-rw-r-- 1 ops.prod prod 870942000 Mar 20 08:19 b021/xx203
-rw-rw-r-- 1 dfprod prod 307493728 Mar 20 08:48 b021/xx038
-rw-rw-r-- 1 dfprod prod 2338872816 Mar 20 12:14 b021/xx212
[clogin06 /lfs/h1/ops/prod/dcom/20240319]$ uftab b021/xx042

SudhirNadiga-NOAA commented 3 months ago

@ilianagenkova Do we have an idea as to the maximum desired for sinv in terms of number of subsets?

SudhirNadiga-NOAA commented 3 months ago

It looks like the xx054 tank may have more subsets by a lot (~18X).

[clogin07 /lfs/h1/ops/prod/dcom/20240319/b021]$ binv xx206

type messages subsets bytes

NC021206 32409 5832000 1646306317 179.95
TOTAL 32409 5832000 1646306317

[clogin07 /lfs/h1/ops/prod/dcom/20240319/b021]$ binv xx054

type messages subsets bytes

NC021054 4943114 104542051 1953387584 21.15
TOTAL 4943114 104542051 1953387584

[clogin07 /lfs/h1/ops/prod/dcom/20240319/b021]$

SudhirNadiga-NOAA commented 3 months ago

I am running binv on all our tanks, so we should have an idea as to the largest tanks by number of subsets. This will help answer your question.

SudhirNadiga-NOAA commented 3 months ago

My check shows that the poes_sst files have close to 600 million subsets.

type messages subsets bytes

NC012023 139362 556549305 -990011533 3993.55
TOTAL 139362 556549305 -990011533

Iliana and I will discuss this problem and get back to you on any further details, since we are considering writing the two different instruments in different tanks to reduce the number of subsets and for ease of dumping.

jbathegit commented 1 month ago

@SudhirNadiga-NOAA @ilianagenkova any updates on this?

ilianagenkova commented 1 month ago

I'll provide the largest obs counts that we see (per tank) in the next few days.