NOAA-EMC / NCEPLIBS-g2c

This library contains C decoder/encoder routines for GRIB edition 2.

why does opening a file with an index take longer than opening one without an index? #364

Closed edwardhartnett closed 10 months ago

edwardhartnett commented 1 year ago

I have two functions, g2c_open() which opens a file without an index, and g2c_read_index() which opens the file using an index.

I expected using the index to be faster, but it's slower. Here's some code from tst_index.c:

    printf("Testing speed of g2c_read_index() on file %s downloaded via FTP...\n", FTP_FILE);
    {
        int g2cid;
        int num_msg;
        struct timeval start_time, end_time, diff_time;
        int open_1_us, open_2_us;
        int ret;

        /* Open the data file without using the index file. */
        if (gettimeofday(&start_time, NULL))
            return G2C_ERROR;
        if ((ret = g2c_open(FTP_FILE, 0, &g2cid)))
            return ret;
        if (gettimeofday(&end_time, NULL))
            return G2C_ERROR;
        if (nc4_timeval_subtract(&diff_time, &end_time, &start_time))
            return G2C_ERROR;
        open_1_us = (int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec;

        /* Check some stuff. */
        if ((ret = g2c_inq(g2cid, &num_msg)))
            return ret;
        if (num_msg != 688)
            return G2C_ERROR;

        /* Close the file. */
        if ((ret = g2c_close(g2cid)))
            return ret;

        /* Open the data file using the index file. */
        if (gettimeofday(&start_time, NULL))
            return G2C_ERROR;
        if ((ret = g2c_read_index(FTP_FILE, REF_FTP_FILE, 0, &g2cid)))
            return ret;
        if (gettimeofday(&end_time, NULL))
            return G2C_ERROR;
        if (nc4_timeval_subtract(&diff_time, &end_time, &start_time))
            return G2C_ERROR;
        open_2_us = (int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec;
        printf("open without index %d with index %d, using index saved %d microseconds\n",
               open_1_us, open_2_us, open_1_us - open_2_us);

        /* Check some stuff. */
        if ((ret = g2c_inq(g2cid, &num_msg)))
            return ret;
        if (num_msg != 688)
            return G2C_ERROR;

        /* Close the file. */
        if ((ret = g2c_close(g2cid)))
            return ret;
    }

This produces the following output:

Testing speed of g2c_read_index() on file WW3_Regional_US_West_Coast_20220718_0000.grib2 downloaded via FTP...
open without index 8235 with index 22059, using index saved -13824 microseconds

Part of the problem may be that the index file is missing one important piece of information: the section length for section 7. This number is vital for producing the output of degrib2, but it is not stored in the index file. So when opening with an index file, I also have to open the data file and read the section length of every section 7. Perhaps this slows things down.
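
To make that extra work concrete, here is a minimal sketch of what reading one section length from the data file involves. It relies only on the GRIB2 layout of sections 1 through 7, which begin with a 4-byte big-endian length followed by a 1-byte section number. The helper names `decode_sec_header()` and `read_sec_header()` are hypothetical and are not part of NCEPLIBS-g2c, and the sketch assumes the byte offset of the section is already known.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of NCEPLIBS-g2c. Decodes the 5-byte
 * header that starts GRIB2 sections 1-7: a 4-byte big-endian section
 * length followed by a 1-byte section number. */
static int
decode_sec_header(const unsigned char *buf, uint32_t *sec_len, int *sec_num)
{
    assert(buf && sec_len && sec_num);
    *sec_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
               ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    *sec_num = buf[4];
    return 0;
}

/* Hypothetical helper: seek to a section's byte offset in the data file
 * (assumed to be known already) and read its header. Returns 0 on
 * success, -1 on an I/O failure. */
static int
read_sec_header(FILE *f, long offset, uint32_t *sec_len, int *sec_num)
{
    unsigned char buf[5];

    if (fseek(f, offset, SEEK_SET))
        return -1;
    if (fread(buf, 1, sizeof(buf), f) != sizeof(buf))
        return -1;
    return decode_sec_header(buf, sec_len, sec_num);
}
```

If something like this runs at least once per message, the 688-message test file needs hundreds of separate seek-and-read operations on top of reading the index, which could plausibly account for some of the extra time. That is a guess until it is measured.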

It would be good to run a profiler and find the detailed explanation for why opening with an index is slower.
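
Short of a full profiler run, one cheap check is to repeat each open several times and compare the fastest runs, since a single `gettimeofday()` measurement of a few milliseconds is easily skewed by file caching and scheduling. The sketch below is only an illustration: `min_time_us()` and `count_calls()` are hypothetical names, not part of the test suite, and `count_calls()` stands in for a wrapper that would open and then close the file.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define MILLION 1000000

/* Elapsed microseconds between two gettimeofday() samples. */
static long
elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * MILLION +
           (long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical helper: call op(arg) nrep times and store the fastest
 * run, in microseconds, in *best_us. Returns 0 on success, -1 if op()
 * or gettimeofday() fails. */
static int
min_time_us(int (*op)(void *), void *arg, int nrep, long *best_us)
{
    struct timeval start, end;
    int have_best = 0;
    int i;

    assert(op && best_us && nrep > 0);
    for (i = 0; i < nrep; i++)
    {
        long us;

        if (gettimeofday(&start, NULL))
            return -1;
        if (op(arg))
            return -1;
        if (gettimeofday(&end, NULL))
            return -1;
        us = elapsed_us(&start, &end);
        if (!have_best || us < *best_us)
        {
            *best_us = us;
            have_best = 1;
        }
    }
    return 0;
}

/* Stand-in operation for illustration: counts how often it is called.
 * A real measurement would wrap an open followed by a close here. */
static int
count_calls(void *arg)
{
    (*(int *)arg)++;
    return 0;
}
```

In the test above, one wrapper would call g2c_open() and g2c_close(), and a second would call g2c_read_index() and g2c_close(). Comparing the two minimums would show whether the gap persists once the files are in the operating system's cache.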

edwardhartnett commented 10 months ago

Using index files is much faster when jumping around a really big file, so the slowness when opening a small sample file is not really important.