need to improve the get_file_list mechanism

randytpierce commented 6 months ago

The code uses a data set in Couchbase that is retrieved by this query:

SELECT url,
    mtime
FROM `vxdata`._default.METAR
WHERE subset = 'METAR'
    AND type = 'DF'
    AND fileType = 'grib2'
    AND originType = 'model'
    AND model = 'HRRR_OPS'
    AND url IS NOT MISSING
    AND mtime IS NOT MISSING
ORDER BY url;

This query is turning out to be very inefficient, and that showed up as a big difference in the Capella evaluation tests. There is essentially one document for each file that gets processed, and that is just too many documents. It should probably be a single document holding an array of files for each kind of ingest, or something similar.
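To make the "array of files" idea concrete, here is a minimal sketch in Python of how the existing per-file documents could be folded into one document per kind of ingest. Everything beyond the field names in the query above is an assumption for illustration: the function name `consolidate_file_docs`, the `files` array field, and the `id` format are hypothetical and are not part of VxIngest.

```python
from collections import defaultdict


def consolidate_file_docs(file_docs):
    """Fold per-file 'DF' documents into one document per ingest kind.

    Each input dict is assumed to carry the fields used in the query:
    subset, fileType, originType, model, url, mtime.
    """
    grouped = defaultdict(list)
    for doc in file_docs:
        # One group per kind of ingest (hypothetical grouping key).
        key = (doc["subset"], doc["model"], doc["fileType"], doc["originType"])
        grouped[key].append({"url": doc["url"], "mtime": doc["mtime"]})

    consolidated = []
    for (subset, model, file_type, origin_type), files in sorted(grouped.items()):
        consolidated.append(
            {
                # Hypothetical document id; the real key scheme may differ.
                "id": f"DF:{subset}:{model}:{file_type}",
                "type": "DF",
                "subset": subset,
                "model": model,
                "fileType": file_type,
                "originType": origin_type,
                # Sorted by url to mirror the ORDER BY in the query.
                "files": sorted(files, key=lambda f: f["url"]),
            }
        )
    return consolidated
```

With a layout like this, get_file_list would fetch one document per ingest kind and read its `files` array, rather than scanning and sorting one document per processed file. The trade-off is that a very long file history could make that single document large, so some trimming or partitioning (for example by time range) might still be needed.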

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 90 days with no activity.

NOAA-GSL / VxIngest

need to improve the get_file_list mechanism #338