Closed hsoh-u closed 1 month ago
@hsoh-u I've been looking at the new test added by this pull request and note that it takes much longer than a similar, existing ADP test:
TEST: point2grid_GOES_16_ADP - pass - 2.312 sec
TEST: point2grid_GOES_16_ADP_Enterprise_high - pass - 29.124 sec
The runtime increases from around 2 seconds to around 30.
I realize that the input data differs, but not dramatically so. Both have the same X/Y dimensions:
y = 1500 ;
x = 2500 ;
The existing AOD data is short:
short AOD(y, x) ;
AOD:_FillValue = -1s ;
Whereas the new AOD data is unsigned-short:
ushort AOD(y, x) ;
AOD:_FillValue = 65535US ;
Do you have any idea why there's such a dramatic difference in runtime?
I will take a look. There was no difference in execution time when I ran the same commands manually:
time /d1/personal/hsoh/git/bugfixes/bugfix_2867_point2grid_qc_flag/MET/bin/point2grid \
/d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc \
G212 \
/d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc \
-field 'name="AOD_Smoke"; level="(*,*)";' \
-adp /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-ADPC-M6_G16_s20241100001171_e20241100003544_c20241100006361.nc \
-qc 0,1 -method MAX -v 1 > log_enterprise
real 0m1.834s
user 0m1.905s
sys 0m3.893s
time /d1/personal/hsoh/git/bugfixes/bugfix_2867_point2grid_qc_flag/MET/bin/point2grid \
/d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20192662141196_e20192662143569_c20192662145547.nc \
G212 \
/d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP.nc \
-field 'name="AOD_Smoke"; level="(*,*)";' \
-adp /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-ADPC-M6_G16_s20192662141196_e20192662143569_c20192662144526.nc \
-qc 1,2 -method MAX -v 1 > zzz_baseline
real 0m1.943s
user 0m1.965s
sys 0m3.972s
I ran the point2grid unit tests manually and got the same result as you did:
TEST: point2grid_GOES_16_ADP - pass - 1.901 sec
TEST: point2grid_GOES_16_ADP_Enterprise_high - pass - 29.006 sec
Here is the execution time from the log file: actual runtime = 29 seconds (17:55:17Z - 17:54:48Z).
export MET_TMP_DIR='${MET_TEST_OUTPUT}/point2grid'
/d1/personal/hsoh/git/bugfixes/bugfix_2867_point2grid_qc_flag/MET/share/met/../../bin/point2grid \
/d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc \
G212 \
/d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc \
-field 'name="AOD_Smoke"; level="(*,*)";' \
-adp /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-ADPC-M6_G16_s20241100001171_e20241100003544_c20241100006361.nc \
-qc 0,1 -method MAX \
-v 1
DEBUG 1: Start point2grid by hsoh(9895) at 2024-05-21 17:54:48Z cmd: /d1/personal/hsoh/git/bugfixes/bugfix_2867_point2grid_qc_flag/MET/share/met/../../bin/point2grid /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc G212 /d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc -field name="AOD_Smoke"; level="(*,*)"; -adp /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-ADPC-M6_G16_s20241100001171_e20241100003544_c20241100006361.nc -qc 0,1 -method MAX -v 1
DEBUG 1: Reading data file: /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc
DEBUG 1: Writing output file: /d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc
DEBUG 1: Finish point2grid by hsoh(9895) at 2024-05-21 17:55:17Z
Opening /d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc
Checking AOD_Smoke ... OK
Checking t ... OK
Checking time_bounds ... OK
unset MET_TMP_DIR
I copied the command and ran it manually. It took 2 seconds (18:04:03Z - 18:04:01Z).
DEBUG 1: Start point2grid by hsoh(9895) at 2024-05-21 18:04:01Z cmd: /d1/personal/hsoh/git/bugfixes/bugfix_2867_point2grid_qc_flag/MET/share/met/../../bin/point2grid /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc G212 /d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc -field name="AOD_Smoke"; level="(*,*)"; -adp /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-ADPC-M6_G16_s20241100001171_e20241100003544_c20241100006361.nc -qc 0,1 -method MAX -v 1
DEBUG 1: Reading data file: /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc
DEBUG 1: Writing output file: /d1/personal/hsoh/MET/test_output/bugfix_2867_point2grid_qc_flag/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc
DEBUG 1: Finish point2grid by hsoh(9895) at 2024-05-21 18:04:03Z
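The elapsed times quoted from these logs can be derived from the Start and Finish lines themselves. The following is a minimal sketch, assuming the log line layout shown in this thread; the helper name `elapsed_seconds` is hypothetical, and the `cmd:` portion of the sample line is abbreviated.

```python
import re
from datetime import datetime

# Captures the timestamp on the "Start ... at" and "Finish ... at" log lines.
STAMP = re.compile(
    r"DEBUG 1: (Start|Finish) point2grid by \S+ at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z"
)

def elapsed_seconds(log_text):
    """Return Finish minus Start in seconds, or None if either line is missing."""
    stamps = {}
    for kind, stamp in STAMP.findall(log_text):
        stamps[kind] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    if "Start" not in stamps or "Finish" not in stamps:
        return None
    return (stamps["Finish"] - stamps["Start"]).total_seconds()

# Timestamps copied from the unit test log above (command abbreviated).
unit_test_log = """\
DEBUG 1: Start point2grid by hsoh(9895) at 2024-05-21 17:54:48Z cmd: ...
DEBUG 1: Finish point2grid by hsoh(9895) at 2024-05-21 17:55:17Z
"""
print(elapsed_seconds(unit_test_log))  # 29.0
```

Because the timestamps have one-second resolution, the result is only accurate to about a second, which is enough to tell a ~2 second run from a ~30 second run.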
I need to find a way to duplicate the problem.
@hsoh-u the big latency is caused by having MET_TMP_DIR set. Without it set, it takes ~ 2 seconds. With it set, it takes ~ 30 seconds. Can you please take a look to figure out why there's such a large difference?
I'll also note that if you set it to something that doesn't exist (export MET_TMP_DIR=/bad/path) or to which you don't have write permission (export MET_TMP_DIR=/home/jopatz), it segfaults:
DEBUG 1: Reading data file: /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc
FATAL ERROR (SEGFAULT): Process 1867683 got signal 11 @ local time = 2024-05-22 20:13:29Z
FATAL ERROR (SEGFAULT): Look for a core file in /d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-bugfix_2867_point2grid_qc_flag/internal/test_unit
FATAL ERROR (SEGFAULT): Process command line: /d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-bugfix_2867_point2grid_qc_flag/internal/test_unit/../../share/met/../../bin/point2grid /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-AODC-M6_G16_s20241100001171_e20241100003544_c20241100006242.nc G212 /d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-bugfix_2867_point2grid_qc_flag/internal/test_unit/../../test_output/point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc -field name="AOD_Smoke"; level="(*,*)"; -adp /d1/projects/MET/MET_test_data/unit_test/model_data/goes_16/OR_ABI-L2-ADPC-M6_G16_s20241100001171_e20241100003544_c20241100006361.nc -qc 0,1 -method MAX -v 1
Segmentation fault
Ideally, it would handle this error condition more gracefully.
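One way to picture the more graceful handling is a check that runs before anything is written to the temporary directory. The sketch below is an illustration only, not MET's code (MET is written in C++); the function name `tmp_dir_problem` and the message wording are hypothetical.

```python
import os
import tempfile

def tmp_dir_problem(path):
    """Return a readable description of the problem, or None if the directory is usable."""
    if not path:
        return "MET_TMP_DIR is empty"
    if not os.path.isdir(path):
        return f"MET_TMP_DIR does not exist or is not a directory: {path}"
    # Creating files in a directory needs both write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        return f"MET_TMP_DIR is not writable: {path}"
    return None

# Demonstration with a real directory and a path that is known not to exist.
with tempfile.TemporaryDirectory() as good_dir:
    missing_dir = os.path.join(good_dir, "missing")
    print(tmp_dir_problem(good_dir))     # None
    print(tmp_dir_problem(missing_dir))  # message naming the missing path
```

A tool using a check like this could print the message and exit with a non-zero status instead of crashing.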
This is a one-time slowdown that occurs when a new MET_TMP_DIR is set or a new target grid is added. For GOES data, point2grid generates the mapping to each target grid cell (the list of point lat/lon values for each target grid cell) and saves the mapping to a NetCDF file in $MET_TMP_DIR. Subsequent runs are fast because they reuse the pre-generated mapping. So this is not a bug.
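The build-once, reuse-later behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not point2grid's implementation: it stores the mapping as JSON rather than NetCDF, and the names `load_or_build_mapping`, `slow_build`, and `demo_grid_map.json` are hypothetical.

```python
import json
import os
import tempfile

def load_or_build_mapping(cache_dir, cache_name, build_mapping):
    """Return (mapping, cache_hit): reuse the saved file if present, else build and save it."""
    cache_path = os.path.join(cache_dir, cache_name)
    if os.path.exists(cache_path):
        with open(cache_path) as handle:
            return json.load(handle), True   # fast path on later runs
    mapping = build_mapping()                # slow, one-time work
    with open(cache_path, "w") as handle:
        json.dump(mapping, handle)
    return mapping, False

def slow_build():
    # Stand-in for the expensive step: target grid cell -> source point indices.
    return {"0": [0, 1, 4], "1": [2, 3]}

with tempfile.TemporaryDirectory() as cache_dir:
    _, hit_first = load_or_build_mapping(cache_dir, "demo_grid_map.json", slow_build)
    _, hit_second = load_or_build_mapping(cache_dir, "demo_grid_map.json", slow_build)
    print(hit_first, hit_second)  # False True
```

Only the first call against an empty cache directory pays the build cost, which matches the pattern of a slow first run followed by fast ones.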
Expected Differences
The meaning of the ADP QC values was changed (previously 3 for high, 2 for medium, and 1 for low). The baseline algorithm and the enterprise algorithm produce different QC values for high, medium, and low. MET reads the QC values and their meanings from the variable attribute and applies them to the -qc options (where 0 is high, 1 is medium, and 2 is low).
-qc option is not given
The -qc options in the unit tests were changed to -qc 0,1,2.
[x] Do these changes introduce new tools, command line arguments, or configuration file options? [No] If yes, please describe:
[x] Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No] If yes, please describe:
Pull Request Testing
A unit test was added.
New GOES16 data with Enterprise algorithm:
==>
Note: if only high quality is given with -qc 0, all data will be filtered out.
Old GOES16 data with Baseline algorithm:
==>
Same as the main V11.1 bugfix. More AOD files with the Enterprise algorithm are at seneca:/d1/personal/hsoh/data/MET-2853/20240419
[x] Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [No]
[x] Do these changes include sufficient testing updates? [No]
[x] Will this PR result in changes to the MET test suite? [Yes] If yes, describe the new output and/or changes to the existing output:
One new file and three different output files because 1) the logic to compute QC flags was changed and 2) the -qc option was changed.
NEW: point2grid/point2grid_GOES_16_ADP_Enterprise_high.nc
DIFF: point2grid/point2grid_GOES_16_ADP.nc
DIFF: point2grid_GOES_16_AOD_TO_G212_grid_map.nc
DIFF: point2grid_GOES_16_AOD_TO_G212.nc
[ ] Will this PR result in changes to existing METplus Use Cases? [Yes or No] If yes, create a new Update Truth METplus issue to describe them.
[x] Do these changes introduce new SonarQube findings? [Yes] If yes, please describe:
Maybe. Many findings were resolved, but existing findings may be flagged as new by the SonarQube server.
Pull Request Checklist
See the METplus Workflow for details.