JCSDA / CRTMv3

CRTMv3 repository for coordinated development and releases. Code history prior to v3 is not carried in this repository, to reduce the cloning overhead. For the v2.x history leading up to v3, see the JCSDA/crtm repository.

test_adjoint_Simple #29

Closed chengdang closed 1 year ago

chengdang commented 1 year ago

On the CRTM v3 test branch (feature/v241_merge_test), the other group of tests that consistently fails is test_adjoint_Simple (see issue #19):

The following tests FAILED:
    121 - test_adjoint_Simple_atms_n21 (Failed)
    122 - test_adjoint_Simple_cris-fsr_n21 (Failed)
    123 - test_adjoint_Simple_v.abi_g18 (Failed)
    124 - test_adjoint_Simple_atms_npp (Failed)
    125 - test_adjoint_Simple_cris399_npp (Failed)
    126 - test_adjoint_Simple_v.abi_gr (Failed)
    127 - test_adjoint_Simple_abi_g18 (Failed)
    128 - test_adjoint_Simple_modis_aqua (Failed)

chengdang commented 1 year ago

To look into this issue, I first removed clouds and aerosols by setting N_CLOUDS and N_AEROSOLS to zero:

https://github.com/JCSDA/CRTMv3/blob/2b5a0c5fa9947f9cab2fb40d8077d5210b38c716/test/mains/regression/adjoint/test_Simple/test_Simple.f90#L45

The tests still failed with this clear-sky setup.

chengdang commented 1 year ago

I then looked at the atm_AD and Atmosphere_AD used for comparison:

https://github.com/JCSDA/CRTMv3/blob/2b5a0c5fa9947f9cab2fb40d8077d5210b38c716/test/mains/regression/adjoint/test_Simple/test_Simple.f90#L422

As in issue #28, the test somehow saves the variables H2O(Mass mixing ratio, g/kg), O3(Volume mixing ratio, ppmv), and Layer cloud fraction incorrectly in the temporary file, so atm_AD and Atmosphere_AD are not identical.

chengdang commented 1 year ago

For example, take test_adjoint_Simple_cris399_npp, profile 1, with N_CLOUDS = 0 and N_AEROSOLS = 0. The cloud fraction AD should be zero, as shown in Atmosphere_AD. If you look at the values of these variables, they are the same in both dumps but placed under different variable names: the H2O values in Atmosphere_AD appear under O3 in atm_AD, and the O3 values appear under Layer cloud fraction. OMP_NUM_THREADS=1 for these tests.

atm_AD, computed, saved, and read from the temporary file:

Layer absorber:
125:      H2O(Mass mixing ratio, g/kg)
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00
125:      O3(Volume mixing ratio, ppmv)
125:   6.142456766251701E+01  5.304477818767415E+01  3.689097390957784E+01  2.325195646636487E+01  8.495578392047705E+00
125:  -6.797006542146675E+00 -1.629009301054668E+01 -1.662297945539264E+01 -5.166830311274156E+00  1.639353111088985E+01
125:   3.627302313607766E+01  4.623796795491968E+01  5.366620569388418E+01  6.753680033628423E+01  7.946669424786768E+01
125:   8.257805024098158E+01  7.577311580822129E+01  6.432901810878136E+01  5.228491785391348E+01  3.581944584325409E+01
125:   1.211700888591228E+01 -5.479093695450344E+00 -1.524797497617080E+01 -2.768588876717426E+01 -4.329695872261105E+01
125:  -5.600106137656109E+01 -6.264934955106590E+01 -6.562482985365862E+01 -6.271270155942371E+01 -5.410454547965717E+01
125:  -3.467292398809365E+01 -7.734618913726947E-01  3.323948249295362E+01  5.178255089312876E+01  7.374615414085142E+01
125:   1.220458717195147E+02  2.090357742813229E+02  2.312280860370082E+02  2.973091033800375E+02  3.152943787224053E+02
125:   3.147997327695679E+02  3.149255202376236E+02  2.957412693436690E+02  2.737065305742941E+02  1.606428170859161E+02
125:   9.037220187491141E+01  4.379028197808394E+00 -5.472347564902283E+01 -8.386606767529400E+01 -1.112940509607798E+02
125:  -1.326980168660481E+02 -1.539631037428189E+02 -1.737056976336570E+02 -1.939893221447279E+02 -2.085987436204250E+02
125:  -2.205271562743123E+02 -2.289129643280302E+02 -2.301014840518982E+02 -2.177446595354391E+02 -1.976523158967921E+02
125:  -1.748459226093751E+02 -1.517642541123914E+02 -1.310586486449482E+02 -1.139518417218672E+02 -9.927330431003635E+01
125:  -8.714110734191345E+01 -7.613868707807178E+01 -6.606658642659447E+01 -5.707396527760154E+01 -4.927090687137959E+01
125:  -4.273578450525807E+01 -3.735463690391561E+01 -3.382496779939049E+01 -3.162054012886624E+01 -2.924134480625098E+01
125:  -2.677190017066148E+01 -2.447189408199796E+01 -2.254887061807074E+01 -2.042770129267942E+01 -1.803371429734108E+01
125:  -1.560450366767567E+01 -1.337282517000781E+01 -1.109580344817812E+01 -8.645801744618065E+00 -6.094917510759959E+00
125:  -3.280776803351705E+00 -2.645287138480897E-01  1.960860876330768E+00  2.723520061595587E+00  2.780488436520078E+00
125:   2.840862706013108E+00  2.900170928171081E+00
125:    Layer cloud fraction:
125:   4.877826584575503E-01  4.399638943049464E-01  2.654443874749655E-01  6.648765599005857E-02 -1.411176804786126E-01
125:  -3.079475487554525E-01 -4.195958890604522E-01 -4.596945555374102E-01 -3.838906116620937E-01 -2.248562164856887E-01
125:  -9.838982702682184E-02 -8.998144867739699E-02 -1.247681220115979E-01 -1.166479101714847E-01 -1.425098171711824E-01
125:  -2.548437548606552E-01 -4.516865049879745E-01 -6.790246409957772E-01 -9.289423251771439E-01 -1.243098837697085E+00
125:  -1.636010488351747E+00 -2.001836710800018E+00 -2.334868243092344E+00 -2.710360977624231E+00 -3.138693769610450E+00
125:  -3.565249751892146E+00 -3.962846683819050E+00 -4.355304452415558E+00 -4.730528642960445E+00 -5.095999058211158E+00
125:  -5.417373119332703E+00 -5.666122530343816E+00 -5.929490512797569E+00 -6.368003807437389E+00 -6.842816263925140E+00
125:  -7.071800983161490E+00 -7.101961603837075E+00 -7.328525179613559E+00 -7.923848454792981E+00 -8.785370404722251E+00
125:  -9.810400466821124E+00 -1.085893793361037E+01 -1.209365857429626E+01 -1.356429973276975E+01 -1.513127786739245E+01
125:  -1.685961795107050E+01 -1.872212397069848E+01 -2.033963533120699E+01 -2.168891253323039E+01 -2.293045899536332E+01
125:  -2.415279937231992E+01 -2.545130107046207E+01 -2.698817152454711E+01 -2.872169675183757E+01 -3.057043173554429E+01
125:  -3.251303585112277E+01 -3.401898856098701E+01 -3.507765009018522E+01 -3.636698262833097E+01 -3.738453161991322E+01
125:  -3.809869085425957E+01 -3.854711819313477E+01 -3.873442968702064E+01 -3.871381410527610E+01 -3.868642463151998E+01
125:  -3.855680002356787E+01 -3.829985424738636E+01 -3.795420629125690E+01 -3.759969685970234E+01 -3.687799696433969E+01
125:  -3.570825211706961E+01 -3.465195573539580E+01 -3.404046492039783E+01 -3.361870930459802E+01 -3.266910079767725E+01
125:  -3.144409278551829E+01 -3.027688058728471E+01 -2.904041504188306E+01 -2.752482920377842E+01 -2.571022744663582E+01
125:  -2.343996876001918E+01 -2.083077148844477E+01 -1.753568990147992E+01 -1.341224900185656E+01 -8.879234248221550E+00
125:  -4.052461373571498E+00  1.397417164026115E+00  5.835776644422274E+00  7.582154570497681E+00  7.554847477114204E+00
125:   7.709805089467414E+00  7.845328583559993E+00

Atmosphere_AD computed and printed directly by the test program:

125:    Layer absorber:
125:      H2O(Mass mixing ratio, g/kg)
125:   6.142456766251701E+01  5.304477818767415E+01  3.689097390957784E+01  2.325195646636487E+01  8.495578392047705E+00
125:  -6.797006542146675E+00 -1.629009301054668E+01 -1.662297945539264E+01 -5.166830311274156E+00  1.639353111088985E+01
125:   3.627302313607766E+01  4.623796795491968E+01  5.366620569388418E+01  6.753680033628423E+01  7.946669424786768E+01
125:   8.257805024098158E+01  7.577311580822129E+01  6.432901810878136E+01  5.228491785391348E+01  3.581944584325409E+01
125:   1.211700888591228E+01 -5.479093695450344E+00 -1.524797497617080E+01 -2.768588876717426E+01 -4.329695872261105E+01
125:  -5.600106137656109E+01 -6.264934955106590E+01 -6.562482985365862E+01 -6.271270155942371E+01 -5.410454547965717E+01
125:  -3.467292398809365E+01 -7.734618913726947E-01  3.323948249295362E+01  5.178255089312876E+01  7.374615414085142E+01
125:   1.220458717195147E+02  2.090357742813229E+02  2.312280860370082E+02  2.973091033800375E+02  3.152943787224053E+02
125:   3.147997327695679E+02  3.149255202376236E+02  2.957412693436690E+02  2.737065305742941E+02  1.606428170859161E+02
125:   9.037220187491141E+01  4.379028197808394E+00 -5.472347564902283E+01 -8.386606767529400E+01 -1.112940509607798E+02
125:  -1.326980168660481E+02 -1.539631037428189E+02 -1.737056976336570E+02 -1.939893221447279E+02 -2.085987436204250E+02
125:  -2.205271562743123E+02 -2.289129643280302E+02 -2.301014840518982E+02 -2.177446595354391E+02 -1.976523158967921E+02
125:  -1.748459226093751E+02 -1.517642541123914E+02 -1.310586486449482E+02 -1.139518417218672E+02 -9.927330431003635E+01
125:  -8.714110734191345E+01 -7.613868707807178E+01 -6.606658642659447E+01 -5.707396527760154E+01 -4.927090687137959E+01
125:  -4.273578450525807E+01 -3.735463690391561E+01 -3.382496779939049E+01 -3.162054012886624E+01 -2.924134480625098E+01
125:  -2.677190017066148E+01 -2.447189408199796E+01 -2.254887061807074E+01 -2.042770129267942E+01 -1.803371429734108E+01
125:  -1.560450366767567E+01 -1.337282517000781E+01 -1.109580344817812E+01 -8.645801744618065E+00 -6.094917510759959E+00
125:  -3.280776803351705E+00 -2.645287138480897E-01  1.960860876330768E+00  2.723520061595587E+00  2.780488436520078E+00
125:   2.840862706013108E+00  2.900170928171081E+00
125:      O3(Volume mixing ratio, ppmv)
125:   4.877826584575503E-01  4.399638943049464E-01  2.654443874749655E-01  6.648765599005857E-02 -1.411176804786126E-01
125:  -3.079475487554525E-01 -4.195958890604522E-01 -4.596945555374102E-01 -3.838906116620937E-01 -2.248562164856887E-01
125:  -9.838982702682184E-02 -8.998144867739699E-02 -1.247681220115979E-01 -1.166479101714847E-01 -1.425098171711824E-01
125:  -2.548437548606552E-01 -4.516865049879745E-01 -6.790246409957772E-01 -9.289423251771439E-01 -1.243098837697085E+00
125:  -1.636010488351747E+00 -2.001836710800018E+00 -2.334868243092344E+00 -2.710360977624231E+00 -3.138693769610450E+00
125:  -3.565249751892146E+00 -3.962846683819050E+00 -4.355304452415558E+00 -4.730528642960445E+00 -5.095999058211158E+00
125:  -5.417373119332703E+00 -5.666122530343816E+00 -5.929490512797569E+00 -6.368003807437389E+00 -6.842816263925140E+00
125:  -7.071800983161490E+00 -7.101961603837075E+00 -7.328525179613559E+00 -7.923848454792981E+00 -8.785370404722251E+00
125:  -9.810400466821124E+00 -1.085893793361037E+01 -1.209365857429626E+01 -1.356429973276975E+01 -1.513127786739245E+01
125:  -1.685961795107050E+01 -1.872212397069848E+01 -2.033963533120699E+01 -2.168891253323039E+01 -2.293045899536332E+01
125:  -2.415279937231992E+01 -2.545130107046207E+01 -2.698817152454711E+01 -2.872169675183757E+01 -3.057043173554429E+01
125:  -3.251303585112277E+01 -3.401898856098701E+01 -3.507765009018522E+01 -3.636698262833097E+01 -3.738453161991322E+01
125:  -3.809869085425957E+01 -3.854711819313477E+01 -3.873442968702064E+01 -3.871381410527610E+01 -3.868642463151998E+01
125:  -3.855680002356787E+01 -3.829985424738636E+01 -3.795420629125690E+01 -3.759969685970234E+01 -3.687799696433969E+01
125:  -3.570825211706961E+01 -3.465195573539580E+01 -3.404046492039783E+01 -3.361870930459802E+01 -3.266910079767725E+01
125:  -3.144409278551829E+01 -3.027688058728471E+01 -2.904041504188306E+01 -2.752482920377842E+01 -2.571022744663582E+01
125:  -2.343996876001918E+01 -2.083077148844477E+01 -1.753568990147992E+01 -1.341224900185656E+01 -8.879234248221550E+00
125:  -4.052461373571498E+00  1.397417164026115E+00  5.835776644422274E+00  7.582154570497681E+00  7.554847477114204E+00
125:   7.709805089467414E+00  7.845328583559993E+00
125:    Layer cloud fraction:
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
125:   0.000000000000000E+00  0.000000000000000E+00
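The relabelling visible in the two dumps above can be checked mechanically. The following is a minimal Python sketch, not part of the CRTM test suite: the function name `find_relabelled_fields` and the dictionary keys are hypothetical, and only the first three values of each array from the dumps are used. It reports every non-zero computed array whose values show up in the saved data under a different name.

```python
from typing import Dict, List, Tuple


def find_relabelled_fields(
    saved: Dict[str, List[float]],
    computed: Dict[str, List[float]],
) -> List[Tuple[str, str]]:
    """Return (computed_name, saved_name) pairs that hold identical values
    under different names. All-zero arrays are skipped because they would
    match each other trivially."""
    matches = []
    for c_name, c_vals in computed.items():
        if not any(c_vals):
            continue  # ignore all-zero arrays
        for s_name, s_vals in saved.items():
            if c_name != s_name and c_vals == s_vals:
                matches.append((c_name, s_name))
    return matches


# First three values of each array, copied from the dumps above.
h2o_ad = [6.142456766251701e01, 5.304477818767415e01, 3.689097390957784e01]
o3_ad = [4.877826584575503e-01, 4.399638943049464e-01, 2.654443874749655e-01]
zeros = [0.0, 0.0, 0.0]

# Atmosphere_AD: printed directly by the test program.
computed = {"H2O": h2o_ad, "O3": o3_ad, "Cloud_Fraction": zeros}
# atm_AD: saved to and read back from the temporary file.
saved = {"H2O": zeros, "O3": h2o_ad, "Cloud_Fraction": o3_ad}

relabelled = find_relabelled_fields(saved, computed)
for computed_name, saved_name in relabelled:
    print(f"computed {computed_name} appears as saved {saved_name}")
```

Every array moves down by exactly one slot, which is the pattern one would expect if the reader and writer disagree about a single field that sits ahead of the absorber data in the record.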

chengdang commented 1 year ago

I'm not sure whether this is due to my computing environment or whether it happens everywhere. Could you double-check if this is the case on your laptop? @BenjaminTJohnson @StegmannJCSDA Thank you!

BenjaminTJohnson commented 1 year ago

@chengdang will do

BenjaminTJohnson commented 1 year ago

The following tests fail on my laptop:

    120 - test_adjoint_Simple_atms_n21 (Failed)
    121 - test_adjoint_Simple_cris-fsr_n21 (Failed)
    122 - test_adjoint_Simple_v.abi_g18 (Failed)
    123 - test_adjoint_Simple_atms_npp (Failed)
    124 - test_adjoint_Simple_cris399_npp (Failed)
    125 - test_adjoint_Simple_v.abi_gr (Failed)
    126 - test_adjoint_Simple_abi_g18 (Failed)
    127 - test_adjoint_Simple_modis_aqua (Failed)

For the test_adjoint_Simple_cris399_npp test, I get:

124:  -3.820655619696756-190 -3.664890439936774-216 -3.466042165557551-243 -4.534950198875833-271 -5.064972389177091-299
124:   0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00  0.000000000000000E+00
124:   0.000000000000000E+00  0.000000000000000E+00
124: 
124:      Comparing calculated results with saved ones...
124:  test_Simple(FAILURE) : Atmosphere_AD Adjoints are different!
124: Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
124: STOP 1

chengdang commented 1 year ago

Thank you, Ben! Do you also see the same values being placed under different variable names between atm_AD and Atmosphere_AD?

BenjaminTJohnson commented 1 year ago

@chengdang I'm unable to use CRTM_Atmosphere_Inspect on atm_AD... it just prints the following:

124:
124:      Comparing calculated results with saved ones...
124:  test_Simple(INFORMATION) : Atmosphere_AD save file does not exist. Creating...
124:  test_Simple(INFORMATION) : Surface_AD save file does not exist. Creating...
124:  ATMOSPHERE OBJECT
124:    n_Layers    : 0
124:    n_Absorbers : 0
124:    n_Clouds    : 0
124:    n_Aerosols  : 0
124:    Climatology : Invalid

Regardless, there's an issue with array memory locations here. It has the right number of elements (92). I'll dig into this a bit more.

BenjaminTJohnson commented 1 year ago

It's odd that the K_matrix output looks completely reasonable.

BenjaminTJohnson commented 1 year ago

Adding another bit of information:

74:      Comparing calculated results with saved ones...
74:  test_Simple(INFORMATION) : Atmosphere_AD save file does not exist. Creating...
74:  test_Simple(INFORMATION) : Surface_AD save file does not exist. Creating...
74:  test_Simple(FAILURE) : Atmosphere_AD Adjoints are different!

Even when the test creates the adjoint results and then compares against those very same results, it reports a difference.
This is probably still an underflow and bit-representation issue.

BenjaminTJohnson commented 1 year ago

--- a/src/Atmosphere/CRTM_Atmosphere_Define.f90
+++ b/src/Atmosphere/CRTM_Atmosphere_Define.f90
@@ -2452,7 +2452,7 @@ CONTAINS
       atm%Level_Pressure, &
       atm%Pressure, &
       atm%Temperature, &
-      !atm%Relative_Humidity, &   ! RH APPROACH #1
+      atm%Relative_Humidity, &   ! RH APPROACH #1
       atm%Absorber, &
       atm%Cloud_Fraction
     IF ( io_stat /= 0 ) THEN
@@ -2460,9 +2460,9 @@ CONTAINS
       CALL Read_Record_Cleanup(); RETURN
     END IF

-    ! RH APPROACH #2
-    ! Compute the relative humidity
-    CALL Compute_Relative_Humidity( atm )
+!!$    ! RH APPROACH #2
+!!$    ! Compute the relative humidity
+!!$    CALL Compute_Relative_Humidity( atm )

This appears to fix the adjoint bug. I will work on a separate PR for this, as it was done in combination with disabling OpenMP.
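Why restoring the Relative_Humidity read would realign the arrays can be illustrated with a small round trip. This is a Python sketch of the idea only, not the CRTM Fortran I/O: it assumes the writer includes a relative humidity array in the record (which the diff suggests but this thread does not state), and the helpers `write_record` and `read_record`, the array values, and the three-layer size are all made up for illustration.

```python
import io
import struct

N_LAYERS = 3  # illustrative size; the dumps above have 92 layers


def write_record(buf, *arrays):
    """Write the arrays back to back as float64, like one binary record."""
    for arr in arrays:
        buf.write(struct.pack(f"<{len(arr)}d", *arr))


def read_record(buf, n_arrays, n=N_LAYERS):
    """Read n_arrays consecutive float64 arrays of length n."""
    return [
        list(struct.unpack(f"<{n}d", buf.read(8 * n)))
        for _ in range(n_arrays)
    ]


temperature = [250.0, 260.0, 270.0]
relative_humidity = [0.0, 0.0, 0.0]  # assumed zero for the adjoint structure
h2o = [61.4, 53.0, 36.9]
o3 = [0.49, 0.44, 0.27]
cloud_fraction = [0.0, 0.0, 0.0]

buf = io.BytesIO()
write_record(buf, temperature, relative_humidity, h2o, o3, cloud_fraction)

# Reader that omits relative humidity, as with the commented-out line.
buf.seek(0)
_, h2o_bad, o3_bad, cf_bad = read_record(buf, 4)

# Reader whose field list matches the writer's.
buf.seek(0)
_, _, h2o_ok, o3_ok, cf_ok = read_record(buf, 5)

print("mismatched reader: H2O =", h2o_bad, "O3 =", o3_bad, "CF =", cf_bad)
print("matched reader:    H2O =", h2o_ok, "O3 =", o3_ok, "CF =", cf_ok)
```

With the mismatched reader, H2O receives the zero relative humidity array, O3 receives the H2O values, and cloud fraction receives the O3 values, the same one-slot shift seen between atm_AD and Atmosphere_AD earlier in this thread.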

BenjaminTJohnson commented 1 year ago

This only fixes the adjoint issue for non-OpenMP CRTM. I will now figure out what's going on with OpenMP; it will probably take a day or two.

BenjaminTJohnson commented 1 year ago

The solution to the OpenMP issues is to pass Opt as an input parameter to the subroutine Post_Process_RTSolution, as we did in v2.4.1. This is required for CRTM_Forward_Module, CRTM_Tangent_Linear_Module, CRTM_Adjoint_Module, and CRTM_K_Matrix_Module.