JCSDA-internal / ioda-converters

Various converters for getting obs data in and out of IODA

Fix acprofile problem. #1421

Closed rmclaren closed 6 months ago

rmclaren commented 9 months ago

Description

Find fix for acprofile test problem

BenjaminRuston commented 9 months ago

Should we go ahead and uncomment test_iodaconv_prepbufr_ncep_aircftprofiles2ioda? If I push this up, we will get the CI testing output again.

I'll go ahead and do this. @rmclaren, please change it as you see fit.

rmclaren commented 9 months ago

@BenjaminRuston Fixed something. Could you download this branch and see if it fixes the problem on all the different machines?

rmclaren commented 9 months ago

nevermind

rmclaren commented 8 months ago

So I recreated the whole environment on a VM with spack-stack (Ubuntu 20.04), and the test still passes. My configuration looks as follows:

source /home/rmclaren/spack-stack/setup.sh
spack env activate /home/rmclaren/spack-stack/envs/unified-env.mylinux

source /etc/profile.d/modules.sh

module use ${SPACK_STACK_DIR}/envs/unified-env.mylinux/install/modulefiles/Core

module load stack-gcc/9.4.0
module load stack-python/3.10.8
module load stack-mpich/4.1.2

module load ecflow
module load ewok-env
module load jedi-fv3-env
module load soca-env

The key thing here is that I'm not loading anything that was not built as part of the stack. Could anyone try this in a failing environment?

PatNichols commented 8 months ago

@rmclaren I will give it a shot on my Mac. I have several versions of spack-stack installed. I can also try on Orion with Intel.

PatNichols commented 8 months ago

@rmclaren I am seeing the same failure on ubuntu8 and redhat using the gcc version 12 compiler and spack-stack 1.5.1. Same output on both.

rmclaren commented 8 months ago

What does your module config look like for Ubuntu? I set up a fresh VM of 20.04 and loaded the spack stack, but have absolutely no problems running that test (gcc 9.4.0 in my case).

rmclaren commented 8 months ago

@PatNichols What happens if you cd to <your ioda-converter dir>/build/test and run the following? You might need to install valgrind first.

valgrind --tool=memcheck --gen-suppressions=all --leak-check=full --leak-resolution=med --track-origins=yes ../bin/bufr2ioda.x testinput/bufr_ncep_prepbufr_aircft.yaml
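The Valgrind invocation requested here can also be scripted so that it always runs from the test directory and fails early when Valgrind is missing. The following is a minimal Python sketch, not part of ioda-converters: the flags and relative paths are copied from the comment, while the helper names `build_valgrind_cmd` and `run_under_valgrind` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Flags copied from the comment above; nothing here is specific to ioda-converters.
VALGRIND_FLAGS = [
    "--tool=memcheck",
    "--gen-suppressions=all",
    "--leak-check=full",
    "--leak-resolution=med",
    "--track-origins=yes",
]


def build_valgrind_cmd(binary, yaml_path):
    """Return the argv list for running `binary yaml_path` under Memcheck.

    The binary and its YAML argument are separate list items, so the shell
    never sees them as one quoted word.
    """
    return ["valgrind", *VALGRIND_FLAGS, str(binary), str(yaml_path)]


def run_under_valgrind(test_dir,
                       binary="../bin/bufr2ioda.x",
                       yaml_path="testinput/bufr_ncep_prepbufr_aircft.yaml"):
    """Run the converter under Valgrind from test_dir and return the exit code."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind not found on PATH; install it first")
    cmd = build_valgrind_cmd(binary, yaml_path)
    # cwd matters: the relative paths above are resolved against build/test.
    return subprocess.run(cmd, cwd=Path(test_dir)).returncode
```

Calling `run_under_valgrind("<your ioda-converter dir>/build/test")` with the real directory substituted would then reproduce the manual steps.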

PatNichols commented 8 months ago

valgrind: ./bin.bufr2ioda.x: No such file or directory
ec2-user bld2$ valgrind --leak-check=full -s ./bin/bufr2ioda.x bufr_ncep_prepbufr_aircft.yaml
==34049== Memcheck, a memory error detector
==34049== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==34049== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==34049== Command: ./bin/bufr2ioda.x bufr_ncep_prepbufr_aircft.yaml
==34049==
==34049== Warning: set address range perms: large range [0x1d1ad040, 0x328ffa40) (undefined)
==34049== Warning: set address range perms: large range [0x40795040, 0x575cb040) (undefined)
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE1 0xF9 0x90 0x10 0xB8 0x2 0x0 0x0 0x0
vex amd64->IR:   REX=0 REX.W=1 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==34049== valgrind: Unrecognised instruction at address 0x82fc9d8.
==34049==    at 0x82FC9D8: wrdlen (wrdlen.F:24)
==34049==    by 0x82FBE4A: openbf_ (openbf.f:202)
==34049==    by 0x82FF87E: openbf_f (bufr_c2f_interface.F90:127)
==34049==    by 0x40DFA67: Ingester::bufr::NcepDataProvider::open() (NcepDataProvider.cpp:25)
==34049==    by 0x40E3010: Ingester::bufr::File::File(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&) (File.cpp:35)
==34049==    by 0x40AE6B0: Ingester::BufrParser::BufrParser(eckit::LocalConfiguration const&) (BufrParser.cpp:37)
==34049==    by 0x4059B0: construct<Ingester::BufrParser, const eckit::LocalConfiguration&> (new_allocator.h:162)
==34049==    by 0x4059B0: construct<Ingester::BufrParser, const eckit::LocalConfiguration&> (alloc_traits.h:516)
==34049==    by 0x4059B0: _Sp_counted_ptr_inplace<const eckit::LocalConfiguration&> (shared_ptr_base.h:519)
==34049==    by 0x4059B0: __shared_count<Ingester::BufrParser, std::allocator, const eckit::LocalConfiguration&> (shared_ptr_base.h:650)
==34049==    by 0x4059B0: __shared_ptr<std::allocator, const eckit::LocalConfiguration&> (shared_ptr_base.h:1342)
==34049==    by 0x4059B0: shared_ptr<std::allocator, const eckit::LocalConfiguration&> (shared_ptr.h:409)
==34049==    by 0x4059B0: allocate_shared<Ingester::BufrParser, std::allocator, const eckit::LocalConfiguration&> (shared_ptr.h:863)
==34049==    by 0x4059B0: make_shared<Ingester::BufrParser, const eckit::LocalConfiguration&> (shared_ptr.h:879)
==34049==    by 0x4059B0: make (ObjectFactory.h:54)
==34049==    by 0x4059B0: Ingester::ObjectFactory<Ingester::Parser, eckit::LocalConfiguration const&>::create(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, eckit::LocalConfiguration const&) (ObjectFactory.h:76)
==34049==    by 0x403EDE: Ingester::parse(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, unsigned long) (bufr2ioda.cpp:46)
==34049==    by 0x403797: main (bufr2ioda.cpp:111)
==34049== Your program just tried to execute an instruction that Valgrind
==34049== did not recognise. There are two possible reasons for this.
==34049== 1. Your program has a bug and erroneously jumped to a non-code
==34049==    location. If you are running Memcheck and you just saw a
==34049==    warning about a bad jump, it's probably your program's fault.
==34049== 2. The instruction is legitimate but Valgrind doesn't handle it,
==34049==    i.e. it's Valgrind's fault. If you think this is the case or
==34049==    you are not sure, please let us know and we'll try to fix it.
==34049== Either way, Valgrind will now raise a SIGILL signal which will
==34049== probably kill your program.
==34049==
==34049== Process terminating with default action of signal 4 (SIGILL)
==34049==  Illegal opcode at address 0x82FC9D8
==34049==    at 0x82FC9D8: wrdlen (wrdlen.F:24)
==34049==    by 0x82FBE4A: openbf_ (openbf.f:202)
==34049==    by 0x82FF87E: openbf_f (bufr_c2f_interface.F90:127)
==34049==    by 0x40DFA67: Ingester::bufr::NcepDataProvider::open() (NcepDataProvider.cpp:25)
==34049==    by 0x40E3010: Ingester::bufr::File::File(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&) (File.cpp:35)
==34049==    by 0x40AE6B0: Ingester::BufrParser::BufrParser(eckit::LocalConfiguration const&) (BufrParser.cpp:37)
==34049==    by 0x4059B0: construct<Ingester::BufrParser, const eckit::LocalConfiguration&> (new_allocator.h:162)
==34049==    by 0x4059B0: construct<Ingester::BufrParser, const eckit::LocalConfiguration&> (alloc_traits.h:516)
==34049==    by 0x4059B0: _Sp_counted_ptr_inplace<const eckit::LocalConfiguration&> (shared_ptr_base.h:519)
==34049==    by 0x4059B0: __shared_count<Ingester::BufrParser, std::allocator, const eckit::LocalConfiguration&> (shared_ptr_base.h:650)
==34049==    by 0x4059B0: __shared_ptr<std::allocator, const eckit::LocalConfiguration&> (shared_ptr_base.h:1342)
==34049==    by 0x4059B0: shared_ptr<std::allocator, const eckit::LocalConfiguration&> (shared_ptr.h:409)
==34049==    by 0x4059B0: allocate_shared<Ingester::BufrParser, std::allocator, const eckit::LocalConfiguration&> (shared_ptr.h:863)
==34049==    by 0x4059B0: make_shared<Ingester::BufrParser, const eckit::LocalConfiguration&> (shared_ptr.h:879)
==34049==    by 0x4059B0: make (ObjectFactory.h:54)
==34049==    by 0x4059B0: Ingester::ObjectFactory<Ingester::Parser, eckit::LocalConfiguration const&>::create(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, eckit::LocalConfiguration const&) (ObjectFactory.h:76)
==34049==    by 0x403EDE: Ingester::parse(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, unsigned long) (bufr2ioda.cpp:46)
==34049==    by 0x403797: main (bufr2ioda.cpp:111)
==34049==
==34049== HEAP SUMMARY:
==34049==     in use at exit: 1,316,729,091 bytes in 4,167 blocks
==34049==   total heap usage: 7,374 allocs, 3,207 frees, 1,316,958,589 bytes allocated
==34049==
==34049== LEAK SUMMARY:
==34049==    definitely lost: 0 bytes in 0 blocks
==34049==    indirectly lost: 0 bytes in 0 blocks
==34049==      possibly lost: 0 bytes in 0 blocks
==34049==    still reachable: 1,316,729,091 bytes in 4,167 blocks
==34049==         suppressed: 0 bytes in 0 blocks
==34049== Reachable blocks (those to which a pointer was found) are not shown.
==34049== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==34049==
==34049== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
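The `vex amd64->IR` lines in the output above report how Valgrind split the first bytes of the unhandled instruction into prefix fields. A minimal Python sketch, assuming the standard three-byte VEX prefix layout (escape byte 0xC4 followed by two payload bytes), reproduces those fields from the raw bytes. The function name `decode_vex3` and the dictionary keys are illustrative and are not part of Valgrind or ioda-converters.

```python
def decode_vex3(code):
    """Decode a 3-byte VEX prefix (leading byte 0xC4) into its bit fields.

    `code` is a bytes-like object starting at the 0xC4 escape byte.
    """
    if len(code) < 4 or code[0] != 0xC4:
        raise ValueError("not a 3-byte VEX-encoded instruction")
    b1, b2 = code[1], code[2]
    opcode_maps = {1: "0F", 2: "0F38", 3: "0F3A"}
    simd_prefixes = {0: None, 1: "66", 2: "F3", 3: "F2"}
    return {
        # R, X, B and vvvv are stored inverted in the prefix, so flip them back.
        "R": ((b1 >> 7) & 1) ^ 1,
        "X": ((b1 >> 6) & 1) ^ 1,
        "B": ((b1 >> 5) & 1) ^ 1,
        "map": opcode_maps.get(b1 & 0x1F, "reserved"),
        "W": (b2 >> 7) & 1,
        "vvvv": ((b2 >> 3) & 0xF) ^ 0xF,
        "L": (b2 >> 2) & 1,
        "prefix": simd_prefixes[b2 & 0x3],
        "opcode": code[3],
    }


# First five bytes from the "unhandled instruction bytes" line in the log.
fields = decode_vex3(bytes([0xC4, 0xE1, 0xF9, 0x90, 0x10]))
print(fields)
```

For the bytes in the log this gives W=1, R=X=B=0, L=0, vvvv=0, a 66 prefix and the 0F opcode map, matching the values Valgrind printed. If the Intel encoding tables are read correctly here, that combination with opcode 0x90 is an AVX-512 opmask move (KMOVD), which would point to case 2 in Valgrind's message (a legitimate instruction it does not handle) rather than a bad jump. That reading is an assumption and worth confirming with a disassembler.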

PatNichols commented 8 months ago

@rmclaren Not sure how well Valgrind handles Fortran, though.

rmclaren commented 8 months ago

@PatNichols Thanks!

PatNichols commented 7 months ago

@rmclaren Any progress on this? After some testing, it looks like Valgrind's output is misleading. The common block is triggering an error when none exists.

rmclaren commented 7 months ago

Sorry, trying to finish something up and then circle back to this.

rmclaren commented 6 months ago

A bit of an update. I finally got an account on Orion, and the branch feature/query_acprofile_prob does not reproduce the problem on those machines (just FYI, the develop branch will have a different issue because something got commented out in the YAML file).

PatNichols commented 6 months ago

@rmclaren I checked the output of ioda_validate, and at least for pressure there are problems. This may be a problem with the input file, not the code itself.

....
Reading YAML from /build_container/ioda/share/test/testinput/validation/ObsSpace.yaml
Processing data file: /jcsda/ioda-bundle/iodaconv/test/testoutput/gdas.t12z.acft_profiles.prepbufr.nc

....
Variable QualityMarker/pressure
  Warning: Variable 'QualityMarker/pressure' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 PressureEvent ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level PressureEvent ]
Variable QualityMarker/airTemperature
  Warning: Variable 'QualityMarker/airTemperature' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 TemperatureEvent ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level TemperatureEvent ] [ Location AmdarSequence ] [ Location dim_2 ]
Variable QualityMarker/specificHumidity
  Warning: Variable 'QualityMarker/specificHumidity' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 HumidityEvent ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level HumidityEvent ]
Variable QualityMarker/windEastward
  Warning: Variable 'QualityMarker/windEastward' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 WindEvent ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level WindEvent ]
Variable QualityMarker/height
  Warning: Variable 'QualityMarker/height' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 HeightEvent ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level HeightEvent ]
Variable QualityMarker/windNorthward
  Warning: Variable 'QualityMarker/windNorthward' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 WindEvent ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level WindEvent ]
Variable ObsError/pressure
  Warning: Variable 'ObsError/pressure' does not have match any of the recommended dimensions.
    Variable dimensions: [ Location dim_2 ].
    Recommended dimensions: [ Location ] [ Location Level ] [ Location Level PressureEvent ]
Variable ObsError/airTemperature
Variable ObsError/relativeHumidity
...
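The warnings above all follow one pattern: a variable name, the dimensions it actually has, and the dimension sets the validator recommends. A short Python sketch, assuming the message wording shown in this log, pulls those three pieces out so the mismatches can be listed side by side. The regular expression and the names `WARNING_RE`, `parse_dimension_warnings` and `SAMPLE` are illustrative and are not part of ioda_validate.

```python
import re

# Pattern for one warning, written against whitespace-normalised text.
WARNING_RE = re.compile(
    r"Warning: Variable '(?P<name>[^']+)' does not have match any of the "
    r"recommended dimensions\. Variable dimensions: \[(?P<actual>[^\]]*)\]\. "
    r"Recommended dimensions:(?P<recommended>(?: \[[^\]]*\])+)"
)


def parse_dimension_warnings(text):
    """Return one dict per warning with the actual and recommended dimensions."""
    # Collapse all runs of whitespace so line wrapping in the log does not matter.
    flat = " ".join(text.split())
    rows = []
    for match in WARNING_RE.finditer(flat):
        recommended = [
            group.split()
            for group in re.findall(r"\[([^\]]*)\]", match.group("recommended"))
        ]
        rows.append({
            "variable": match.group("name"),
            "actual": match.group("actual").split(),
            "recommended": recommended,
        })
    return rows


# Two warnings copied from the log above.
SAMPLE = """
Variable QualityMarker/pressure
  Warning: Variable 'QualityMarker/pressure' does not have match any of the
  recommended dimensions. Variable dimensions: [ Location dim_2 PressureEvent ].
  Recommended dimensions: [ Location ] [ Location Level ] [ Location Level PressureEvent ]
Variable ObsError/pressure
  Warning: Variable 'ObsError/pressure' does not have match any of the
  recommended dimensions. Variable dimensions: [ Location dim_2 ].
  Recommended dimensions: [ Location ] [ Location Level ] [ Location Level PressureEvent ]
"""

for row in parse_dimension_warnings(SAMPLE):
    print(row["variable"], row["actual"], row["recommended"])
```

In every warning in this log the second dimension is `dim_2` where the recommended sets mostly use `Level`, which the parsed output makes easy to see at a glance.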

rmclaren commented 6 months ago

@PatNichols Your comment is unrelated to the issue in this thread. It just means that the input YAML file isn't following those conventions...

rmclaren commented 6 months ago

I'm going to close this Pull Review and open a new one for feature/query_acprofile_prob_2 in order to merge with a cleaner history. Please see https://github.com/JCSDA-internal/ioda-converters/pull/1467.