Conversion with new P3+ files now working - Githubissues

PollyNET / polly2earlinetDB

Provide command line tool to convert the polly results to the new EARLINET database format.

GNU General Public License v3.0

4 stars 0 forks source link

Conversion with new P3+ files now working #25

Closed HolgerPollyNet closed 2 years ago

HolgerPollyNet commented 2 years ago

The following error occurs:

File "polly2scc.py", line 1220, in read_picasso_results temperature=pData['temperature'][refMask355] + 273.16)) File "<array_function__ internals>", line 6, in nanmean

obviously caused by function "sigma_rayleigh" as @ulysses78 found out.

We realized that in P3.+ some variables are now float instead of double, might this cause the problem and shall we change this back @ZPYin ?

ulysses78 commented 2 years ago

"We realized that in P3.+ some variables are now float instead of double, might this cause the problem and shall we change this back @ZPYin ?"

Here the differences are listed:

v2.0                                                                                |     v3.1

  <variable name="LR_aeronet_1064" shape="method" type="double">                    |     <variable name="LR_aeronet_1064" shape="method" type="float">
  <variable name="LR_aeronet_355" shape="method" type="double">                     |     <variable name="LR_aeronet_355" shape="method" type="float">
  <variable name="LR_aeronet_532" shape="method" type="double">                     |     <variable name="LR_aeronet_532" shape="method" type="float">
  <variable name="RH" shape="height" type="double">                                 |     <variable name="RH" shape="height" type="float">
  <variable name="WVMR" shape="height" type="double">                               |     <variable name="WVMR" shape="height" type="float">
  <variable name="aerBsc_RR_1064" shape="height" type="double">                     |     <variable name="aerBsc_RR_1064" shape="height" type="float">
  <variable name="aerBsc_aeronet_1064" shape="height" type="double">                |     <variable name="aerBsc_aeronet_1064" shape="height" type="float">
  <variable name="aerBsc_aeronet_355" shape="height" type="double">                 |     <variable name="aerBsc_aeronet_355" shape="height" type="float">
  <variable name="aerBsc_aeronet_532" shape="height" type="double">                 |     <variable name="aerBsc_aeronet_532" shape="height" type="float">
  <variable name="aerBsc_klett_1064" shape="height" type="double">                  |     <variable name="aerBsc_klett_1064" shape="height" type="float">
  <variable name="aerBsc_klett_355" shape="height" type="double">                   |     <variable name="aerBsc_klett_355" shape="height" type="float">
  <variable name="aerBsc_klett_532" shape="height" type="double">                   |     <variable name="aerBsc_klett_532" shape="height" type="float">
  <variable name="aerBsc_raman_1064" shape="height" type="double">                  |     <variable name="aerBsc_raman_1064" shape="height" type="float">
  <variable name="aerBsc_raman_355" shape="height" type="double">                   |     <variable name="aerBsc_raman_355" shape="height" type="float">
  <variable name="aerBsc_raman_532" shape="height" type="double">                   |     <variable name="aerBsc_raman_532" shape="height" type="float">
  <variable name="aerExt_RR_1064" shape="height" type="double">                     |     <variable name="aerExt_RR_1064" shape="height" type="float">
  <variable name="aerExt_raman_1064" shape="height" type="double">                  |     <variable name="aerExt_raman_1064" shape="height" type="float">
  <variable name="aerExt_raman_355" shape="height" type="double">                   |     <variable name="aerExt_raman_355" shape="height" type="float">
  <variable name="aerExt_raman_532" shape="height" type="double">                   |     <variable name="aerExt_raman_532" shape="height" type="float">
  <variable name="aerLR_RR_1064" shape="height" type="double">                      |     <variable name="aerLR_RR_1064" shape="height" type="float">
  <variable name="aerLR_raman_1064" shape="height" type="double">                   |     <variable name="aerLR_raman_1064" shape="height" type="float">
  <variable name="aerLR_raman_355" shape="height" type="double">                    |     <variable name="aerLR_raman_355" shape="height" type="float">
  <variable name="aerLR_raman_532" shape="height" type="double">                    |     <variable name="aerLR_raman_532" shape="height" type="float">
  <variable name="altitude" shape="method" type="double">                           |     <variable name="altitude" shape="method" type="float">
  <variable name="end_time" shape="method" type="double">                                 <variable name="end_time" shape="method" type="double">
  <variable name="height" shape="height" type="double">                             |     <variable name="height" shape="height" type="float">
  <variable name="latitude" shape="method" type="double">                           |     <variable name="latitude" shape="method" type="float">
  <variable name="longitude" shape="method" type="double">                          |     <variable name="longitude" shape="method" type="float">
  <variable name="parDepol_klett_355" shape="height" type="double">                 |     <variable name="parDepol_klett_355" shape="height" type="float">
  <variable name="parDepol_klett_532" shape="height" type="double">                 |     <variable name="parDepol_klett_532" shape="height" type="float">
  <variable name="parDepol_raman_355" shape="height" type="double">                 |     <variable name="parDepol_raman_355" shape="height" type="float">
  <variable name="parDepol_raman_532" shape="height" type="double">                 |     <variable name="parDepol_raman_532" shape="height" type="float">
  <variable name="pressure" shape="height" type="double">                           |     <variable name="pressure" shape="height" type="float">
  <variable name="reference_height_1064" shape="reference_height" type="double">    |     <variable name="reference_height_1064" shape="reference_height" type="float">
  <variable name="reference_height_355" shape="reference_height" type="double">     |     <variable name="reference_height_355" shape="reference_height" type="float">
  <variable name="reference_height_532" shape="reference_height" type="double">     |     <variable name="reference_height_532" shape="reference_height" type="float">
  <variable name="shots" shape="method" type="double">                              |     <variable name="shots" shape="method" type="short">
  <variable name="start_time" shape="method" type="double">                               <variable name="start_time" shape="method" type="double">
  <variable name="temperature" shape="height" type="double">                        |     <variable name="temperature" shape="height" type="float">
  <variable name="uncertainty_parDepol_klett_355" shape="height" type="double">     |     <variable name="uncertainty_parDepol_klett_355" shape="height" type="float">
  <variable name="uncertainty_parDepol_klett_532" shape="height" type="double">     |     <variable name="uncertainty_parDepol_klett_532" shape="height" type="float">
  <variable name="uncertainty_parDepol_raman_355" shape="height" type="double">     |     <variable name="uncertainty_parDepol_raman_355" shape="height" type="float">
  <variable name="uncertainty_parDepol_raman_532" shape="height" type="double">     |     <variable name="uncertainty_parDepol_raman_532" shape="height" type="float">
  <variable name="volDepol_klett_355" shape="height" type="double">                 |     <variable name="volDepol_klett_355" shape="height" type="float">
  <variable name="volDepol_klett_532" shape="height" type="double">                 |     <variable name="volDepol_klett_532" shape="height" type="float">
  <variable name="volDepol_raman_355" shape="height" type="double">                 |     <variable name="volDepol_raman_355" shape="height" type="float">
  <variable name="volDepol_raman_532" shape="height" type="double">                 |     <variable name="volDepol_raman_532" shape="height" type="float">
  <variable name="zenith_angle" shape="method" type="double">                       |     <variable name="zenith_angle" shape="method" type="float">

... and indeed, that caused the problem!

I just tested it for the generated profile-file in the picasso-proc-chain (lib/io/pollySaveProfiles.m) by setting the type of every variable to 'double' and reprocessed a level0 file to level1 file. Afterwards I executed the polly2scc script with this new level1-nc-file and no error occured!

ulysses78 commented 2 years ago

Here is what we found out so far: the following error will be printed, when trying to execute polly2scc.py with picasso_v3.1 nc-files:

  result = np.where(m, fa, umath.power(fa, fb)).view(basetype)
Traceback (most recent call last):
  File "polly2scc.py", line 2319, in <module>
    main()
  File "polly2scc.py", line 2314, in main
    args.range_lim_b, args.range_lim_e, args.camp_info, args.force)
  File "polly2scc.py", line 2185, in polly2scc
    dims, data, global_attris = p2e_convertor.read_data_file(task)
  File "polly2scc.py", line 307, in read_data_file
    self.__read_picasso_results(filename, **kwargs)
  File "polly2scc.py", line 1220, in __read_picasso_results
    temperature=pData['temperature'][refMask355] + 273.16))
  File "<__array_function__ internals>", line 6, in nanmean
  File "C:\Users\baars\.conda\envs\polly2earlinet_db\lib\site-packages\numpy\lib\nanfunctions.py", line 951, in nanmean
    avg = _divide_by_count(tot, cnt, out=out)
  File "C:\Users\baars\.conda\envs\polly2earlinet_db\lib\site-packages\numpy\lib\nanfunctions.py", line 213, in _divide_by_count
    return np.divide(a, b, out=a, casting='unsafe')
ValueError: output array is read-only

ulysses78 commented 2 years ago

Furthermore I found out, that in the function „beta_pi_rayleigh“, which is implemented from the molecular.rayleigh_scattering, returns an empty masked_array for the picasso_v3.1-files. If I compress this array to the masked range of this array by using 'masked_array.compressed()':

refBscMol355 = np.nanmean(
                    beta_pi_rayleigh(
                        355,
                        pressure=pData['pressure'][refMask355],
                        temperature=pData['temperature'][refMask355] + 273.16).compressed())

the polly2scc script will generate the earlinet-nc-file, but with an empty 'backscatter_calibration_value'.

ulysses78 commented 2 years ago

Afterwards I checked the function „sigma_rayleigh“ :

def sigma_rayleigh(wavelength, pressure=1013.25, temperature=288.15, C=385., rh=0.):
    ''' Calculates the Rayleigh-scattering cross section per molecule.

    Parameters
    ----------
    wavelength: float
       Wavelegnth [nm]
    pressure: float
       The atmospheric pressure [hPa]
    temperature: float
       The atmospheric temperature [K]   
    C: float
       CO2 concentration [ppmv].
    rh: float
       Relative humidity from 0 to 100 [%] 

    Returns
    --------
    sigma: float
       Rayleigh-scattering cross section [m2]
    '''
    p_e = rh_to_pressure(rh, temperature)

    # Just to be sure that the wavelength is a float
    wavelength = np.array(wavelength, dtype=float)

    # Calculate properties of standard air
    n = n_air(wavelength, pressure, temperature, C, rh)
    N = number_density_at_pt(pressure, temperature, rh, ideal=ASSUME_AIR_IDEAL)

    # Wavelength of radiation
    wl_m = wavelength * 10 ** -9  # from nm to m

    # King's correction factor
    f_k = kings_factor_atmosphere(wavelength, C=C, p_e=p_e, p_t=pressure)  # no units

    # first part of the equation
    f1 = (24. * np.pi ** 3) / (wl_m ** 4 * N ** 2)
    # second part of the equation
    f2 = (n ** 2 - 1.) ** 2 / (n ** 2 + 2.) ** 2

    # result
    sigma = f1 * f2 * f_k
    return sigma

I found out at which point the masked_array gets empty. This is when f1 will be calculated, when N gets squared! N will be calculated by: N = number_density_at_pt(pressure, temperature, rh, ideal=ASSUME_AIR_IDEAL) If setting 'ideal' (compressibility of air) to False, the masked_array gets filled, when setting it to True (=?ASSUME_AIR_IDEAL), it will be empty for the v_3.1-nc-files. For the v_2.0-nc-files the polly2scc-script succeeded, independend of setting 'ideal' to True or to False.

ulysses78 commented 2 years ago

@ZPYin ... do you have any clue what caused this problem in the function mentioned above?

ZPYin commented 2 years ago

Afterwards I checked the function „sigma_rayleigh“ :

def sigma_rayleigh(wavelength, pressure=1013.25, temperature=288.15, C=385., rh=0.):
    ''' Calculates the Rayleigh-scattering cross section per molecule.

    Parameters
    ----------
    wavelength: float
       Wavelegnth [nm]
    pressure: float
       The atmospheric pressure [hPa]
    temperature: float
       The atmospheric temperature [K]   
    C: float
       CO2 concentration [ppmv].
    rh: float
       Relative humidity from 0 to 100 [%] 

    Returns
    --------
    sigma: float
       Rayleigh-scattering cross section [m2]
    '''
    p_e = rh_to_pressure(rh, temperature)

    # Just to be sure that the wavelength is a float
    wavelength = np.array(wavelength, dtype=float)

    # Calculate properties of standard air
    n = n_air(wavelength, pressure, temperature, C, rh)
    N = number_density_at_pt(pressure, temperature, rh, ideal=ASSUME_AIR_IDEAL)

    # Wavelength of radiation
    wl_m = wavelength * 10 ** -9  # from nm to m

    # King's correction factor
    f_k = kings_factor_atmosphere(wavelength, C=C, p_e=p_e, p_t=pressure)  # no units

    # first part of the equation
    f1 = (24. * np.pi ** 3) / (wl_m ** 4 * N ** 2)
    # second part of the equation
    f2 = (n ** 2 - 1.) ** 2 / (n ** 2 + 2.) ** 2

    # result
    sigma = f1 * f2 * f_k
    return sigma

I found out at which point the masked_array gets empty. This is when f1 will be calculated, when N gets squared! N will be calculated by: N = number_density_at_pt(pressure, temperature, rh, ideal=ASSUME_AIR_IDEAL) If setting 'ideal' (compressibility of air) to False, the masked_array gets filled, when setting it to True (=?ASSUME_AIR_IDEAL), it will be empty for the v_3.1-nc-files. For the v_2.0-nc-files the polly2scc-script succeeded, independend of setting 'ideal' to True or to False.

You are right!!! @ulysses78

Aftering switching to float in Picass v3.x, the variables were saved as numpy.float32 in the python conversion script, which were inherited from the nc file. However, it can cause overflow in calculating Rayleigh scattering properties. The strange behavior pointed out by @ulysses78 is exactly where overflow happens.

The value of atmospheric molecule number density (N) is (usually) at the order of magnitude of 25, while numpy.float32 covers the range of [-3.410^(-38), 3.410^38]. It causes error when executing N**2.

I fixed this bug in the lastest commit. You can check the details inside.

HolgerPollyNet commented 2 years ago

@ZPYin and @ulysses78

Guys, i just wanted to say that you doing a great job. And fixing such nasty bug is really not easy! Many thanks!