[bug] Invalid DB query - Githubissues

jsturdy commented 5 years ago

Brief summary of issue

The current DB query selects only VFATs with RUN_NUMBER set to the maximum value in the table, rather than selecting from all VFATs, only the record with the highest RUN_NUMBER This results in missing VFATs when running any scripts, sometimes working, but usually only by luck.

Types of issue

[x] Bug report (report an issue with the code)

Expected Behavior

Any query should select uniquely one VFAT, and if multiple records exist for the same VFAT, the one with the highest RUN_NUMBER will be returned

Possible Solution (for bugs)

Changing the query to something like:

## only necessary if you want to programatically get all column names, if you only use a well defined subset, they can be manually specified, and avoid the extra query
colquery="select COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME='GEM_VFAT3_PROD_SUMMARY_V_RH'"
cur.execute(colquery)
columns = ",".join([ r[0] for r in cur])

query = "select * from (select {:s},max(RUN_NUMBER) over (partition by VFAT3_BARCODE) as LATEST_RUN_NUMBER from CMS_GEM_MUON_VIEW.GEM_VFAT3_PROD_SUMMARY_V_RH) data where RUN_NUMBER = LATEST_RUN_NUMBER and ({:s})".format(columns,vfatQuery)

Context (for feature requests)

Tested with python interpreter and with sql_developer

lpetre-ulb commented 5 years ago

What about using an inner join in a single query ?

query=('SELECT data.* FROM CMS_GEM_MUON_VIEW.{0} data'
'INNER JOIN (SELECT vfat3_barcode, MAX(run_number) AS run_number FROM CMS_GEM_MUON_VIEW.{0} GROUP BY vfat3_barcode) data_select'
'ON data.vfat3_barcode = data_select.vfat3_barcode AND data.run_number = data_select.run_number').format(view)

It is working with pandas inside the python interpreter.

[Works with the assumption that the pair (vfat3_barcode, run_number) is unique.]

jsturdy commented 5 years ago

What about using an inner join in a single query ?
query=('SELECT data.* FROM CMS_GEM_MUON_VIEW.{0} data'
'INNER JOIN (SELECT vfat3_barcode, MAX(run_number) AS run_number FROM CMS_GEM_MUON_VIEW.{0} GROUP BY vfat3_barcode) data_select'
'ON data.vfat3_barcode = data_select.vfat3_barcode AND data.run_number = data_select.run_number').format(view)
It is working with pandas inside the python interpreter.

[Works with the assumption that the pair (vfat3_barcode, run_number) is unique.]

No preference, your solution seems a bit faster (0.02s vs 0.08s in sql_developer with a restriction to some 10 specific VFATs, and roughly equal times with no restriction), and is a bit cleaner

bdorney commented 5 years ago

```python
query=('SELECT data.* FROM CMS_GEM_MUON_VIEW.{0} data'
'INNER JOIN (SELECT vfat3_barcode, MAX(run_number) AS run_number FROM CMS_GEM_MUON_VIEW.{0} GROUP BY vfat3_barcode) data_select'
'ON data.vfat3_barcode = data_select.vfat3_barcode AND data.run_number = data_select.run_number').format(view)



> What about using an inner join in a single query ?
> ```python
> query=('SELECT data.* FROM CMS_GEM_MUON_VIEW.{0} data'
> 'INNER JOIN (SELECT vfat3_barcode, MAX(run_number) AS run_number FROM CMS_GEM_MUON_VIEW.{0} GROUP BY vfat3_barcode) data_select'
> 'ON data.vfat3_barcode = data_select.vfat3_barcode AND data.run_number = data_select.run_number').format(view)
> ```
> 
> 
> It is working with `pandas` inside the `python` interpreter.
> [Works with the assumption that the pair `(vfat3_barcode, run_number)` is unique.]

No preference, your solution seems a bit faster (0.02s vs 0.08s in `sql_developer` with a restriction to some 10 specific VFATs, and roughly equal times with no restriction), and is a bit cleaner

So using @lpetre-ulb's query (needed to add two spaces at end of lines) we have a really fast query with the full DB:

In [2]: getGEMDBView("GEM_VFAT3_PROD_SUMMARY_V_RH",debug=True)
query = SELECT data.* FROM CMS_GEM_MUON_VIEW.GEM_VFAT3_PROD_SUMMARY_V_RH data INNER JOIN (SELECT vfat3_barcode, MAX(run_number) AS run_number FROM CMS_GEM_MUON_VIEW.GEM_VFAT3_PROD_SUMMARY_V_RH GROUP BY vfat3_barcode) data_select ON data.vfat3_barcode = data_select.vfat3_barcode AND data.run_number = data_select.run_number
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1993 entries, 0 to 1992
Data columns (total 33 columns):
vfat3_ser_num        1993 non-null object
vfat3_barcode        1993 non-null object
hw_id_version        1863 non-null float64
buffer_offset        1883 non-null float64
vref_adc             1883 non-null float64
v_bgr                1883 non-null float64
adc0m                1940 non-null float64
adc0b                1940 non-null float64
adc1m                1940 non-null float64
adc1b                1940 non-null float64
cal_dacm             1924 non-null float64
cal_dacb             1924 non-null float64
iref                 1913 non-null float64
mean_thrshld         1865 non-null float64
mean_enc             1865 non-null float64
register_test        1883 non-null float64
ec_errors            1871 non-null float64
bc_errors            1871 non-null float64
crc_errors           1871 non-null float64
hit_errors           1871 non-null float64
noisy_channels       1865 non-null float64
dead_channels        1865 non-null float64
bist                 1870 non-null float64
scan_chain           779 non-null float64
sleep_pwr_analog     1884 non-null float64
sleep_pwr_digital    1884 non-null float64
run_pwr_analog       1867 non-null float64
run_pwr_digital      1867 non-null float64
location             1350 non-null object
temperature          1792 non-null object
state                1872 non-null object
temperature_k2       899 non-null float64
run_number           1993 non-null int64
dtypes: float64(27), int64(1), object(5)
memory usage: 513.9+ KB
Read 1993 rows from view GEM_VFAT3_PROD_SUMMARY_V_RH

However it seems like only columns vfat3_ser_num, vfat3_barcode and run_number are returned for every row. But this could be that the data was just never entered.

     vfat3_ser_num vfat3_barcode  hw_id_version  buffer_offset  vref_adc    v_bgr    adc0m    adc0b  ...  sleep_pwr_digital  run_pwr_analog  run_pwr_digital      location  temperature   state  temperature_k2  run_number
1973        0x1b76          7030       196609.0         1.3125       3.0  342.500  1.91431 -323.352  ...            83.1294        227.5670         103.3640  CERN 14-5-28      509.374  yellow       -105.7990          26
1974        0x1b77          7031       196609.0         1.2500       3.0  337.938  1.90202 -313.542  ...            86.5780        211.5550         106.3720  CERN 14-5-28      473.895  yellow        -99.1382          26
1975        0x1b78          7032       196609.0         3.6875       3.0  346.375  1.93318 -327.015  ...            79.2409        199.6960         103.4160  CERN 14-5-28       465.59   green        -96.6469          26
1976          0x47            71            NaN            NaN       NaN      NaN      NaN      NaN  ...                NaN             NaN              NaN          None         None     red             NaN          26
1977          0x48            72       196609.0         1.1250       3.0  344.188  1.92559 -322.946  ...            66.2382        277.5890          84.1675          None         None    None             NaN          26
1978          0x49            73       196609.0         2.1250       3.0  342.500  1.91736 -316.938  ...            66.8892        301.4830          83.9739          None         None    None             NaN          26
1979          0x4a            74       196609.0         2.4375       3.0  332.750  1.87028 -321.114  ...            67.3114        221.0040          84.0091  CERN 14-5-28      499.938   green       -106.5100          26
1980          0x4b            75       196609.0         2.9375       3.0  345.812  1.94609 -329.394  ...            66.6604        218.9450          84.7129          None      445.152   green        -92.6543          26
1981           0x8             8            NaN            NaN       NaN      NaN  2.01358 -301.412  ...                NaN             NaN              NaN          None         None   green             NaN          26
1982        0x22b8          8888       196609.0         1.8750       3.0  346.938  1.95359 -314.997  ...            80.3142        181.9600         100.0380  CERN 14-5-28      446.901   green        -91.5159          26
1983           0x9             9            NaN            NaN       NaN      NaN  2.03480 -329.537  ...                NaN             NaN              NaN          None         None   green             NaN          26
1984          0x5a            90            NaN            NaN       NaN      NaN      NaN      NaN  ...                NaN             NaN              NaN          None         None    None             NaN          26
1985          0x5b            91            NaN            NaN       NaN      NaN      NaN      NaN  ...                NaN             NaN              NaN          None         None    None             NaN          26

Gonna try the same with @jsturdy's query.

bdorney commented 5 years ago

I had to adapt @jsturdy's example slightly but the gist is:

    dbName = os.getenv("GEM_ONLINE_DB_NAME")
    dbConn = os.getenv("GEM_ONLINE_DB_CONN")
    colquery="select COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME='{}'".format(view)
    listOfColumns = pd.read_sql(colquery,con=(dbConn+dbName))
    columns = ",".join([ str(colName) for colName in listOfColumns.column_name ])

    query = "select * from (select {:s},max(RUN_NUMBER) over (partition by VFAT3_BARCODE) as LATEST_RUN_NUMBER from CMS_GEM_MUON_VIEW.{:s}) data where RUN_NUMBER = LATEST_RUN_NUMBER".format(columns,view)

    if vfatList is not None:
        query += getVFATFilter(vfatList)
        pass

    if debug:
        print("query = {0}{1}{2}".format(colors.YELLOW,query,colors.ENDC))
        pass
    dfGEMView = pd.read_sql(query, con=(dbConn+dbName))

And this comes up with:

In [31]: getGEMDBView("GEM_VFAT3_PROD_SUMMARY_V_RH",debug=True)
query = SELECT data.* FROM CMS_GEM_MUON_VIEW.GEM_VFAT3_PROD_SUMMARY_V_RH data INNER JOIN (SELECT vfat3_barcode, MAX(run_number) AS run_number FROM CMS_GEM_MUON_VIEW.GEM_VFAT3_PROD_SUMMARY_V_RH GROUP BY vfat3_barcode) data_select ON dat
a.vfat3_barcode = data_select.vfat3_barcode AND data.run_number = data_select.run_number
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1993 entries, 0 to 1992
Data columns (total 33 columns):
vfat3_ser_num        1993 non-null object
vfat3_barcode        1993 non-null object
hw_id_version        1863 non-null float64
buffer_offset        1883 non-null float64
vref_adc             1883 non-null float64
v_bgr                1883 non-null float64
adc0m                1940 non-null float64
adc0b                1940 non-null float64
adc1m                1940 non-null float64
adc1b                1940 non-null float64
cal_dacm             1924 non-null float64
cal_dacb             1924 non-null float64
iref                 1913 non-null float64
mean_thrshld         1865 non-null float64
mean_enc             1865 non-null float64
register_test        1883 non-null float64
ec_errors            1871 non-null float64
bc_errors            1871 non-null float64
crc_errors           1871 non-null float64
hit_errors           1871 non-null float64
noisy_channels       1865 non-null float64
dead_channels        1865 non-null float64
bist                 1870 non-null float64
scan_chain           779 non-null float64
sleep_pwr_analog     1884 non-null float64
sleep_pwr_digital    1884 non-null float64
run_pwr_analog       1867 non-null float64
run_pwr_digital      1867 non-null float64
location             1350 non-null object
temperature          1792 non-null object
state                1872 non-null object
temperature_k2       899 non-null float64
run_number           1993 non-null int64
dtypes: float64(27), int64(1), object(5)
memory usage: 513.9+ KB
Read 1993 rows from view GEM_VFAT3_PROD_SUMMARY_V_RH

And looking at this dump it's the same as above. The records 1973 - 1985 in this case are:

1973        0x1b76          7030       196609.0         1.3125       3.0  342.500  1.91431 -323.352  ...            83.1294        227.5670         103.3640  CERN 14-5-28      509.374  yellow       -105.7990          26
1974        0x1b77          7031       196609.0         1.2500       3.0  337.938  1.90202 -313.542  ...            86.5780        211.5550         106.3720  CERN 14-5-28      473.895  yellow        -99.1382          26
1975        0x1b78          7032       196609.0         3.6875       3.0  346.375  1.93318 -327.015  ...            79.2409        199.6960         103.4160  CERN 14-5-28       465.59   green        -96.6469          26
1976          0x47            71            NaN            NaN       NaN      NaN      NaN      NaN  ...                NaN             NaN              NaN          None         None     red             NaN          26
1977          0x48            72       196609.0         1.1250       3.0  344.188  1.92559 -322.946  ...            66.2382        277.5890          84.1675          None         None    None             NaN          26
1978          0x49            73       196609.0         2.1250       3.0  342.500  1.91736 -316.938  ...            66.8892        301.4830          83.9739          None         None    None             NaN          26
1979          0x4a            74       196609.0         2.4375       3.0  332.750  1.87028 -321.114  ...            67.3114        221.0040          84.0091  CERN 14-5-28      499.938   green       -106.5100          26
1980          0x4b            75       196609.0         2.9375       3.0  345.812  1.94609 -329.394  ...            66.6604        218.9450          84.7129          None      445.152   green        -92.6543          26
1981           0x8             8            NaN            NaN       NaN      NaN  2.01358 -301.412  ...                NaN             NaN              NaN          None         None   green             NaN          26
1982        0x22b8          8888       196609.0         1.8750       3.0  346.938  1.95359 -314.997  ...            80.3142        181.9600         100.0380  CERN 14-5-28      446.901   green        -91.5159          26
1983           0x9             9            NaN            NaN       NaN      NaN  2.03480 -329.537  ...                NaN             NaN              NaN          None         None   green             NaN          26
1984          0x5a            90            NaN            NaN       NaN      NaN      NaN      NaN  ...                NaN             NaN              NaN          None         None    None             NaN          26
1985          0x5b            91            NaN            NaN       NaN      NaN      NaN      NaN  ...                NaN             NaN              NaN          None         None    None             NaN          26

bdorney commented 5 years ago

So these two methods are indeed similar. But I agree that @lpetre-ulb's implementation is cleaner so I think I will prepare a PR with that.

bdorney commented 5 years ago

Closed by #230

cms-gem-daq-project / gem-plotting-tools

[bug] Invalid DB query #229

Brief summary of issue

Types of issue

Expected Behavior

Possible Solution (for bugs)

Context (for feature requests)