bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License
34 stars 46 forks source link

need better defense against import of unacceptable data_keys #376

Open prjemian opened 5 years ago

prjemian commented 5 years ago

Using the ophyd.scaler.ScalerCH, which configures data keys according to the text a user has given a scaler channel in EPICS, it is possible to have data keys with unacceptable values, such as from this list:

['NFS, delayed', 'NFS, total', 'NRIXS dlyd-sum', 'NRIXS tot-sum', 
 'NRIXS, delayed1', 'NRIXS, delayed2', 'NRIXS, delayed3', 'Time', 
 'mca_elapsed_real_time', 'mca_preset_real_time', 'mca_spectrum', 
 'neat_stage_x', 'neat_stage_x_user_setpoint', 'neat_stage_y', 
 'neat_stage_y_user_setpoint', 'scaler_time']

Since these data keys already exist in the databroker, we have to fix the code that imports them. Presently, databroker.headersource.sqlite.EventCollection.columns has this code with inadequate defense:

safekeys = [key.replace('-', '') for key in sorted_keys]

prjemian commented 5 years ago

A robust defense would be to convert any unacceptable characters. Something such as this regexp: [A-Za-z_][\w_]* (includes upper or lower case letters, digits, underscore - but not starting with a digit).

prjemian commented 5 years ago

This is the end of the Exception trace:

~/Apps/BlueSky/lib/python3.6/site-packages/databroker/headersource/sqlite.py in _insert_descriptor(self, doc)
    264         with cursor(self._runstarts[run_start_uid]) as c:
    265             c.execute(CREATE_TABLE % table_name
--> 266                       + '(' + ','.join(columns) + ')')
    267         self._descriptors[uid] = run_start_uid
    268 

OperationalError: duplicate column name: data_NFS
tacaswell commented 5 years ago

Ah, this is a limtitation the sqlite that is getting us here?

We have a limitation on . for mongo-related reasons.

prjemian commented 5 years ago

This is the code to pick safe names but we need to catch the descriptors and the events, right?


import re
ACCEPTABLE_PATTERN = r"[A-Za-z_][\w_]*"

def make_safe(keys):
    def safe(text):
        keep_indices = []
        for m in re.finditer(ACCEPTABLE_PATTERN, text):
            keep_indices += range(m.start(), m.end())
        def rewrite(p):
            return {True: text[p], False: "_"}[p in keep_indices]
        return "".join(map(rewrite, range(len(text))))
    safe_keys = list(map(safe, sorted(keys)))
    return safe_keys

def tester():
    import pyRestTable
    tbl = pyRestTable.Table()
    tbl.addLabel("old")
    tbl.addLabel("new")
    keys = ['NFS, delayed', 'NFS, total', 'NRIXS dlyd-sum', 'NRIXS tot-sum',
    'NRIXS, delayed1', 'NRIXS, delayed2', 'NRIXS, delayed3', 'Time',
    'mca_elapsed_real_time', 'mca_preset_real_time', 'mca_spectrum',
    'neat_stage_x', 'neat_stage_x_user_setpoint', 'neat_stage_y',
    'neat_stage_y_user_setpoint', 'scaler_time']
    safe_keys = make_safe(keys)
    for i, new_key in enumerate(safe_keys):
        old_key = keys[i]
        tbl.addRow((old_key, new_key))
    print(tbl)

Has this output:

old new
NFS, delayed NFS__delayed
NFS, total NFS__total
NRIXS dlyd-sum NRIXS_dlyd_sum
NRIXS tot-sum NRIXS_tot_sum
NRIXS, delayed1 NRIXS__delayed1
NRIXS, delayed2 NRIXS__delayed2
NRIXS, delayed3 NRIXS__delayed3
Time Time
mca_elapsed_real_time mca_elapsed_real_time
mca_preset_real_time mca_preset_real_time
mca_spectrum mca_spectrum
neat_stage_x neat_stage_x
neat_stage_x_user_setpoint neat_stage_x_user_setpoint
neat_stage_y neat_stage_y
neat_stage_y_user_setpoint neat_stage_y_user_setpoint
scaler_time scaler_time
danielballan commented 5 years ago

This is the code to pick safe names but we need to catch the descriptors and the events, right?

We already check that the keys in Events match those in an EventDescriptor (and we mint a new EventDescriptor as needed). Therefore, validating the data_keys in the EventDescriptor is sufficient to effectively validate both.

tacaswell commented 5 years ago

On a bit more consideration, @prjemian exercised an sql injection attack....

prjemian commented 5 years ago

... using content from databroker