Open prjemian opened 5 years ago
A robust defense would be to convert any unacceptable characters. Something such as this regexp: [A-Za-z_][\w_]*
(includes upper or lower case letters, digits, underscore - but not starting with a digit).
This is the end of the Exception trace:
~/Apps/BlueSky/lib/python3.6/site-packages/databroker/headersource/sqlite.py in _insert_descriptor(self, doc)
264 with cursor(self._runstarts[run_start_uid]) as c:
265 c.execute(CREATE_TABLE % table_name
--> 266 + '(' + ','.join(columns) + ')')
267 self._descriptors[uid] = run_start_uid
268
OperationalError: duplicate column name: data_NFS
Ah, this is a limtitation the sqlite that is getting us here?
We have a limitation on .
for mongo-related reasons.
This is the code to pick safe names but we need to catch the descriptors and the events, right?
import re
ACCEPTABLE_PATTERN = r"[A-Za-z_][\w_]*"
def make_safe(keys):
def safe(text):
keep_indices = []
for m in re.finditer(ACCEPTABLE_PATTERN, text):
keep_indices += range(m.start(), m.end())
def rewrite(p):
return {True: text[p], False: "_"}[p in keep_indices]
return "".join(map(rewrite, range(len(text))))
safe_keys = list(map(safe, sorted(keys)))
return safe_keys
def tester():
import pyRestTable
tbl = pyRestTable.Table()
tbl.addLabel("old")
tbl.addLabel("new")
keys = ['NFS, delayed', 'NFS, total', 'NRIXS dlyd-sum', 'NRIXS tot-sum',
'NRIXS, delayed1', 'NRIXS, delayed2', 'NRIXS, delayed3', 'Time',
'mca_elapsed_real_time', 'mca_preset_real_time', 'mca_spectrum',
'neat_stage_x', 'neat_stage_x_user_setpoint', 'neat_stage_y',
'neat_stage_y_user_setpoint', 'scaler_time']
safe_keys = make_safe(keys)
for i, new_key in enumerate(safe_keys):
old_key = keys[i]
tbl.addRow((old_key, new_key))
print(tbl)
Has this output:
old | new |
---|---|
NFS, delayed | NFS__delayed |
NFS, total | NFS__total |
NRIXS dlyd-sum | NRIXS_dlyd_sum |
NRIXS tot-sum | NRIXS_tot_sum |
NRIXS, delayed1 | NRIXS__delayed1 |
NRIXS, delayed2 | NRIXS__delayed2 |
NRIXS, delayed3 | NRIXS__delayed3 |
Time | Time |
mca_elapsed_real_time | mca_elapsed_real_time |
mca_preset_real_time | mca_preset_real_time |
mca_spectrum | mca_spectrum |
neat_stage_x | neat_stage_x |
neat_stage_x_user_setpoint | neat_stage_x_user_setpoint |
neat_stage_y | neat_stage_y |
neat_stage_y_user_setpoint | neat_stage_y_user_setpoint |
scaler_time | scaler_time |
This is the code to pick safe names but we need to catch the descriptors and the events, right?
We already check that the keys in Events match those in an EventDescriptor (and we mint a new EventDescriptor as needed). Therefore, validating the data_keys in the EventDescriptor is sufficient to effectively validate both.
On a bit more consideration, @prjemian exercised an sql injection attack....
... using content from databroker
Using the
ophyd.scaler.ScalerCH
, which configures data keys according to the text a user has given a scaler channel in EPICS, it is possible to have data keys with unacceptable values, such as from this list:Since these data keys already exist in the databroker, we have to fix the code that imports them. Presently,
databroker.headersource.sqlite.EventCollection.columns
has this code with inadequate defense:safekeys = [key.replace('-', '') for key in sorted_keys]