EGI-Federation / bdii

Berkeley Database Information Index
Apache License 2.0
8 stars 8 forks source link

BDII udpate & base64 encoded ldif entries #17

Closed vokac closed 4 years ago

vokac commented 5 years ago

bdii-update script is used to publish ldif in the BDII, but it doesn't play well with base64 encoded entries. Could you please fix parsing ldif input that looks like

dn: cn=something,o=glue
ObjectClass: SomeClass
Attribute:: PHRlc3Q+ 

In this case "Attribute" is already base64 encoded, but following code instead of decoding right value '<test>' it incorrectly parses its name by splitting line with first ":" and use ": PHRIc3Q+" as a value https://github.com/EGI-Foundation/bdii/blob/c53551649ea0ef26a39a159f86323a3cb63614a9/bin/bdii-update#L347-L360 For example DPM storage use '<unknown>' for VO name in case of missing mapping between internal group_id and VO. I don't really care too much this particular DPM attribute, but I still consider this bdii-update behavior a bug that should be fixed not to cause such surprising behavior for base64 encoded ldif input. https://ggus.eu/index.php?mode=ticket_info&ticket_id=142928

vokac commented 4 years ago

I don't think function needs_encoding covers LDIF grammar described in RFC2849, because only SAFE-STRING can be used without base64 encoding. I think following function would make me more happy

def needs_encoding(value):
    if not value:
        return False
    if value[0] in [" ", "<", ":"]:
        return True
    if "\n" in value or "\r" in value or "\0" in value:
        return True
    return any(ord(c) >= 128 for c in value)
enolfc commented 4 years ago

I have put RFC2849 spec into a regexp. According to comments the any() call is inefficient. This is the code now:

# From RFC2849
# SAFE-CHAR                = %x01-09 / %x0B-0C / %x0E-7F
#                           ; any value <= 127 decimal except NUL, LF,
#                           ; and CR
# SAFE-INIT-CHAR           = %x01-09 / %x0B-0C / %x0E-1F /
#                            %x21-39 / %x3B / %x3D-7F
#                            ; any value <= 127 except NUL, LF, CR,
#                            ; SPACE, colon (":", ASCII 58 decimal)
#                            ; and less-than ("<" , ASCII 60 decimal)
# SAFE-STRING              = [SAFE-INIT-CHAR *SAFE-CHAR]
safe_string = re.compile("^[\x01-\x09\x0B-\x0C\x0E-\x1F\x21-\x39\x3B\x3D-\x7F]"
                         "[\x01-\x09\x0B-\x0C\x0E-\x7F]*$")

def needs_encoding(value):
    return safe_string.search(value) is None
vokac commented 4 years ago

There is special case with value = "" which should return False, but according LDIF grammar empty base64 encoded string is also valid ... so even this code produce valid LDIF data.

enolfc commented 4 years ago

True, I have also covered that case now

vokac commented 4 years ago

I tested the updated code works even for base64 encoded attributes in source LDIF file.