SuffolkLITLab / ALKiln

Integrated automated end-to-end testing with docassemble, puppeteer, and cucumber.
https://assemblyline.suffolklitlab.org/docs/alkiln/intro
MIT License

Figure out when a variable name has been fully decoded #575

Closed: plocket closed this issue 1 month ago.

plocket commented 2 years ago

[This is a bit different from #574 in that it's supposed to reduce our 'guesses' at variable names to one guess, and therefore remove the need to dig through guesses.]

In getPossibleVarNames, we don't intelligently decode anything. We decode a few times and hope that one of the results is the actual variable name. We keep all of those results as "guesses" and later try each guess to see if one matches something on the page.
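To make the guessing approach concrete, here is a rough Python sketch. ALKiln itself is JavaScript, and `get_possible_var_names` and `guesses_on_page` are hypothetical names, not ALKiln's code. The sketch assumes each layer is standard base64 with its trailing '=' padding stripped.

```python
import base64
import binascii
from typing import Iterable, List


def get_possible_var_names(encoded: str, max_layers: int = 3) -> List[str]:
    """Hypothetical: decode up to max_layers times and keep every result as a guess."""
    guesses = [encoded]
    current = encoded
    for _ in range(max_layers):
        padded = current + "=" * (-len(current) % 4)  # restore stripped padding
        try:
            current = base64.b64decode(padded, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            break  # this layer isn't decodable text, so stop collecting guesses
        guesses.append(current)
    return guesses


def guesses_on_page(guesses: List[str], names_on_page: Iterable[str]) -> List[str]:
    """Hypothetical: keep only the guesses that match a name found on the page."""
    page = set(names_on_page)
    return [g for g in guesses if g in page]


guesses = get_possible_var_names("ZmF2b3JpdGVfZnJ1aXQ")
print(guesses)  # ['ZmF2b3JpdGVfZnJ1aXQ', 'favorite_fruit']
print(guesses_on_page(guesses, ["favorite_fruit", "user_name"]))  # ['favorite_fruit']
```

Every decodable layer becomes a guess, and only the comparison with the page picks one, which is the digging this issue wants to avoid.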

If we could detect when we've gone too far, we could be sure we had found the right variable name. That would mean detecting when a decoded name is no longer a valid variable name: when a valid variable name is decoded one time too many, the base64 decoding produces gibberish that is, generally speaking, not a valid variable name.
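A minimal Python sketch of that stopping rule, under stated assumptions: each layer is standard base64 with '=' padding stripped, and names follow the ASCII-only pattern quoted later in this thread. `decode_var_name` and `try_b64decode` are hypothetical helpers. The rule keeps decoding while the result is still a valid name and returns the last valid one.

```python
import base64
import binascii
import re
from typing import Optional

# Assumption: names follow the ASCII-only pattern discussed in this thread.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def try_b64decode(text: str) -> Optional[str]:
    """Decode one base64 layer, or return None if the result isn't clean UTF-8 text.

    Assumption: any trailing '=' padding was stripped, so it is restored first.
    """
    padded = text + "=" * (-len(text) % 4)
    try:
        return base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def decode_var_name(encoded: str, max_layers: int = 5) -> Optional[str]:
    """Hypothetical helper: keep decoding while the result is still a valid name.

    Returns the last valid name reached, or None if even the first layer fails.
    """
    best = None
    current = encoded
    for _ in range(max_layers):
        decoded = try_b64decode(current)
        if decoded is None or VALID_NAME.fullmatch(decoded) is None:
            break  # decoded one layer too far: error, gibberish, or not a name
        best = current = decoded
    return best


print(decode_var_name("ZmF2b3JpdGVfZnJ1aXQ"))  # favorite_fruit
```

The "generally speaking" caveat is real: if a variable were actually named `YWJj`, which is itself the base64 encoding of `abc`, this rule would decode one layer too far and return `abc`.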

Detecting that is pretty complex, though it can be a complex one-liner. It feels like there must be a library out there. Also keep in mind that I think we can manage to exclude the brackets that appear in object names ([, ]). If not, I think we can account for them.
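If the brackets can't be excluded, one option is to validate the whole object path. The standard base64 alphabet has no `[`, `]`, `.`, or `'`, so a strict decoder (such as Python's `base64.b64decode(..., validate=True)`) rejects any decoded name that contains them. Below is a hypothetical Python pattern, `FULL_NAME`, that accepts a plain identifier followed by `.attribute` and `[index]` parts. It assumes indices are integers, other identifiers such as `i`, or simple single-quoted keys. That is a simplification, not docassemble's exact grammar.

```python
import re

# Hypothetical pattern: an identifier, then any mix of `.attribute` and `[index]`.
# Assumed index forms: an integer, an identifier (like `i`), or a simple 'quoted' key.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
INDEX = r"\[(?:\d+|" + IDENT + r"|'[^'\]]*')\]"
FULL_NAME = re.compile(IDENT + r"(?:\." + IDENT + r"|" + INDEX + r")*")


def is_valid_full_name(text: str) -> bool:
    """True if the whole string looks like a (possibly nested) variable name."""
    return FULL_NAME.fullmatch(text) is not None


for name in ["users[0].name.first", "fruits['apple']", "users[i].age", "users[0"]:
    print(name, is_valid_full_name(name))
```

On its own, this check can't tell encoded text from decoded text, because base64 output such as `ZmF2b3JpdGVfZnJ1aXQ` also matches the plain-identifier part.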

This would really help with random input testing. It would also help us tell developers which variables were set to which values on the page.

This would also help with #574.

BryceStevenWilley commented 2 years ago

> Detecting that is pretty complex, though it can be a complex one-liner. It feels like there must be a library out there

The linked answer there is for detecting valid JavaScript identifiers, but we would be working with Python identifiers, ~which are just the following regex~ which docassemble and ALWeaver restrict to:

/^[A-Za-z_][A-Za-z0-9_]*$/

(plocket rightfully points out in the next comment that it's impossible to determine general Python identifier validity with a regex.)

Or, in some cases, might we be working with any Python string or number? I'm not sure of the scope of what we're talking about here, though.

plocket commented 2 years ago

That is a JS answer. The Python answer seems to be that it's impossible, so I'm not sure what we'd do about that. I think it's worth remembering that people using different languages also code in da and I don't think A-Za-z covers that, right?
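For comparison, Python does expose its full identifier rule at runtime, just not as a regex. `str.isidentifier()` follows the Unicode-aware definition, and `keyword.iskeyword()` catches reserved words. ALKiln runs in JavaScript, so this sketch only compares the two rules; it is not something ALKiln could call directly. The helper names are hypothetical.

```python
import keyword
import re

# ASCII-only pattern discussed in this thread
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def python_accepts(name: str) -> bool:
    """Python's own rule: a Unicode identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


def ascii_pattern_accepts(name: str) -> bool:
    """The ASCII-only rule: letters, digits, and underscores, no leading digit."""
    return ASCII_NAME.fullmatch(name) is not None


for name in ["favorite_fruit", "año", "名前", "class", "2nd_choice"]:
    print(f"{name}: python={python_accepts(name)} ascii={ascii_pattern_accepts(name)}")
```

Names such as `año` or `名前` are valid Python but fail the ASCII pattern, while `class` passes the pattern even though Python reserves it.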

Another concern that your question brought up for me is whether we need to decode the values of dropdowns/choices, which could have a lot more weird characters.

Maybe this really is impossible?

BryceStevenWilley commented 2 years ago

> I think it's worth remembering that people using different languages also code in da and I don't think A-Za-z covers that, right?

That is correct, and you're right that in Python it's impossible to determine whether something is a valid identifier with a regex, but that's the regex used in many docassemble functions, like defined, showif, etc.

import re

# Line 76 in docassemble/base/util.py
valid_variable_match = re.compile(r'^[^\d][A-Za-z0-9\_]*$')

So people coding in docassemble in other human languages won't be able to use Unicode alphabets in variable names anyway.
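A quick Python check of the docassemble pattern on its own, next to the stricter ASCII pattern, suggests that this conclusion mostly holds with one nuance. `[^\d]` only excludes a leading digit, so any single first character passes, including a non-ASCII letter or a symbol like `$`, but every later character must still be an ASCII letter, digit, or underscore. This tests the regex in isolation; docassemble may apply other checks elsewhere.

```python
import re

# The pattern quoted above from docassemble/base/util.py
DA_PATTERN = re.compile(r'^[^\d][A-Za-z0-9\_]*$')
# The stricter ASCII-only pattern quoted earlier in the thread
ASCII_PATTERN = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

for name in ["favorite_fruit", "ñandu", "año", "$total", "2nd_choice"]:
    da_ok = DA_PATTERN.match(name) is not None
    ascii_ok = ASCII_PATTERN.match(name) is not None
    print(f"{name}: docassemble={da_ok} ascii={ascii_ok}")
```

Here `ñandu` and `$total` pass only the docassemble pattern, while `año` fails both because its non-ASCII letter is not the first character.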

> whether we need to decode the values of dropdowns/choices, which could have a lot more weird characters.

We do need to decode those, but I recently addressed that in https://github.com/SuffolkLITLab/ALKiln/pull/581.