dreamwidth / dreamwidth

Dreamwidth's open source repository
Other
179 stars 176 forks source link

completely check logic in LJ::Talk::Post::require_captcha_test #3182

Open rahaeli opened 1 year ago

rahaeli commented 1 year ago

Someone asked me why we make anonymous comments that contain the word "message" solve a captcha even if the journal itself has captcha disabled, like it was a deliberate choice we made, to which I said "what" and also "the fuck" and tested it, and sure enough: if you have anonymous commenting enabled and all captcha off, and someone tries to leave an anonymous comment containing the word "message", you will be prompted to solve a captcha. This confused me, because we, uh, don't have any system with a list of words that triggers a captcha (we have a way to specify words that EXEMPT a comment from captcha, but not TRIGGER it), so we went looking and Jen turned up cgi-bin/LJ/Talk.pm line 2992, which is older than DW history, and is, uh. A very creative regex.

Let's just fix that, shall we.

rahaeli commented 1 year ago

Looking at the surrounding code, there's A LOT of moon logic above that line in there, too, so lemme get out of the bathtub and go and look at this on a large enough screen to read the damn code and note down what we should change

rahaeli commented 1 year ago

Okay, I traced this back to the original commit that introduced this behavior, and from the commit message it looks like it was at least deliberately firing if the message contained "message": https://github.com/rahaeli/livejournal/commit/5619254448ad13898719df4826198c7a91c04eed

rahaeli commented 1 year ago

Having looked at this whole thing in context, and noticed that the numbered steps in the comments for the logic go 1, 4, 2, 3, 4, I am rescoping this to "completely recheck logic in LJ::Talk::Post::require_captcha_test" because it's pretty clear nobody has looked at this holistically for a while and it's got a lot of old LJ-isms in it.

Actual business logic following in a few:

rahaeli commented 1 year ago
  1. if the comment contains 2 or more URLs
  2. if the comment contains any attempt to use BBCode (that's what line 2989 is trying to do, I am reasonably sure)
  3. if the comment contains anything that's on the naughty list introduced in issue #3080

2974-2983 is very very old code (Brad committed the first version of that check 19 years ago) and I am having trouble figuring out what it does that the HTML cleaner doesn't. I don't see any reason to show a captcha to people who are using html that the html cleaner allows just because it's not on a specific list of a subset of allowed html. If I am missing something and we do want to keep that, though, we should expand that to at least include more modern html tags; I think a good list keeping in the spirit of the original list would be "abbr, blockquote, b, code, dd, del, details, dfn, dl, dt, em, hr, i, ins, kbd, li, p, pre, q, rt, ruby, s, samp, strong, sub, summary, sup, tt, u, ul, var"