epiphyt / embed-privacy

Embed Privacy prevents loading of embedded external content and allows your site visitors to opt-in.
https://epiph.yt/en/embed-privacy/
GNU General Public License v2.0
19 stars 12 forks source link

Handling of slashed data #183

Open kraftner opened 1 year ago

kraftner commented 1 year ago

Bug/Problem

Handling of slashed data

Okay, this is a pesky one as in "this is why we all love love love working with a massively legacy software like WordPress".

The gist is this: When operating with slashes in strings you need to be very careful in WP where they come from and where they might go to. Since regex often contain backslashes you've also run into this in this plugin as I can see (https://github.com/epiphyt/embed-privacy/commit/7144f8d6bec9af65fc98b1e444a3ef708991ab3b and https://github.com/epiphyt/embed-privacy/issues/98), but I think the way it is currently handled is not ideal.

The following zooms out a bit more to give the full picture. Sorry if you're aware of that just skip the next section. It is mostly intended to sort out the story for my own brain.

A fairy tale

Once upon a time there was a thing called Magic quotes in PHP that automatically added backslashes to data in superglobals like $_POST. So if your form sent the value \ it automatically got turned into \\. The idea was some kind of magic auto-escaping, for details have a look at the Magic quotes Wikipedia page. This was soon discovered to be well-intended but a bad idea in practice and hence got removed from PHP in 5.4. But WordPress wouldn't be WordPress if it didn't fail to detach itself from bad legacy behaviour like a crackhead. So WordPress decided to stick with the magic and this is how we got wp_magic_quotes().

So, 'till this very day if you send data to WordPress it does its magic and automatically adds backslashes and all data you get from e.g. $_POST will be "slashed" as we tend to call it - it will have those additional backslashes.

So, when you then want to use that data and you know that WordPress has done its magic to get the raw value you just do wp_unslash( $_POST[ 'my_field' ] ).

Sidenote: The magic runs right after the plugins_loaded action. So you only need to unslash data if you get it from $_POST after that action.

Great! We now have the value and can use it any way we like. Well my lucky reader, there is more fun ahead of us!

As I'd assume as a result of this magic many WP functions (but only some) expect to receive "slashed" data. This includes such common functions as update_post_meta or wp_insert_post. For some like wp_update_post it's even more fun since it expects slashed or unslashed data, depending on if you pass the data as array or as object.

So when you get data from $_POST and then save it again using e.g. update_post_meta or wp_insert_post you're expected to:

  1. Get the value from $_POST
  2. Maybe (are we before or after plugins_loaded?) unslash it
  3. Operate on it, e.g. sanitize it
  4. Slash it again, or not (depending on which function you use and how you pass data)
  5. Pass it to one of those function to save data that expect their data slashed (and remove the slashes again before storing)

So basically instead of just

  1. manipulate data, e.g. sanitize
  2. write to database

What actually happens or needs to happen is this:

  1. add slashes (WP)
  2. remove slashes (our code)
  3. manipulate data, e.g. sanitize (our code)
  4. add slashes (our code)
  5. remove slashes (WP)
  6. write to database (WP)

Isn't it lovely to work with WordPress. :upside_down_face:

How Embed Privacy does it

Nice story, but why are you telling me this you might wonder?

Well you do handle all of this and it seems to work. But I think technically it only really does for the regex_default post meta. And even there the way you do it technically has some things in the wrong order, although the end result seems to be correct. Let me explain.

What you actually do is this:

  1. add slashes (WP)
  2. add slashes in \epiphyt\Embed_Privacy\Embed_Privacy::preserve_backslashes() which results in double-escaped data
  3. remove slashes in \epiphyt\Embed_Privacy\Fields::save_fields()
  4. remove slashes (WP)
  5. write to database (WP)

What I'd propose

So unless you want to add escaping for the regex_default field (not sure what escaping would make sense for a regex) you can just remove preserve_backslashes() and use the raw value from $_POST without running it through wp_unslash in save_fields().

This

Blame WordPress

Phew. I know. I'm also running into this again and again. It makes it hard to use the general PHP ecosystem. Everyone struggles with it all the time. Even core itself. So when I did see this I felt like I just have to share this. Should probably better have been a blog post now that I think about it. Well. Now it's here.

If all of this makes sense (this time) and you think my proposed solution is good I'd also be willing to make a PR for this.

Steps to reproduce

see above

Version

1.7.2

Link

No response

Environment info

No response

Code of Conduct

MatzeKitt commented 1 year ago

Thank you for your effort in this detail and also for the whole story. I indeed wasn’t aware of the exact reason behind it.

Your proposed solution seems to be valid and should be working, however it will take much effort into testing (adding, changing, restoring embed providers as well as running migrations). And since the current implementation works and is not a big deal in performance hits, I would rather hold it back until version 2.0 for such a change.

kraftner commented 1 year ago

Since this only really affects the save routine for the post meta field storing the regex I wouldn't expect this to break anything for anyone unless they mess with the internals in really weird ways.

Still I do appreciate your caution and agree this probably makes sense to be held back for 2.0.