This became a rather big PR. My apologies. It does a fresh migration.
The PR adds two libraries, both related to HTML juggling:
Simplehtmldom: https://github.com/simplehtmldom/simplehtmldom - a quite old but battle-tested DOM parser/manipulator. It's easy to use and I've used it for migrations in the past. Its number one quality is that it is robust: if you throw any (reasonable) HTML at it, it does pretty well.
HTMLPurifier: http://htmlpurifier.org/docs – I found that running the HTML I'm importing through it removes the stuff I don't want and leaves the stuff I do. I tried using wp_kses_post(), but it strips too much. Note that I am still using kses later in the post content massaging, but before adding in stuff like social media and images it is great to have a purifier that we can tweak.
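For reviewers who haven't used HTMLPurifier, here is a minimal sketch of the "purify first, kses later" idea. It is not the code in this PR: the Composer autoload path, the allowlist, and the sample markup are assumptions made up for illustration.

```php
<?php
// Assumption: HTMLPurifier is installed via Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Hypothetical allowlist; the real migration would tune this to the source content.
$config->set( 'HTML.Allowed', 'p,br,strong,em,ul,ol,li,blockquote,a[href],img[src|alt]' );

// Disable the on-disk definition cache so the sketch needs no writable cache directory.
$config->set( 'Cache.DefinitionImpl', null );

$purifier = new HTMLPurifier( $config );

// Made-up sample of messy imported markup.
$dirty = '<p onclick="x()">Hello <script>alert(1)</script><strong>world</strong></p>';

// Elements and attributes outside the allowlist are dropped here.
$clean = $purifier->purify( $dirty );

// Later in the pipeline, inside WordPress, the content would still go through kses,
// e.g. wp_kses_post( $clean ), once embeds and images have been added.
```

The point of doing it in this order is that the purifier's allowlist can be tweaked per migration, while kses stays the final safety net.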
There is also a new reusable class called BatchLogic. It doesn't do very much, but the idea is that it can make batching easier and that other classes can use it. The JsonIterator class uses it in this PR.
I've also added some more Gutenberg block functions.