elementor / static-html-output

Static HTML Output Plugin for WordPress
https://statichtmloutput.com
The Unlicense
125 stars 34 forks source link

Unnecessary files included in static files #81

Closed crstauf closed 4 years ago

crstauf commented 4 years ago

When generating the static files, PHP files (within the theme) are created, but only with error output (there's no error when viewing the template).

There are various types of errors:

Additionally, some admin CSS files for plugins are also being included in the static files.

Discovered with v6.6.16; tried with master, but autoloader isn't working...?

thegulshankumar commented 4 years ago

Hi

Can you help me to understand a bit "PHP files (within the theme) are created"?

Is your theme loading dynamic assets with .php extension at Development site which you're trying to export as Static?

Thanks

crstauf commented 4 years ago

Is your theme loading dynamic assets with .php extension

@thegulshankumar No: they are the same file names as the template files used by theme.

leonstafford commented 4 years ago

Thanks for reporting, @crstauf - I think this is similar to some other issues which have crept in, missing the integration tests.

The admin CSS files may continue to be included, but I may try a simple filter of admin against plugin CSS files.

This is due to the 2-pass crawling that it currently does, where we need any potentially used CSS files included in the initial crawl list, in order to parse/follow URLs to assets from within those.

I'm going to try and clean that area up for this upcoming v6.6.x release on wp.org, but will err on the side of detecting too much than not enough.

The issue reported here will definitely be solved though, no PHP files should be getting crawled.

leonstafford commented 4 years ago

@crstauf this should fix the unwanted assets / php files being detected, it also has an unrelated adjustment for pagination URLs. https://github.com/WP2Static/static-html-output-plugin/issues/80 will still be affected in this build until it's fixed.

leonstafford commented 4 years ago

@crstauf - I've updated since last comment, this should be good to test:

static-html-output-plugin-81-further-limit-detection.zip

crstauf commented 4 years ago

@leonstafford Installed the provided plugin, and got this error:

[05-Jun-2020 18:56:31 UTC] PHP Notice:  Undefined offset: 1 in /app/public/wp-content/plugins/static-html-output-plugin/src/DetectArchiveURLs.php on line 72
[05-Jun-2020 18:56:31 UTC] PHP Stack trace:
[05-Jun-2020 18:56:31 UTC] PHP   1. {main}() /app/public/wp-admin/admin-ajax.php:0
[05-Jun-2020 18:56:31 UTC] PHP   2. do_action() /app/public/wp-admin/admin-ajax.php:175
[05-Jun-2020 18:56:31 UTC] PHP   3. WP_Hook->do_action() /app/public/wp-includes/plugin.php:478
[05-Jun-2020 18:56:31 UTC] PHP   4. WP_Hook->apply_filters() /app/public/wp-includes/class-wp-hook.php:311
[05-Jun-2020 18:56:31 UTC] PHP   5. static_html_output_ajax() /app/public/wp-includes/class-wp-hook.php:287
[05-Jun-2020 18:56:31 UTC] PHP   6. StaticHTMLOutput\Controller->generate_filelist_preview() /app/public/wp-content/plugins/static-html-output-plugin/static-html-output-plugin.php:77
[05-Jun-2020 18:56:31 UTC] PHP   7. StaticHTMLOutput\FilesHelper::buildInitialFileList() /app/public/wp-content/plugins/static-html-output-plugin/src/Controller.php:204
[05-Jun-2020 18:56:31 UTC] PHP   8. StaticHTMLOutput\FilesHelper::getDateArchiveURLs() /app/public/wp-content/plugins/static-html-output-plugin/src/FilesHelper.php:514
[05-Jun-2020 18:56:31 UTC] PHP   9. StaticHTMLOutput\DetectArchiveURLs::detect() /app/public/wp-content/plugins/static-html-output-plugin/src/FilesHelper.php:548
crstauf commented 4 years ago

Additionally, while older archive pages were picked up (that hadn't been before!), CSS and SVG files from both Yoast SEO and Query Monitor continue to be picked up (no PHP files from my theme though).

leonstafford commented 4 years ago

@crstauf sorry about that, hopefully this removes the notice: static-html-output-plugin-81-guard-empty-cat.zip

So the Yoast and Query Monitor ones don't break the site, they're just unnecessarily included, right?

I'm thinking to just extend a blacklist of plugin/theme admin asset paths. Not an amazing solution, but can quickly eliminate those from the most popular plugins.

crstauf commented 4 years ago

@leonstafford If I may ask, how does the plugin determine which assets are needed?

leonstafford commented 4 years ago

@crstauf I've just made some changes this week to try and simplify things.

For this V6, in src/FilesHelper.php, we do the initial detection of all WP paths (posts, pages, pagination), along with some common ones, like favicon, robots.txt, sitemaps (kinda working). It used to grab everything from the wp-contents/uploads dir and still has a few extra detections for things like a custom permalinks plugin and Autoptimize's cache dir (shouldn't be needed anymore). Now, it will only do the usual WP paths and grab all CSS files from plugins and themes.

The plugin does only 2 passes when crawling. During the first phase of crawling, it will detect any assets linked to from one of those initially detected paths and crawl those on the second pass. In theory, this should pick everything up. Recent work has fixed the way it parses the HTML and CSS files to detect more linked assets.

The pros are that we can see what we're going to crawl, based on the Initial crawl list and optionally modify that list with actions/filters. It may also lead to some quick performance boosts soon, being able to more easily spawn concurrent crawlers with divided batches of detected URLs to start with.

The downsides are if we fail to detect certain URLs.

SimplerStatic's approach is to crawl more like a normal spider, continually building a list of new pages to crawl as it goes. That's a lot more helpful with things like pagination URLs, where detecting those can be a pain.

We'll see how things develop, probably some hybrid approach will eventuate. If this V6 can detect all URLs for majority of sites and allow custom filters/inclusion rules to workaround for the % that it doesn't work for, that would be a good position for this plugin for now.

crstauf commented 4 years ago

@leonstafford That being the case, how/why would it detect a CSS file that's used only in the admin?

leonstafford commented 4 years ago

@crstauf

grab all CSS files from plugins and themes

It's getting these from the filesystem. WordPress doesn't enforce any structure for where CSS files go for admin or public, they are usually served to the user via code, which can pull them from any path(s) on filesystem and serve them at any URL.

So, as a human, we can't even determine which ones are admin or public CSS files just by looking within the plugin. We can guess, based on if the directory is called "admin", "public", "dist", etc, but there's sure to be some out there mixing it up, plus all the non-English or typos.

We could possibly listen to the wp_enqueue_styles() as we crawl the whole site on first pass, to detect just what's being loaded, but it's a bit messy and slow.

Alternatively, we could not detect any CSS files for initial crawl list and add a third round of crawling. That might be the sanest approach for V6 right now: only detect HTML/XML/special files during pre-crawling detection; detect all other HTML/XML/CSS/imgs, etc files on 2nd pass and 3rd pass for all the assets detected within the 2nd pass crawl. That should be enough to cover all assets and remove the need for the greedy detection of CSS initially

thegulshankumar commented 4 years ago

Just like robots.txt, please include ads.txt by default. I see many users need.

leonstafford commented 4 years ago

@thegulshankumar I've put ads.txt in now, will be in next build.

Closing this issue, as recent detection / crawling changes should have sorted it