Closed (factus10 closed this issue 1 year ago)
Same here. I have WordPress installed in a folder called "wp" at /var/www/somesite.tld/wp/. Now "wp" is missing from some URLs.
Loading failed for the <script> with source “https://[DOMAIN-CUT]/wp-includes/js/wp-emoji-release.min.js?ver=5.8.1”. [DOMAIN-CUT]:1:1
Loading failed for the <script> with source “https://[DOMAIN-CUT]-content/themes/twentytwentyone/assets/js/primary-navigation.js?ver=1.4”. [DOMAIN-CUT]:172:1
Loading failed for the <script> with source “https://[DOMAIN-CUT]-content/themes/twentytwentyone/assets/js/responsive-embeds.js?ver=1.4”. [DOMAIN-CUT]:173:1
Loading failed for the <script> with source “https://[DOMAIN-CUT]-includes/js/wp-embed.min.js?ver=5.8.1”.
You can see that the URL in the first line is formed correctly, while the other URLs have "/wp" cut off, that is, "wp" including the leading slash. (To be clear, the first line's URL is formed correctly, but wp-emoji-release.min.js is not copied.)
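One way a truncation like this could arise (a guess, not confirmed against the plugin's code) is a plain string replacement of the origin URL being applied more than once: after the first pass the URL begins with ".../wp-content", which still starts with the origin ".../wp". A minimal Python sketch of the effect and of a boundary-aware alternative follows. The helper names `naive_replace` and `replace_origin` and the domain `example.test` are placeholders invented for illustration.

```python
import re

# Placeholder values, assumed for illustration only.
ORIGIN = "https://example.test/wp"
DESTINATION = "https://example.test"


def naive_replace(html: str) -> str:
    """Plain substring replacement; unsafe if it runs more than once."""
    return html.replace(ORIGIN, DESTINATION)


def replace_origin(html: str) -> str:
    """Replace the origin only when it ends at a URL boundary.

    The lookahead requires the match to be followed by '/', '?', '#',
    a quote, whitespace, or end of input, so "https://example.test/wp-content"
    is not treated as an occurrence of the origin.
    """
    pattern = re.compile(re.escape(ORIGIN) + r"(?=[/?#\"'\s]|$)")
    return pattern.sub(lambda m: DESTINATION, html)


html = '<link href="https://example.test/wp/wp-content/themes/twentytwentyone/style.css">'

twice_naive = naive_replace(naive_replace(html))      # "/wp" is stripped a second time
twice_safe = replace_origin(replace_origin(html))     # second pass finds nothing to change
```

With the boundary check the replacement is idempotent, so running it a second time over already-processed output does no further damage.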
Log file does not show anything suspicious:
https://[DOMAIN-CUT]/wp/wp-content/themes/twentytwentyone/style.css
gets detected correctly but ends up as
<link rel="stylesheet" id="twenty-twenty-one-style-css" href="https://[DOMAIN-CUT]-content/themes/twentytwentyone/style.css?ver=1.4" media="all">
"wp-content" becomes "-content" it seems.
Just to add a possibly related case to this issue, I have the following automatically generated sitemap:
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="//cms.scantrust.com/wp-content/plugins/wordpress-seo/css/main-sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://cms.scantrust.com/post-sitemap.xml</loc>
<lastmod>2022-09-15T13:13:38+00:00</lastmod>
</sitemap>
<sitemap>
<loc>https://cms.scantrust.com/page-sitemap.xml</loc>
<lastmod>2022-09-07T14:41:52+00:00</lastmod>
</sitemap>
<sitemap>
<loc>https://cms.scantrust.com/case-study-sitemap.xml</loc>
<lastmod>2022-09-05T15:49:28+00:00</lastmod>
</sitemap>
</sitemapindex>
<!-- XML Sitemap generated by Yoast SEO -->
Notice that the protocol is missing from the URL on line 1. Simply Static doesn't appear to pick this URL up, so it remains unchanged in the generated version.
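For illustration, a pattern that treats the scheme as optional would also catch protocol-relative URLs such as the `//host/...` form in the stylesheet reference. This is a Python sketch rather than the plugin's PHP. The function name `replace_host` and the hosts `cms.example.test` and `static.example.test` are assumptions made up for the example.

```python
import re


def replace_host(text: str, old_host: str, new_host: str) -> str:
    """Swap old_host for new_host in absolute and protocol-relative URLs.

    The scheme group is optional, so both "https://old/..." and "//old/..."
    match. Whatever scheme was present (or none) is kept as-is.
    """
    pattern = re.compile(
        r"(?P<scheme>https?:)?//"
        + re.escape(old_host)
        + r"(?=[/?#\"'\s<]|$)"  # host must end at a URL boundary
    )
    return pattern.sub(
        lambda m: (m.group("scheme") or "") + "//" + new_host, text
    )


xml = (
    '<?xml-stylesheet type="text/xsl" href="//cms.example.test/main-sitemap.xsl"?>'
    "<loc>https://cms.example.test/post-sitemap.xml</loc>"
)
converted = replace_host(xml, "cms.example.test", "static.example.test")
```

The trailing lookahead keeps a longer host such as `cms.example.test.other.test` from being rewritten by accident.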
We should probably improve the regex pattern for the URL replacements here: https://github.com/patrickposner/simply-static/blob/dcd0cf3714388cfd63f9ba47587dc1ea79381ae9/src/class-ss-url-extractor.php#L167
To summarize the cases where the current RegEx patterns might fail:
Another problem related to this comes from this plugin: https://wordpress.org/plugins/host-analyticsjs-local/ Here is an example of code that does not get replaced correctly:
gtag('config', 'G-XXX', {"cookie_prefix":"CaosGtag","cookie_domain":"wp.mysite.com","cookie_expires":2592000,"cookie_flags":"samesite=none;secure","allow_google_signals":false,"anonymize_ip":true,"site_speed_sample_rate":"1"});
wp.mysite.com should be replaced with mysite.com.
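Since the host appears here without a scheme or slashes, a URL-shaped pattern will not see it. One possible approach, sketched in Python under the assumption that a bare-host pass is acceptable for inline scripts, is to match the host name on its own but require that it is not part of a longer name. The helper `replace_bare_host` is hypothetical.

```python
import re


def replace_bare_host(text: str, old_host: str, new_host: str) -> str:
    """Replace a host name that appears without a scheme.

    The lookbehind and lookahead reject matches that are glued to other
    host-name characters (letters, digits, '_', '.', '-'), so
    "notwp.mysite.com" and "wp.mysite.com.au" are left alone.
    """
    pattern = re.compile(
        r"(?<![\w.-])" + re.escape(old_host) + r"(?![\w.-])"
    )
    return pattern.sub(lambda m: new_host, text)


snippet = 'gtag(\'config\', \'G-XXX\', {"cookie_domain":"wp.mysite.com"});'
fixed = replace_bare_host(snippet, "wp.mysite.com", "mysite.com")
```

A caveat of this sketch: a host followed directly by a sentence-ending period would not match, because '.' counts as a host-name character in the lookahead.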
Rather than locking the URL replacement to specific tags for specific DOM elements, could it instead just search for the domain in the HTML document, treating the document as a string rather than as structured data? It could search for the website name and then match outwards from there to pull links out of less common tags for replacement or download.
Hey @AndrewKahr, that's what "Force URL replacement" in Simply Static -> Settings -> Advanced is doing. It replaces all occurrences of the domain by searching the generated HTML/CSS/JS/XML files.
The problem with this approach is that you often don't want to replace all URLs, or you want to specifically ignore some (such as forms, iframes, and Ajax-related calls).
That's why it's added as an option rather than the default behavior :-)
We are closing that issue now.
We have already integrated the following:
We will open a separate issue if a new case comes up.
I have a locally hosted website at https://timex/
The site has "timex" in the URL path in a number of pages so, for instance: https://timex/computers/timex-sinclair-2068/
becomes: /computers/-sinclair-2068/
I used relative links.
I suspect this is an over-eager regex somewhere :)
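To illustrate what I'd expect instead: if each URL is parsed and only its scheme and host are dropped, the path is never touched, no matter what it contains. This is a Python sketch using the standard library, not the plugin's actual code, and `to_relative` is a name I made up.

```python
from urllib.parse import urlsplit, urlunsplit


def to_relative(url: str, origin_host: str) -> str:
    """Turn an absolute URL on origin_host into a root-relative one.

    Only the scheme and host are removed. The path, query, and fragment
    are passed through unchanged, so "timex" inside the path survives.
    URLs on other hosts are returned as they are.
    """
    parts = urlsplit(url)
    if parts.hostname != origin_host:
        return url
    return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))


relative = to_relative("https://timex/computers/timex-sinclair-2068/", "timex")
```

Comparing the parsed host, rather than searching for the host string anywhere in the URL, is what keeps the path segment intact here.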