elementor / static-html-output

Static HTML Output Plugin for WordPress
https://statichtmloutput.com
The Unlicense
125 stars, 35 forks

Never ending detected URLs while crawling #112

Closed by xdabhi 4 years ago

xdabhi commented 4 years ago

Hi Leon,

I hope you are doing great.

I'm using version 6.6.20 on my WP site, which has permalinks in Hindi. The issue is that it keeps detecting the same URL multiple times while exporting. I have only 174 posts (published + drafts), yet the crawl never stops discovering new URLs.


When I clicked on "show discovered URLs" while exporting, it showed the same URLs multiple times.

URL: https://gkhindi.net/wp-admin/admin.php?page=statichtmloutput&statichtmloutput-crawl-queue=1

https://pastebin.com/raw/v6LJfzAD

I ran the export for hours, but it did not finish discovering and crawling new URLs. Kindly have a look at it.

Thanks

leonstafford commented 4 years ago

@av2032 many thanks for reporting this!

I think this bug has crept in with recent changes to the way we detect URLs during crawling. Hindi was working fine before the latest big changes.

My guess is that when we discover a new URL during crawling and check whether it's new, the plugin is probably not converting the characters correctly.

I.e., when you visit one of those URLs on the site and copy-paste the browser address, you'll often see it encoded like:

%2F1857-%E0%A4%95%E0%A5%87-%E0%A4%AA%E0%A4%B9%E0%A4%B2%E0%A5%87-%E0%A4%B2%E0%A5%8B%E0%A4%95%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%BF%E0%A4%AF-%E0%A4%B5%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%B9%2F

Rather than the Hindi representation.
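
To illustrate the suspected mismatch, here is a minimal Python sketch. It is not the plugin's actual PHP code; `normalize_path` and the `seen` sets are hypothetical names used only for this example. It shows how the raw Hindi path and its percent-encoded form compare as different strings, and how normalizing both to one canonical form before the "is this URL new?" check makes them match.

```python
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Return a canonical percent-encoded form of a URL path (hypothetical helper)."""
    # Decode first so already-encoded input is not double-encoded,
    # then re-encode everything except the path separator.
    # Simplification: an encoded slash (%2F) is treated the same as "/".
    return quote(unquote(path), safe="/")


raw = "/के-पहले-लोकप्रिय-विद्रोह/"
encoded = quote(raw, safe="/")  # the form a browser address bar often yields

# Without normalization the two spellings look like different URLs,
# so a crawler would keep treating the page as newly discovered.
naive_seen = {raw}
print(encoded in naive_seen)  # False

# With normalization both spellings collapse to a single entry.
seen = {normalize_path(raw)}
print(normalize_path(encoded) in seen)  # True
```

Whether the canonical form should be the encoded or the decoded one is a design choice; what matters for deduplication is that every discovered URL goes through the same conversion before it is compared.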

I'm hoping this will be a quick fix. I'll start testing now.

leonstafford commented 4 years ago

Confirmed: I'm able to reproduce this locally. The crawl log during the never-ending crawl looks like:

200       /के-पहले-लोकप्रिय-विद्रोह/   Note: initial_crawl_list     
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /2020/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /2020/06/   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /2020/06/21/    
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /arrobase/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /author/admin/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /blog/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /category/non-classifiee/   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /de-home/   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /french-home/   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /page-with-external-url-in-style/   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /sample-page/   
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /sassytest/     
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/  
200       /के-पहले-लोकप्रिय-विद्रोह/   Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/

leonstafford commented 4 years ago

@av2032 - this looks better in my testing. Could you please confirm if the fix is OK for you?

static-html-output-plugin-6.6.21.zip

xdabhi commented 4 years ago

Thank you @leonstafford

It worked 😃