xdabhi closed this issue 4 years ago
@av2032 many thanks for reporting this!
I think this bug crept in with the recent changes to how we detect URLs during crawling. Hindi URLs were working fine before those big changes.
My guess is that when we discover a URL during crawling and check whether it's new, the plugin isn't converting the characters consistently, so the same URL is not recognised as already seen.
I.e., when you visit one of those URLs on the site and copy-paste the address from the browser, you'll often see it percent-encoded like this:
%2F1857-%E0%A4%95%E0%A5%87-%E0%A4%AA%E0%A4%B9%E0%A4%B2%E0%A5%87-%E0%A4%B2%E0%A5%8B%E0%A4%95%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%BF%E0%A4%AF-%E0%A4%B5%E0%A4%BF%E0%A4%A6%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%B9%2F
rather than in its Hindi form.
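To illustrate the suspected mismatch, here is a minimal Python sketch (the plugin itself is PHP, so this is not its code). It assumes the "is this URL new?" check is a lookup in a set of already-seen paths; `normalize_path` and `is_new_url` are hypothetical helpers. Comparing raw strings treats the encoded and Hindi forms as two different URLs, while decoding percent-escapes before the lookup makes them match.

```python
from urllib.parse import unquote


def normalize_path(path: str) -> str:
    """Decode UTF-8 percent-escapes so encoded and raw Hindi paths compare equal."""
    return unquote(path, encoding="utf-8")


def is_new_url(path: str, seen: set[str]) -> bool:
    """Hypothetical 'have we seen this URL?' check that compares normalized paths."""
    return normalize_path(path) not in seen


# The same path in both forms (a shortened piece of the example URL above).
encoded = "/%E0%A4%95%E0%A5%87-%E0%A4%AA%E0%A4%B9%E0%A4%B2%E0%A5%87/"
decoded = "/के-पहले/"

# A raw string comparison treats them as two different URLs.
print(encoded == decoded)  # False

# Normalizing before the lookup makes them match, so the URL is not queued again.
seen = {normalize_path(decoded)}
print(is_new_url(encoded, seen))  # False
```

The design choice here is to pick one canonical form (decoded) for everything stored in the seen set and to normalize every newly discovered URL the same way before checking it.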
I'm hoping this will be a quick fix; I'll start testing now.
Confirmed, I can reproduce this locally. During the never-ending crawl, the crawl log looks like this:
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: initial_crawl_list
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /2020/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /2020/06/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /2020/06/21/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /arrobase/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /author/admin/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /blog/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /category/non-classifiee/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /de-home/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /french-home/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /page-with-external-url-in-style/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /sample-page/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /sassytest/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
200 /के-पहले-लोकप्रिय-विद्रोह/ Note: discovered on: /के-पहले-लोकप्रिय-विद्रोह/
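A rough way to spot this loop in a pasted log is to count how often each path appears and how often a page was "discovered on" itself. The Python sketch below assumes each log line has exactly the shape shown above (`STATUS /path/ Note: ...`) and that one line corresponds to one crawl of that path; the helper names are hypothetical and not part of the plugin.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log above:
#   200 /path/ Note: discovered on: /source/
LOG_LINE = re.compile(r"^(\d{3}) (\S+) Note: (.*)$")


def count_crawls(log_text: str) -> Counter:
    """Count how many log lines mention each crawled path."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


def self_discoveries(log_text: str) -> int:
    """Count lines where a page was reported as discovered on itself."""
    total = 0
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) == f"discovered on: {match.group(2)}":
            total += 1
    return total


sample = """200 /के-पहले/ Note: initial_crawl_list
200 /के-पहले/ Note: discovered on: /
200 /के-पहले/ Note: discovered on: /के-पहले/
200 /के-पहले/ Note: discovered on: /के-पहले/"""

print(count_crawls(sample)["/के-पहले/"])  # 4
print(self_discoveries(sample))  # 2
```

A growing self-discovery count is a strong hint that the page's own link is not being recognised as already seen.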
@av2032 - this looks better in my testing. Could you please confirm if the fix is OK for you?
Thank you @leonstafford
It worked 😃
Hi Leon,
I hope you are doing great.
I'm using version 6.6.20 on my WordPress site, which has permalinks in Hindi. The problem is that the export keeps detecting the same URLs over and over. I have only 174 posts (published + drafts), yet the crawl never ends and keeps discovering new URLs.
When I click "show discovered URLs" during the export, it lists the same URLs multiple times.
URL https://gkhindi.net/wp-admin/admin.php?page=statichtmloutput&statichtmloutput-crawl-queue=1
https://pastebin.com/raw/v6LJfzAD
I ran the export for hours, but it never finished discovering and crawling URLs. Could you please take a look?
Thanks