GetPublii / Publii

The most intuitive Static Site CMS designed for SEO-optimized and privacy-focused websites.
https://getpublii.com
GNU General Public License v3.0

[Bug]: Importing from WordPress adds “lessless” and “greatergreater” to the URL when the post title contains “«” or “»” #1319

Closed drakegalley closed 4 months ago

drakegalley commented 5 months ago

Operating system

Linux Mint 20.3 (Ubuntu 20.04)

Publii version

0.44.4

Post editor

None

Bug description

After importing my website from WordPress, I discovered that the original post URLs were modified wherever the post title contained the characters "«" or "»".

When a post title contains these characters, Publii inserts the words "lessless" or "greatergreater" into the URL at the corresponding positions. Note that the WordPress URLs did not contain "«" or "»"; only the post titles did, since these are the quotation marks used in Spanish. Publii nevertheless carries them into the URL as the words "lessless" and "greatergreater". This breaks internal links and redirects, among other SEO issues.

Steps to reproduce

The error appears when importing any post that contains "«" or "»" in the post title.
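
For illustration, here is a minimal sketch of how this kind of output could arise. It assumes (this is not confirmed from Publii's source) a slug routine that first maps "«" to `<<` and then spells out `<` as `less`. The names `GUILLEMETS`, `SYMBOL_WORDS` and `naive_slug` are hypothetical.

```python
import re
import unicodedata

# Hypothetical transliteration tables; not taken from Publii's code.
GUILLEMETS = {"«": "<<", "»": ">>"}
SYMBOL_WORDS = {"<": "less", ">": "greater"}


def naive_slug(title: str) -> str:
    """Build a slug the way a naive transliterating routine might."""
    # Apply the tables in order, so "«" becomes "<<" and then "lessless".
    for table in (GUILLEMETS, SYMBOL_WORDS):
        for char, replacement in table.items():
            title = title.replace(char, replacement)
    # Strip accents, then collapse every non-alphanumeric run into a hyphen.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_title = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


print(naive_slug("El libro «Rayuela» de Cortázar"))
# prints: el-libro-lesslessrayuelagreatergreater-de-cortazar
```

A slug of `el-libro-rayuela-de-cortazar` is what the original URL would be expected to look like, which matches the mismatch described above.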

Relevant log output

No response

atomGit commented 4 months ago

hats off to the publii devs for even bothering to deal with the garbage that WordPress produces

my advice: rebuild the site manually if you can - hopefully you don't have hundreds of posts/pages as i did

i ended up spending a lot of time building a shell script (bash) to do the import because publii wasn't going to work for me - if you want i'll publish the script, but it will very likely require some adjustments for your particular case

drakegalley commented 4 months ago

Hi!

I have more than 400 posts published, but, fortunately, there were only about 12 posts affected. I had to correct the URLs manually and everything seems to be fine.
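
In case it helps anyone with more affected posts, the manual check can be sketched roughly as below. This assumes you already have the imported slugs as a plain list (how you export them is up to you); `repair_slug` and `find_affected` are hypothetical helper names, and the suggestions should still be reviewed by hand, since a legitimate slug could in principle contain one of these words.

```python
from __future__ import annotations

import re

# Words the import injected into slugs, per the report above.
ARTIFACTS = re.compile(r"lessless|greatergreater")


def repair_slug(slug: str) -> str:
    """Drop the injected words and tidy up any leftover hyphens."""
    cleaned = ARTIFACTS.sub("", slug)
    return re.sub(r"-{2,}", "-", cleaned).strip("-")


def find_affected(slugs: list[str]) -> dict[str, str]:
    """Map each affected slug to a suggested replacement."""
    return {s: repair_slug(s) for s in slugs if ARTIFACTS.search(s)}


slugs = [
    "hola-mundo",
    "el-libro-lesslessrayuelagreatergreater-de-cortazar",
    "lessless-cita-greatergreater",
]
for old, new in find_affected(slugs).items():
    print(f"{old} -> {new}")
```

Slugs without either word are left out of the result, so the output only lists the posts that need attention.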

I decided to post this issue because I think other languages with special characters will be affected, which may prevent the migration of especially complex websites.

Regards.

dziudek commented 4 months ago

@drakegalley - wow, you found 2 of the 3 characters that were being transliterated unnecessarily :) I added more unit tests and found that, among commonly used characters, «, » and $ were transliterated when they shouldn't have been. This will be fixed in v.0.45 :)
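
As a rough illustration of the kind of regression check described here (this is not Publii's actual fix or test suite), the sketch below uses a hypothetical `slugify` that drops symbols instead of spelling them out, and asserts the expected slugs for the three characters mentioned. The titles in `CASES` are made-up examples.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical slug routine that drops symbols rather than naming them."""
    # Decompose accented letters so the base letter survives ASCII encoding.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_title = decomposed.encode("ascii", "ignore").decode("ascii")
    # Anything that is not a letter or digit (including $) becomes a separator.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


# Regression cases covering «, » and $.
CASES = {
    "«Hola» mundo": "hola-mundo",
    "Cómo ahorrar $100": "como-ahorrar-100",
    "El libro «Rayuela»": "el-libro-rayuela",
}

for title, expected in CASES.items():
    assert slugify(title) == expected, (title, slugify(title))
```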