I think (someone correct me if I'm wrong here) that the problem is that there's
no "dumbdown" facility for Japanese in TXP.
There's a function in lib/txplib_misc.php called dumbDown() which converts
'foreign' (predominantly European/Middle Eastern) character sequences to their
(rough) ASCII equivalents so that a URL-only title can be constructed
automatically. Also look in the file lib/i18n-ascii.txt for more.
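As a rough illustration of the idea (not TXP's actual code), a transliteration table can be applied with PHP's strtr(). The function name sketch_dumb_down() and the tiny table below are hypothetical; the real mappings live in lib/i18n-ascii.txt.

```php
<?php
// Hypothetical, minimal stand-in for a dumbDown()-style transliteration.
// The table is a small illustrative sample, not the contents of i18n-ascii.txt.
function sketch_dumb_down(string $text): string
{
    $table = array(
        'ä' => 'ae',
        'ö' => 'oe',
        'ü' => 'ue',
        'ß' => 'ss',
        'é' => 'e',
    );

    // strtr() with an array replaces each key with its value;
    // characters that have no entry pass through unchanged.
    return strtr($text, $table);
}

echo sketch_dumb_down('Grüße aus München'), "\n"; // Gruesse aus Muenchen
```

A title made only of characters that are missing from the table (such as Japanese) would come back unchanged from this step.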
Since there's no dumbdown for Japanese (among others), when you type an entirely
Japanese article title (e.g. 金魚) and save it, TXP can't automatically create a
URL-only title for the article. In the latest SVN version you'll see a warning
that the article contains an empty URL-only title. In TXP 4.2.0 you will
probably get an article with an erroneous single dash (see Issue 36, now fixed).
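To see why an all-Japanese title can end up empty, here is a minimal sketch of an ASCII-only slug filter. It is an assumption about the general technique, not a copy of sanitizeForUrl(), and sketch_url_title() is a made-up name.

```php
<?php
// Hypothetical sketch of an ASCII-only URL-title filter; not TXP's sanitizeForUrl().
function sketch_url_title(string $title): string
{
    // Lowercase, then collapse every run of characters outside a-z and 0-9
    // (including each byte of a multibyte character) into a single dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));

    // Trim dashes left over at either end.
    return trim($slug, '-');
}

var_dump(sketch_url_title('Hello World')); // string(11) "hello-world"
var_dump(sketch_url_title('金魚'));         // string(0) ""
```

Without the final trim() the second call would return a lone dash, which resembles the 4.2.0 symptom, though the real cause there may differ.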
The upshot is that, for now, you'll have to manually type in a URL-only title
that conforms to <a href="http://www.faqs.org/rfcs/rfc1738.html">RFC 1738</a>.
If you have any ideas on how dumbDown() / i18n-ascii.txt can be made to convert
Japanese characters into ASCII then please send them over!
Original comment by stefdawson
on 4 May 2010 at 1:55
Thanks for the reply.
I've been testing things since your reply, which really helped me solve (I
think) the problem.
Making a new article with a title that includes Japanese doesn't work, as
reported, but editing the article's "url-only title" under the advanced options
works fine, even using only Japanese characters.
I'm not really sure, but urlencode() at line 2803 in the file
publish/taghandlers.php is probably what makes this work. So I just commented
out line 722 in the file lib/txplib_misc.php -> $text = sanitizeForUrl($text);
so that it skips dumbDown() and all the other replacements, which seems to work
fine right now.
If the titles are getting urlencoded anyway, I don't see why we need dumbDown()
and all the other stuff in the function sanitizeForUrl(). But considering that
some people (or many people) might prefer using dumbDown(), adding an option to
choose whether to dumbDown or just urlencode might work out.
I have been testing my Japanese url-title on my website at the following
address, where you can see that it is actually working fine.
http://nnnnn.me/textpattern/normal/
Also, related to this topic, I suggest using the function rawurlencode() instead
of urlencode() in TXP in order to follow RFC 1738 better.
http://www.php.net/manual/en/function.rawurlencode.php
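The practical difference between the two functions shows up with spaces. A small sketch (the sample title is made up for illustration):

```php
<?php
$title = '金魚 blog';

// urlencode() follows the form-encoding convention: a space becomes "+".
$form = urlencode($title);

// rawurlencode() percent-encodes the space as "%20" instead.
$raw = rawurlencode($title);

echo $form, "\n"; // %E9%87%91%E9%AD%9A+blog
echo $raw, "\n";  // %E9%87%91%E9%AD%9A%20blog
```

Both functions percent-encode the UTF-8 bytes of the Japanese characters in the same way; only the handling of a few characters such as the space differs.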
urlencode() can be found in the following files and lines:
publish.php:275,277,280,283,287,386
include/txp_discuss.php:378
include/txp_file.php:210,211
include/txp_image.php:200
include/txp_plugin.php:99,102,227
lib/txplib_html.php:102,103,114,127,130,164,1912,1972,1979,1986,2000
publish/taghandlers.php:2802,2803
Thank you.
Original comment by dev.nnnn@gmail.com
on 7 May 2010 at 9:14
Change set #3344 introduces a fallback for languages which lack a suitable
transliteration.
The two instances of urlencode() in include/txp_file.php, plus the ones in
publish.php, have counterparts using urldecode() and are only used internally.
Thus, while this may not conform to RFC 1738, the encoding method is
insignificant as long as the encoder and decoder match.
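A quick sketch of that point, using a made-up sample value rather than anything from the TXP source: matched pairs round-trip cleanly, while a mismatched pair may not.

```php
<?php
$value = 'my file & notes.txt';

// Matching encoder/decoder pairs restore the original value.
$formRoundTrip = urldecode(urlencode($value));
$rawRoundTrip  = rawurldecode(rawurlencode($value));

// A mismatched pair can go wrong: rawurldecode() does not turn "+" back into a space.
$mismatch = rawurldecode(urlencode($value));

var_dump($formRoundTrip === $value); // bool(true)
var_dump($rawRoundTrip === $value);  // bool(true)
var_dump($mismatch === $value);      // bool(false)
```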
I haven't looked into the other instances you mentioned, so please open a
separate issue if you discover functional deficits stemming from our use of
urlencode().
Original comment by r.wetzlmayr
on 7 May 2010 at 3:19
Original issue reported on code.google.com by
dev.nnnn@gmail.com
on 28 Apr 2010 at 4:53