Closed GoogleCodeExporter closed 9 years ago
One problem that I see right away is that "new not in ..." is comparing the
whole string, but it was just a character before. I'm not sure I see how this
works though.
How does "not new.endswith('_')" work better for foreign characters? Is that how
Python sees UTF-8 extended characters?
I think the problem here is that the first character is not alphanumeric. So
just saying:
elif len(ret) > 0 and ret[len(ret)-1].isalnum()...
would probably do the trick.
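
A minimal sketch of how that check might sit inside the loop, assuming Python 3 and assuming that only ASCII letters and digits are kept. The function name and the ALLOWED set are made up for illustration and are not taken from the actual code:

```python
import string

# Assumption: only ASCII letters and digits count as "alphanumeric" here.
ALLOWED = set(string.ascii_letters + string.digits)

def sanitize_skip_leading(text):
    """Hypothetical helper; `ret` mirrors the variable name in the suggestion."""
    ret = ""
    for ch in text:
        if ch in ALLOWED:
            ret += ch
        # ret[-1] is the usual spelling of ret[len(ret)-1]; the `ret and`
        # guard avoids an IndexError while ret is still empty.
        elif ret and ret[-1].isalnum():
            ret += "_"
    return ret
```

With this variant a leading non-alphanumeric character is simply dropped, because nothing is appended while ret is empty.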
Is it easy to translate the ü and similar characters to ASCII? OpenLyrics said
just to use the foreign characters, but I'm not comfortable with that, so I
think it would be good to translate the characters if possible.
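
One possible way to do such a translation, sketched here with Python 3's standard unicodedata module. This is only an illustration and not something from the project's code; the function name is hypothetical:

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: strip accents by decomposing, then dropping non-ASCII."""
    # NFKD splits "ü" into "u" plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the combining marks and any
    # character that has no ASCII decomposition (for example "ß").
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Note that this turns "ü" into "u" rather than the German convention "ue", and characters without a decomposition are silently lost, so it may not be good enough on its own.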
Original comment by bradleelandis
on 5 Mar 2010 at 3:10
You are right about your first point: that's a typo and it should be "i not
in ...".
"not new.endswith('_')" does the same thing as before and has nothing to do
with umlauts. It just checks whether the last character appended was a '_',
and if so, it doesn't append another '_'.
So umlauts are not translated, but replaced with '_' as before, and if you've
got several umlauts in a row, there will be only one '_'.
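
A rough sketch of the behaviour described here, assuming Python 3 and assuming that only ASCII letters and digits are kept. The function name and the ALLOWED set are invented for the example; only `new` and the endswith check come from this thread:

```python
import string

# Assumption: anything outside ASCII letters and digits becomes '_'.
ALLOWED = set(string.ascii_letters + string.digits)

def sanitize(text):
    """Hypothetical helper showing the endswith('_') check."""
    new = ""
    for i in text:  # iterate over the characters directly
        if i in ALLOWED:
            new += i
        # "".endswith('_') is False, so this is safe even when the very
        # first character is non-alphanumeric.
        elif not new.endswith("_"):
            new += "_"
    return new
```

For example, sanitize("Grüße") gives "Gr_e": the "ü" becomes '_' and the "ß" right after it adds nothing.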
You are right, the problem appears if the first letter is non-alphanumeric. So
your solution would also work, but it's a bit inconvenient to loop over a
string with "for i in range(len(str))" if you can do it with "for i in str".
Original comment by s.mehrbrodt
on 6 Mar 2010 at 1:41
I've committed this to SVN. I haven't found an easy way to translate umlauts to
ASCII, but it is OK for me that they are converted to '_'.
Original comment by s.mehrbrodt
on 7 Mar 2010 at 4:24
Original issue reported on code.google.com by
s.mehrbrodt
on 4 Mar 2010 at 11:12
Attachments: