Hello, we currently do not support this. But this can be accomplished by using something like the function below:
import re

def remove_repetitions_ar(s, policy=1):
    """Reduces the repeated characters (more than two repeated)
    from an Arabic string to one or two characters based on the
    optional specified policy.

    Args:
        s (:obj:`str`): The string to be normalized.
        policy (:obj:`int`, optional):
            The reduction policy. If policy=`1` the repeated characters will
            be reduced to `1` character. If policy=`2` the repeated characters
            will be reduced to `2` characters. Defaults to `1`.

    Returns:
        :obj:`str`: The normalized string.
    """
    # Matches any character followed by two or more copies of itself.
    _REP_AR_RE = re.compile(r'(.)\1{2,}')
    if policy == 1:
        return _REP_AR_RE.sub(u'\\1', s)
    elif policy == 2:
        return _REP_AR_RE.sub(u'\\1\\1', s)
    else:
        raise ValueError("Policy value should be either 1 or 2!")

>>> remove_repetitions_ar('مرحباااا')
'مرحبا'
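One caveat when running this over a whole dataset: the pattern `(.)` matches any character, so a run such as the zeros in "1000" would be collapsed too. A minimal sketch of a stricter variant follows, assuming that "Arabic letter" means the code points U+0621 to U+064A. The name `remove_arabic_repetitions` and its `keep` parameter are hypothetical and are not part of CAMeL Tools.

```python
import re

# Assumption: restrict matching to the basic Arabic letters U+0621..U+064A
# (this range also contains the tatweel, U+0640). Digits, punctuation and
# Latin text are left untouched.
_AR_LETTER_REP_RE = re.compile(r'([\u0621-\u064A])\1{2,}')


def remove_arabic_repetitions(s, keep=1):
    """Collapse runs of 3+ identical Arabic letters to `keep` copies (1 or 2)."""
    if keep not in (1, 2):
        raise ValueError("keep should be either 1 or 2")
    # Replace each matched run with `keep` copies of the repeated letter.
    return _AR_LETTER_REP_RE.sub(lambda m: m.group(1) * keep, s)


# Apply the function to every string in a (toy) dataset.
texts = ['مرحباااا', 'السعر 1000 ريال']
cleaned = [remove_arabic_repetitions(t) for t in texts]
```

Passing `keep=2` leaves two copies of the repeated letter, which mirrors `policy=2` in the function above.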
Hope this is helpful.
Yes, it helps a lot. I asked because I saw the module camel_tools.morphology.errors.MorphologyError in the docs, so I thought this module might be for errors like repeated characters. Unfortunately, the docs don't have enough examples. Is there any module in CAMeL Tools that checks for grammatical or orthographic errors and corrects them?
Hi, I'm working on cleaning an Arabic dataset, and it has repeated characters inside strings, for example "مرحباااا" instead of "مرحبا". Is there a function in CAMeL Tools for this? I read the documentation and didn't find anything related, and I also tried the camel_arclean command, but the repeated characters are still there. Waiting for your help.