andreaschandra / product-matching

Shopee NSDC Product Matching 2020

Preprocessing Scheme #3

Open andreaschandra opened 3 years ago

andreaschandra commented 3 years ago
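import re
import string

def text_cleansing(title):
    # Map every ASCII punctuation character to a space
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

    # Split a number from the letters that follow it, e.g. "500ml" -> "500 ml"
    title = re.sub(r'(\d+)([a-zA-Z]+)', r'\1 \2', title)
    # Replace punctuation with spaces; this already covers "[" and "]"
    title = title.translate(table)
    # Replace anything that is not an ASCII letter or digit with a space
    title = re.sub(r'[^a-zA-Z0-9]', ' ', title)
    # Collapse runs of whitespace and trim both ends
    title = " ".join(title.split())

    return title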
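
To sanity-check the scheme, here is a minimal sketch that restates the same steps under a hypothetical name, `text_cleansing_sketch`, and runs them on a few made-up product titles. The sample titles are assumptions for illustration only, not rows from the competition data.

```python
import re
import string

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def text_cleansing_sketch(title):
    """Hypothetical restatement of the preprocessing steps, for illustration."""
    # Split digits from the letters that directly follow them.
    title = re.sub(r"(\d+)([a-zA-Z]+)", r"\1 \2", title)
    # Turn punctuation into spaces.
    title = title.translate(PUNCT_TO_SPACE)
    # Turn any remaining non-ASCII-alphanumeric character into a space.
    title = re.sub(r"[^a-zA-Z0-9]", " ", title)
    # Collapse repeated whitespace and trim both ends.
    return " ".join(title.split())


# Made-up titles (assumptions, not taken from the dataset).
samples = [
    "[READY] Susu UHT 500ml (Coklat)!!",
    "iPhone 11-Pro 64GB",
    "Café 2pcs",
]

for raw in samples:
    print(repr(raw), "->", repr(text_cleansing_sketch(raw)))
```

Two things to keep in mind when reading the output: only digits followed by letters are split (so `64GB` becomes `64 GB`, while `iPhone 11` is left alone), and accented or other non-ASCII letters are replaced by spaces (so `Café` becomes `Caf`), which may matter for non-English titles.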