bigscience-workshop / catalogue_data

Scripts to prepare catalogue data
Apache License 2.0

Add substring remover mapper #30

Closed cakiki closed 2 years ago

cakiki commented 2 years ago

This function is meant to strip repeated strings like the ones here: https://github.com/bigscience-workshop/catalogue_data/issues/5#issuecomment-1057424610
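For illustration only, a mapper of this kind might look roughly like the sketch below. It is not the code from this PR: the name `build_substring_remover`, the `text` column, and the example strings are all assumptions, and it simply deletes exact matches of a known list of strings.

```python
from typing import Callable, Dict, Iterable

Example = Dict[str, str]


def build_substring_remover(
    substrings: Iterable[str], text_column: str = "text"
) -> Callable[[Example], Example]:
    """Return a mapper that deletes every given substring from one text column.

    Hypothetical helper: the name, signature, and column are assumptions.
    """
    # Remove longer strings first so a shorter one cannot split a longer match.
    ordered = sorted(set(substrings), key=len, reverse=True)

    def remove_substrings(example: Example) -> Example:
        text = example[text_column]
        for substring in ordered:
            text = text.replace(substring, "")
        # Return a new dict so the input example is left unmodified.
        return {**example, text_column: text}

    return remove_substrings


# Made-up strings, standing in for the repeated ones linked above.
mapper = build_substring_remover(["Share this page", "Share"])
cleaned = mapper({"text": "Hello Share this page world"})
```

A function with this shape (one example dict in, one dict out) could be passed to a per-example map call, assuming the dataset stores its text under the `text` key.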

lvwerra commented 2 years ago

Can you add your function to the following two places?

  1. the __init__.py in the same folder
  2. clean.py (import it there and add it to the dictionary)
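A rough sketch of what those two registration steps could amount to, collapsed into one file so it runs on its own. The dictionary name `MAPPERS`, the key, and the helper `apply_mappers` are hypothetical; the actual names in `__init__.py` and `clean.py` are not shown in this thread.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]

# Step 1 (assumed): the package's __init__.py would re-export the new mapper,
# for example with a line like `from .some_module import remove_footer`.
# Here the function is defined inline so the sketch stays self-contained.


def remove_footer(example: Example) -> Example:
    """Stand-in mapper: strips one made-up repeated string from the text."""
    return {**example, "text": example["text"].replace("Read more", "")}


# Step 2 (assumed): clean.py would import the mapper and add it to a
# name-to-function dictionary so it can be selected by name.
MAPPERS: Dict[str, Callable[[Example], Example]] = {
    "remove_footer": remove_footer,
}


def apply_mappers(example: Example, names: List[str]) -> Example:
    """Apply the named mappers in order, looking each one up in MAPPERS."""
    for name in names:
        example = MAPPERS[name](example)
    return example


result = apply_mappers({"text": "News item. Read more"}, ["remove_footer"])
```

Keeping the lookup in a single dictionary means a new mapper becomes selectable by name once it is imported and given a key, which appears to be the intent of the request above.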