Catching up.
This PR adds a data cleaning class, created 5 months ago, for dealing with tweets. The class offers several methods that can be applied directly to a str or a pd.Series to remove punctuation, hashtags, links, mentions, and so on.
More details in issue #35.
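To give reviewers a feel for the kind of cleaning described above, here is a minimal sketch using only the standard library. It is not the code in this PR: the names `clean_tweet` and `remove_accents`, the regular expressions, and the order of the steps are all assumptions made for illustration.

```python
import re
import string
import unicodedata

# Hypothetical patterns for illustration; the class in this PR may use different ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
WHITESPACE_RE = re.compile(r"\s+")


def remove_accents(text: str) -> str:
    """Strip combining marks, e.g. 'camión' -> 'camion' (also maps 'ñ' to 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_tweet(text: str) -> str:
    """Remove links, mentions, hashtags, accents, and ASCII punctuation, then trim."""
    text = URL_RE.sub(" ", text)       # links first, before punctuation is stripped
    text = MENTION_RE.sub(" ", text)   # @user
    text = HASHTAG_RE.sub(" ", text)   # #topic
    text = remove_accents(text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse extra white space
```

A function like this works on a single str; for a pd.Series it could be applied element-wise with `series.apply(clean_tweet)`. Note that `string.punctuation` only covers ASCII, so characters such as `¡` and `¿` would need separate handling.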
Changes in this PR:
src/c4v/data/data_sampler.py
Use the data cleaner utility when sampling the data.
Apply the Black formatter.
src/c4v/data/data_cleaner.py
Create methods to "clean" text in various ways: removing links, hashtags, emojis, punctuation, extra whitespace, and tags or mentions; trimming; and removing Spanish accents.
tests/data/test_data_cleaner.py
This utility can grow as the needs of the cleaning phase evolve.
Feedback is encouraged!