-
This is where I document all the class projects.
__Class assignment__
![HR data analysis.png](https://github.com/user-attachments/assets/d51d8201-4304-43bc-bdb0-a804a0170ab8)
![HRdata analysis.png](h…
-
The current instruction dataset doesn't have cleaned text.
- [ ] Should we include that?
-
| | |
| --- | --- |
| Bugzilla Link | [583248](https://bugs.eclipse.org/bugs/show_bug.cgi?id=583248) |
| Status | NEW |
| Importance | P3 normal |
| Reported | May 12, 2024 04:11 EDT |
| Modified | Jul…
-
module 'emoji' has no attribute 'get_emoji_regexp'
-
Cleaning your translation dataset is crucial for achieving good results with a transformer model. Here are some key steps to effectively clean your dataset:
1. **Remove Duplicates**:
- Check fo…
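
The details of the duplicate-removal step are cut off above, so the following is only a minimal sketch of one way it could look for a parallel corpus. It assumes the dataset is a list of `(source, target)` string pairs and that two pairs count as duplicates when they match after lowercasing and whitespace collapsing; the helper names `normalize` and `remove_duplicate_pairs` are hypothetical.

```python
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase so near-identical strings compare equal."""
    return " ".join(text.split()).lower()


def remove_duplicate_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first occurrence of each (source, target) pair, preserving order."""
    seen = set()
    unique = []
    for source, target in pairs:
        key = (normalize(source), normalize(target))
        if key in seen:
            continue  # already kept an equivalent pair
        seen.add(key)
        unique.append((source, target))
    return unique


# Illustrative data only, not taken from any real dataset.
pairs = [
    ("Hello world", "Bonjour le monde"),
    ("hello   world", "Bonjour le monde"),  # differs only in case and spacing
    ("Good morning", "Bonjour"),
]
deduped = remove_duplicate_pairs(pairs)
```

Comparing normalized keys while storing the original strings keeps the surviving pairs untouched; whether case-insensitive matching is appropriate depends on the language pair.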
-
- https://medium.com/@datascientist_SheezaShabbir/text-cleaning-in-nlp-libraries-techniques-and-how-to-get-started-8c7c7e8ba7cf
- https://spotintelligence.com/2023/09/18/top-20-essential-text-cleanin…
-
Package: libtorch:x64-windows@2.1.2#7
**Host Environment**
- Host: x64-windows
- Compiler: MSVC 19.41.34123.0
- vcpkg-tool version: 2024-10-18-e392d7347fe72dff56e7857f7571c22301237ae6
vcpkg-s…
-
Thanks for this awesome tool! I was wondering if we could include some sanity checking/cleanup for badly behaved text (e.g. all those invalid unicode characters). Could be as simple as running [ftfy](…
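
As a rough illustration of what such a sanity pass might involve, here is a standard-library sketch. It is not ftfy and does not attempt ftfy's mojibake repair; it only normalizes to NFC and drops replacement characters, control characters, and lone surrogates. The function name `basic_text_cleanup` is hypothetical.

```python
import unicodedata


def basic_text_cleanup(text: str) -> str:
    """Normalize to NFC and drop characters that usually indicate broken input."""
    text = unicodedata.normalize("NFC", text)
    kept = []
    for ch in text:
        if ch == "\ufffd":
            continue  # U+FFFD marks bytes that failed to decode
        category = unicodedata.category(ch)
        # Cc = control characters, Cs = surrogates; keep newline and tab.
        if category in ("Cc", "Cs") and ch not in "\n\t":
            continue
        kept.append(ch)
    return "".join(kept)


sample = "cafe\u0301\x00 ok\ufffd\n"
cleaned = basic_text_cleanup(sample)
```

Format characters (category `Cf`) are deliberately kept here, since some of them, such as the zero-width joiner, are meaningful inside emoji sequences.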
-
**Is your feature request related to a problem? Please describe.**
The cleaning of the text makes it impossible to link annotated spans to the character indices of the original text. This in turn mak…
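
One possible way around this, sketched under assumptions rather than taken from the tool in question, is for the cleaning step to return an offset map alongside the cleaned text. The example below assumes cleaning only collapses whitespace; `clean_with_offsets` and `to_original_span` are hypothetical names.

```python
from typing import List, Tuple


def clean_with_offsets(text: str) -> Tuple[str, List[int]]:
    """Collapse whitespace runs to one space and strip the ends.

    Returns the cleaned text plus, for each cleaned character,
    the index of the character it came from in the original text.
    """
    chars: List[str] = []
    offsets: List[int] = []
    previous_was_space = True  # treat the start as whitespace to strip leading runs
    for index, ch in enumerate(text):
        if ch.isspace():
            if previous_was_space:
                continue
            chars.append(" ")
            offsets.append(index)
            previous_was_space = True
        else:
            chars.append(ch)
            offsets.append(index)
            previous_was_space = False
    if chars and chars[-1] == " ":  # drop a single trailing space
        chars.pop()
        offsets.pop()
    return "".join(chars), offsets


def to_original_span(offsets: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a non-empty half-open span [start, end) in the cleaned text to the original."""
    if not 0 <= start < end <= len(offsets):
        raise ValueError("span must be non-empty and inside the cleaned text")
    return offsets[start], offsets[end - 1] + 1


original = "  Alice   met\tBob  "
cleaned, offsets = clean_with_offsets(original)
start = cleaned.index("Bob")
orig_start, orig_end = to_original_span(offsets, start, start + len("Bob"))
```

With the map in hand, a span annotated on the cleaned text can be projected back onto the original string, so annotations stay linkable even though the two texts differ in length.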
-
# CURRENT WORKFLOW
[Can we do it all from here](https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_replace2)
KALKI basic DataCleaning
We use the replace() method to delete or c…
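
The linked page covers JavaScript's `replace()`, and the sentence above is cut off, so the following is only a hedged sketch of the same idea using Python's analogous `str.replace` and `re.sub`. The sample string is made up for illustration.

```python
import re

raw = "Price:  $1,200.00 !!"

# Delete a literal substring: replace it with the empty string.
no_commas = raw.replace(",", "")

# Delete by pattern: remove runs of exclamation marks.
no_bangs = re.sub(r"!+", "", no_commas)

# Change by pattern: collapse whitespace runs to one space, then trim the ends.
collapsed = re.sub(r"\s+", " ", no_bangs).strip()
```

Unlike JavaScript's `replace()` with a string pattern, which changes only the first match, Python's `str.replace` changes every occurrence unless a count is passed.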