-
# CURRENT WORKFLOW
[Can we do it all from here](https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_replace2)
KALKI basic DataCleaning
We use the replace() method to delete or c…
-
The current instruction dataset doesn't have cleaned text.
- [ ] Should we include it?
-
module 'emoji' has no attribute 'get_emoji_regexp'
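
This error usually appears when code written for an older `emoji` release runs against a newer one: as far as I can tell, `get_emoji_regexp` was dropped in the 2.x line, while `replace_emoji` is available there. Below is a minimal compatibility sketch, not the project's actual fix. The helper name `strip_emoji` and the fallback regex are hypothetical, and the regex is only a rough approximation of emoji code point ranges.

```python
import re

try:
    import emoji  # third-party package; may be absent or any version
except ImportError:
    emoji = None

# Rough fallback (an assumption, not an exhaustive emoji definition):
# common pictograph blocks plus the variation selector and zero-width joiner.
_FALLBACK_EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+"
)


def strip_emoji(text: str) -> str:
    """Remove emoji from text, whichever emoji API is available."""
    if emoji is not None and hasattr(emoji, "replace_emoji"):
        # Newer releases: replace every emoji with the empty string.
        return emoji.replace_emoji(text, replace="")
    if emoji is not None and hasattr(emoji, "get_emoji_regexp"):
        # Older releases: the helper that the error message says is missing.
        return emoji.get_emoji_regexp().sub("", text)
    # Package not installed: fall back to the approximate regex.
    return _FALLBACK_EMOJI_RE.sub("", text)
```

Checking for the attribute with `hasattr` instead of comparing version strings keeps the sketch working even if the exact release boundaries differ from what is assumed here.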
-
Updated: Ensured it works with batches of images as well
Add this stuff to filter out the special tokens. This also makes sure that all the other functions still work as well, since they rely on th…
-
Description for a single task from the bidder
![image](https://github.com/codersforcauses/penni/assets/100743188/f4277ce9-acb4-4141-8f11-aff56cfa3a26)
![image](https://github.com/codersforcauses/p…
-
**Is your feature request related to a problem? Please describe.**
The cleaning of the text makes it impossible to link annotated spans to the character indices of the original text. This in turn mak…
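
One way to keep annotations linkable is to record, for every character of the cleaned text, which slice of the original it came from. The following is a minimal sketch under that assumption. The function names and the whitespace-collapsing rule are hypothetical and are not taken from the library under discussion.

```python
import re


def clean_with_offsets(text, pattern=r"\s+", repl=" "):
    """Replace each match of `pattern` with `repl`.

    Returns the cleaned string and, for each cleaned character, the
    (start, end) slice of the original text it stands for.
    Assumes `pattern` never matches the empty string.
    """
    cleaned, spans = [], []
    pos = 0
    for m in re.finditer(pattern, text):
        # Characters before the match are copied one to one.
        for i in range(pos, m.start()):
            cleaned.append(text[i])
            spans.append((i, i + 1))
        # Every replacement character points at the whole matched run.
        for ch in repl:
            cleaned.append(ch)
            spans.append((m.start(), m.end()))
        pos = m.end()
    # Copy the tail after the last match.
    for i in range(pos, len(text)):
        cleaned.append(text[i])
        spans.append((i, i + 1))
    return "".join(cleaned), spans


def to_original_span(spans, start, end):
    """Map a non-empty [start, end) span in the cleaned text back to the original."""
    return spans[start][0], spans[end - 1][1]
```

For example, an annotation on `world` in the cleaned text can be mapped back like this:

```python
text = "Hello   world\n\nfoo"
cleaned, spans = clean_with_offsets(text)
start = cleaned.index("world")
orig_start, orig_end = to_original_span(spans, start, start + len("world"))
assert text[orig_start:orig_end] == "world"
```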
-
Try running the following script:
```
from Standard.Base import all
import Standard.Base.Runtime.Managed_Resource.Managed_Resource
import Standard.Base.Runtime.Ref.Ref
type My_Resource
Val…
```
-
Thanks for this awesome tool! I was wondering if we could include some sanity checking/cleanup for badly behaved text (e.g. all those invalid unicode characters). Could be as simple as running [ftfy](…
-
**Is your feature request related to a problem? Please describe.**
According to a user report, removing diacritics doesn't work on Arabic words.
**Describe the solution you'd like**
Improve the cleaning…
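
A script-agnostic approach is to drop every Unicode nonspacing mark (general category `Mn`) after canonical decomposition. This covers Arabic harakat as well as Latin accents. The sketch below uses only the standard library. The function name `strip_diacritics` is hypothetical and this is not the project's current implementation.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks from text in any script."""
    # NFD splits precomposed characters (e.g. "é") into base + mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Arabic harakat (fatha, sukun, tanwin, ...) are already separate
    # code points with category Mn, so the same filter removes them.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)
```

One caveat of this design: letters that decompose into a base plus a mark, such as Arabic alef with madda (U+0622), lose the mark as well. Whether that is desirable depends on the use case.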
-
Code
```js
var vosk = require('vosk')
const fs = require("fs");
var mic = require("mic");
MODEL_PATH = "./vosk"
SAMPLE_RATE = 16000
if (!fs.existsSync(MODEL_PATH)) {
console.log("Pl…