-
We need a feature that takes a dictionary or list as input and replaces the matching words. This is important for standardizing synonyms, dates, and contractions. I think code like this makes it easie…
ccsv updated
10 years ago
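A minimal sketch of the dictionary-driven replacement this request describes, assuming whole-word, case-insensitive matching is what is wanted. The function name `replace_words` and the sample mapping are hypothetical and not part of any existing library.

```python
import re


def replace_words(text, mapping):
    """Replace whole words or phrases in `text` using `mapping`.

    Keys are matched case-insensitively; this is an assumption of the sketch.
    """
    if not mapping:
        return text
    lowered = {key.lower(): value for key, value in mapping.items()}
    # Try longer keys first so multi-word keys win over shorter overlapping ones.
    keys = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Look each match up by its lowercased form to find the replacement.
    return pattern.sub(lambda m: lowered[m.group(0).lower()], text)


# Hypothetical mapping covering a contraction, a synonym, and an abbreviation.
synonyms = {"can't": "cannot", "automobile": "car", "NYC": "New York City"}
result = replace_words("I can't park my automobile in NYC.", synonyms)
print(result)
```

Building one combined pattern keeps the replacement to a single pass over the text, so a replacement value is never itself rewritten by a later rule.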
-
I think there may (eventually or next week, given [this query](https://community.scripture.software.sil.org/t/toc-for-bible-modules/2645)) be a demand for some kind of generic list-of-X handling code…
-
**Describe the bug**
I noticed a missing commit (using just the default config) that was due to the type being capitalized.
https://www.conventionalcommits.org/en/v1.0.0/#are-the-types-in-the-co…
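As a rough illustration of how a parser can avoid dropping commits with a capitalized type, the sketch below matches the header without regard to case and lowercases the type afterwards. The regular expression and the `parse_header` helper are assumptions made for this example, not the tool's actual implementation.

```python
import re

# Hypothetical header pattern: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"
    r"(?:\((?P<scope>[^)]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>.+)$"
)


def parse_header(line):
    """Parse a commit header, normalizing the type to lowercase."""
    match = HEADER.match(line)
    if match is None:
        return None
    return {
        "type": match.group("type").lower(),  # "Fix" and "fix" become the same type
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }


parsed = parse_header("Fix(parser): handle empty input")
print(parsed)
```

Normalizing the type once at parse time means any later grouping or filtering by type does not need its own case handling.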
-
We have to do a fair amount of text preprocessing of our data before feeding it into TensorFlow. Since the text manipulation abilities of TensorFlow and TensorFlow Transform are still relatively immatu…
-
### Description
Expected behavior:
```shell
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
>>> tokenizer.encode('')
[50256]
``…
-
I just wanted to make you aware that framework names such as 'VB.NET' or 'ASP.Net' are treated as URLs during tokenization and are thus not split (which is probably good). This is also the case for some …
-
Hi!
I was checking out libpostal, and saw something that could be improved.
---
#### My country is
##### Pakistan, but I was working on Indonesian data for a project
---
#### Here's how I'm u…
-
`./build.sh --install --prefix=~/.local` fails with the error
```
tee: '~/.local/bin/bat-modules': No such file or directory
```
Manually copying `bin/bat-modules` to `~/.local/bin` does not help.…
galou updated
3 years ago
-
Hi Villu,
I'm trying to create a PMML file from the sklearn model below. I use TfidfVectorizer at the character level and a random forest classifier.
The model predicts the datatype of a column based on…