-
**Learning Goals**
- To determine which of the five Naive Bayes algorithms (Gaussian, Categorical, Complement, etc.) performs best for sentiment analysis
- The basics of NLP such as token…
-
### Description
I ran into some trouble generating training data with t2t-datagen. The tokenized output is messy.
![image](https://user-images.githubusercontent.com/32641072/54520210-2756af0…
-
I can't copy the empty line after tokenizing my diff:
![image](https://github.com/otakustay/react-diff-view/assets/13199771/deed6d5c-fe0f-4dad-8f25-ec9c01787049)
-
The current tokenizer is pretty unaware of the structure of the text. Situations to improve upon would be
## tokenizing links
Something like `http://www.google.com/useless/junk` gets transformed…
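One way to keep links intact can be sketched with a regular expression that tries a URL pattern before falling back to words and punctuation. This is only an illustration, not the project's tokenizer: the function name `tokenize_keep_links` and the URL pattern (`http://` or `https://` followed by non-space characters) are assumptions.

```python
import re

# Assumed URL shape: scheme followed by any run of non-space characters.
# A URL directly followed by punctuation (e.g. "junk.") would swallow that
# punctuation; handling that case is left out of this sketch.
TOKEN_RE = re.compile(r"https?://\S+|\w+|[^\w\s]")

def tokenize_keep_links(text):
    """Split text into tokens, keeping each link as a single token."""
    return TOKEN_RE.findall(text)

print(tokenize_keep_links("See http://www.google.com/useless/junk for details."))
```

Because the URL alternative comes first in the pattern, the link is matched whole before the word and punctuation rules get a chance to break it apart.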
-
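# Spell-check preprocessing
📌 Hypothesis
- Compare training results on data without spell correction against data with spell checking applied
Applying hanspell was confirmed to reduce train_loss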
```python
def correct_spell(self, text):
    spelled_sent = hanspell.spell_checker.check(t…
```
-
‘ (right/left curly single quote) is not split off from words when tokenizing
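The expected behavior might look like the following minimal sketch, which treats the two curly single quotes (U+2018 and U+2019) as standalone tokens. The function name `tokenize_split_curly_quotes` is hypothetical and the whitespace-based splitting is an assumption, not the library's actual rules.

```python
import re

# Left (U+2018) and right (U+2019) curly single quotes.
CURLY_QUOTES = "\u2018\u2019"

# Match either a single curly quote, or a run of characters that is
# neither whitespace nor a curly quote.
TOKEN_RE = re.compile(rf"[{CURLY_QUOTES}]|[^\s{CURLY_QUOTES}]+")

def tokenize_split_curly_quotes(text):
    """Split on whitespace and emit curly single quotes as separate tokens."""
    return TOKEN_RE.findall(text)

print(tokenize_split_curly_quotes("\u2018hello\u2019 world"))
```

Note that this sketch also splits a right curly quote used as an apostrophe (as in `don’t`), so a real tokenizer would likely need an extra rule for contractions.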
-
Hello.
Testing out Dask to help me deal with over 46M rows of data. I'm loading it like so:
`dask_df = dd.read_csv(FILE_PATH)`
and when I, for example, look at the head I see the head of the…
-
I'm using a button with a flex layout whose label is a comma-delimited list of other labels. I'm using the characters tokenize strategy, but the only way to get the word-wrap to correctly truncate th…
-
_This is a list of CSS Tokenizers._
_This issue is not intended for in-depth discussion about any individual tokenizer or any aspect of CSS tokenizing._
- [`csslex`](https://github.com/keithamus/c…
-
### 🚀 The feature
Currently, Saver only allows write mode and only lets users choose between byte and text mode. It might be useful to allow the flexibility to append to an existing file.
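As a rough illustration of the requested behavior, the sketch below uses Python's built-in `open` modes to support both writing and appending. The helper `save` and the `ALLOWED_MODES` set are hypothetical and are not part of the Saver API.

```python
import os
import tempfile

# Assumed set of modes: write or append, each in text or byte form.
ALLOWED_MODES = {"w", "wb", "a", "ab"}

def save(path, data, mode="w"):
    """Hypothetical helper (not the Saver API): write or append data to path."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    with open(path, mode) as f:
        f.write(data)

# Usage: the second call appends instead of overwriting the file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    save(target, "first\n", mode="w")
    save(target, "second\n", mode="a")
    with open(target) as f:
        print(f.read())
```

Validating the mode up front keeps the byte/text choice explicit while rejecting modes the helper does not intend to support, such as read modes.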
### Motivation, pitch…