-
Pdfminer does not properly process the text on page 16 of the following document:
https://www.sugarsync.com/pf/D62078_93519013_6758203
The correct result would be:
```
CBM2013-00025
US 7,856,430…
```
-
We need to add information about how to use non-English stopwords for topic modeling and TF-IDF.
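As a possible starting point for that documentation, here is a minimal Python sketch of TF-IDF with a custom, non-English stopword list. It assumes plain-text input, a Unicode-aware `\w+` tokenizer, and the unsmoothed weighting tf × log(N / df). The tiny German list `GERMAN_STOPWORDS` and the function names are hypothetical placeholders, not taken from any particular project.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny German stopword list for illustration only;
# a real project would load a complete list for its language.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu", "nicht"}


def tokenize(text, stopwords):
    """Lowercase, split on Unicode word characters, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


def tf_idf(docs, stopwords):
    """Return one {term: tf-idf score} dict per document (tf * log(N / df))."""
    tokenized = [tokenize(d, stopwords) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        scores.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = [
        "Der Hund ist im Garten",
        "Die Katze ist nicht im Haus",
        "Der Garten und das Haus",
    ]
    for doc_scores in tf_idf(docs, GERMAN_STOPWORDS):
        print(doc_scores)
```

The same filtered token lists could be fed to a topic model. Libraries often expose this directly: scikit-learn's `TfidfVectorizer`, for example, accepts a Python list for its `stop_words` parameter, although its default IDF is smoothed and will not match the raw formula used here.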
-
# What
Meetup to discuss Chapter 13 of [Text Analysis with R for Students of Literature](http://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/) together and do the exercis…
-
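In index mode, segmentation should return multiple possible splits, but in practice index mode and query mode produce the same results, which does not match the README (according to the README, index mode segments "中国" into "中国", "中", and "国"):
Here are three examples: text=中国是社会主义国家 ("China is a socialist country"), text=中国 ("China"), and text=版本号 ("version number"), all with type=index_ansj. The results do not match the documented index-mode behavior; they all look like query mode. And they are indeed the same as query mode…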
-
### Contact Details
_No response_
### Is your content request related to a problem you've encountered during your research process? Please describe.
I recently stumbled upon ConvoKit and Named Enti…
-
How do others define search terms? What methods do they use? How do they develop a corpus? What types of sources do they draw on?
-
### Feature scope
All Log Groups
### Describe your suggested feature
We need a way to create Metric Filters on CloudWatch Logs so that we can filter on text patterns and create alarms auto…
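Independently of whichever tool this request targets, the underlying AWS API can be sketched in Python with boto3: one `put_metric_filter` call per log group, with log groups enumerated through the `describe_log_groups` paginator. This is a minimal sketch under stated assumptions: the filter-naming scheme, the shared metric name and namespace, and the helper function names are hypothetical, and AWS credentials are assumed to be configured already.

```python
from typing import Any, Dict, Iterator


def metric_filter_request(log_group: str, pattern: str,
                          metric_name: str, namespace: str) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter call."""
    return {
        "logGroupName": log_group,
        # Hypothetical naming scheme; filter names only need to be unique
        # within a single log group.
        "filterName": f"{metric_name}-filter",
        "filterPattern": pattern,
        "metricTransformations": [{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "1",    # count each matching log event once
            "defaultValue": 0.0,   # publish 0 when nothing matches
        }],
    }


def all_log_group_names(logs_client) -> Iterator[str]:
    """Yield every log group name in the account/region, across all pages."""
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page["logGroups"]:
            yield group["logGroupName"]


def apply_to_all_log_groups(pattern: str, metric_name: str, namespace: str) -> None:
    """Create the same metric filter on every log group (needs AWS access)."""
    import boto3  # imported lazily so the request builder works without boto3

    logs = boto3.client("logs")
    for name in all_log_group_names(logs):
        logs.put_metric_filter(
            **metric_filter_request(name, pattern, metric_name, namespace)
        )
```

Because every log group publishes to the same metric, a single CloudWatch alarm (created with the `cloudwatch` client's `put_metric_alarm`) could then cover all of them. Log groups created after the loop runs would not be covered until it runs again.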
-
### Description
Lucene 9 added new language analysis components for Scandinavian and Telugu text. We should make these available in Elasticsearch.
-
## Deque Analysis Summary
On the homepage, text over the image has insufficient color contrast (for both the regular text and the green hyperlinks).
https://demo7.dspace.org/home
1. Regular white tex…
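To check findings like these locally, here is a minimal Python sketch of the WCAG 2.x contrast-ratio calculation. It assumes 8-bit sRGB colors and the WCAG 2.x relative-luminance formula; WCAG AA requires at least 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical, and this is not the code used by Deque's tools.

```python
def _channel_to_linear(c8: int) -> float:
    """Convert one 8-bit sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """WCAG relative luminance of an (r, g, b) tuple of 0-255 ints."""
    r, g, b = (_channel_to_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text: bool = False) -> bool:
    """WCAG 2.x level AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    white = (255, 255, 255)
    grey = (0x77, 0x77, 0x77)
    print(round(contrast_ratio(white, grey), 2), passes_aa(white, grey))
```

Because the text sits over a photograph, the background is not a single color. One approach (an assumption, not part of the report) is to sample the lightest pixels behind white text and check the worst case. As a reference point, white text on `#777777` comes out at roughly 4.48:1, just below the AA threshold for normal text.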
-
I have been looking into using Elasticsearch for text analysis of Urdu-language documents. So far, there is no support for Urdu (https://www.elastic.co/guide/en/elasticsearch/…