-
Hi,
Is it possible to extract or generate word embeddings using **BanglaBERT**?
I have **tokenized** my Bangla sentence using BanglaBERT. Now I want to generate **word embeddings** from my tokenized s…
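One common approach is to run the sentence through the model, take the last hidden layer, and average the vectors of the sub-word pieces that make up each word. This is a minimal sketch under several assumptions. It assumes the Hugging Face `transformers` library and a BanglaBERT checkpoint loaded with `AutoTokenizer` / `AutoModel` (for example `csebuetnlp/banglabert`, a model ID we have not verified here). It also assumes a fast tokenizer, so that `inputs.word_ids()` is available. The function below does only the pooling step, on plain lists, so it can be checked without downloading a model. In practice, `hidden_states` would come from `model(**inputs).last_hidden_state[0].tolist()` and `word_ids` from `inputs.word_ids()`. `pool_subwords_to_words` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def pool_subwords_to_words(
    hidden_states: Sequence[Vector],
    word_ids: Sequence[Optional[int]],
) -> List[Vector]:
    """Average the sub-word vectors that belong to the same word.

    hidden_states[i] is the contextual vector of token i, and word_ids[i] is
    the index of the word that token came from, or None for special tokens
    such as [CLS], [SEP] and padding.
    """
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for vec, wid in zip(hidden_states, word_ids):
        if wid is None:
            continue  # special tokens do not belong to any word
        if wid not in sums:
            sums[wid] = [0.0] * len(vec)
            counts[wid] = 0
        sums[wid] = [s + v for s, v in zip(sums[wid], vec)]
        counts[wid] += 1
    # one mean vector per word, in word order
    return [[s / counts[w] for s in sums[w]] for w in sorted(sums)]


# Toy example: [CLS], word 0 split into two pieces, word 1, [SEP]
hidden = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
ids = [None, 0, 0, 1, None]
print(pool_subwords_to_words(hidden, ids))
```

Mean pooling is only one choice. Taking the first sub-word's vector, or combining several layers, are common alternatives, and which one works best for a downstream task has to be checked empirically.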
-
Solr has two classes, HTMLStripReader and WordDelimiterFilter, that are useful for a wide variety of use cases. It would be good to move them into Lucene core.
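To illustrate what WordDelimiterFilter is used for, here is a rough Python approximation of three of its splitting rules:

- split on non-alphanumeric delimiters (`Wi-Fi` becomes `Wi`, `Fi`);
- split on lowercase-to-uppercase transitions (`PowerShot` becomes `Power`, `Shot`);
- split on letter/digit transitions (`SD500` becomes `SD`, `500`).

It is an ASCII-only sketch and ignores the filter's other options, such as catenating parts, preserving the original token and stripping possessives. It is not the Lucene implementation, and `word_delimiter_split` is a hypothetical name.

```python
import re
from typing import List


def word_delimiter_split(token: str) -> List[str]:
    """Approximate a few WordDelimiterFilter splitting rules (ASCII only)."""
    parts: List[str] = []
    # Rule 1: any run of non-alphanumeric characters acts as a delimiter
    for chunk in re.split(r"[^0-9A-Za-z]+", token):
        # Rule 3: separate letter runs from digit runs
        for run in re.findall(r"[A-Za-z]+|[0-9]+", chunk):
            # Rule 2: split where a lowercase letter is followed by an uppercase one
            parts.extend(re.split(r"(?<=[a-z])(?=[A-Z])", run))
    return parts


for word in ["Wi-Fi", "PowerShot", "SD500", "XL---42+'Autocoder'"]:
    print(word, "->", word_delimiter_split(word))
```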
---
Migrated from [LUCENE-1377]…
-
# Experiment design
Follow the discussion [here](https://docs.google.com/document/d/110tlidAcpiNteKnA27tR5KPS_VahNqYKqCeJlu1MWww/edit#heading=h.wmf5tyes1tfk).
## Pointers to code and datasets
### …
-
## Description of bug / unexpected behavior
Trying to set up logging in a config file fails.
## Expected behavior
Logs are written to a file.
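For comparison, this is how "logs written to a file" looks with Python's standard `logging.config.dictConfig`. It is a generic sketch that assumes nothing about the project's own config-file format. The logger name `demo`, the format string and the file location are all hypothetical.

```python
import logging
import logging.config
import os
import tempfile


def configure_file_logging(path: str) -> logging.Logger:
    """Send records from the hypothetical 'demo' logger to a file."""
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"plain": {"format": "%(levelname)s:%(name)s:%(message)s"}},
        "handlers": {
            "file": {
                "class": "logging.FileHandler",
                "filename": path,  # passed through to FileHandler
                "encoding": "utf-8",
                "formatter": "plain",
                "level": "DEBUG",
            }
        },
        "loggers": {
            "demo": {"handlers": ["file"], "level": "DEBUG", "propagate": False}
        },
    }
    logging.config.dictConfig(config)
    return logging.getLogger("demo")


log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger = configure_file_logging(log_path)
logger.info("logging to a file works")
with open(log_path, encoding="utf-8") as fh:
    print(fh.read().strip())
```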
## How to reproduce the issue
Code for reproduci…
-
This gives an error when running it this way with transformers 3.5.1.
Since Hugging Face has updated the scripts in `transformers`, I found this one in their legacy folder, downloaded it in Colab, and ran it:
```py
!export SQUA…
```
-
- [ ] Training SentencePiece
```python
from bnlp import SentencepieceTokenizer
bsp = SentencepieceTokenizer()
data = "raw_text.txt"
model_prefix = "test"
vocab_size = 5
bsp.train(data, mode…
```
-
Because of a poor internet connection and limited compute, it's hard for us to train for many epochs. We're trying to use the `run_squad.py` script to train a Bangla QA system. We have traine…
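This note applies if the aim is to continue from a saved checkpoint rather than start over. Our understanding is that the legacy `run_squad.py` writes `checkpoint-<step>` directories every `--save_steps` steps. When `--model_name_or_path` points at such a directory, the script parses the step number from the directory name to work out how many epochs and batches to skip. Please verify this against the script version you downloaded. The sketch below reproduces only that resume arithmetic. `resume_position`, its parameters and the example path are hypothetical.

```python
from typing import Tuple


def resume_position(
    checkpoint_dir: str, batches_per_epoch: int, grad_accum_steps: int = 1
) -> Tuple[int, int, int]:
    """Return (global_step, epochs_trained, steps_into_epoch) for a checkpoint.

    Assumes the directory name ends in '-<global_step>', e.g. 'checkpoint-1500'.
    """
    global_step = int(checkpoint_dir.rstrip("/").split("-")[-1])
    # optimizer steps per epoch when gradients are accumulated
    steps_per_epoch = batches_per_epoch // grad_accum_steps
    epochs_trained = global_step // steps_per_epoch
    steps_into_epoch = global_step % steps_per_epoch
    return global_step, epochs_trained, steps_into_epoch


print(resume_position("output/checkpoint-1200/", batches_per_epoch=1000))
```

Saving checkpoints often and copying them to persistent storage (for example Google Drive in Colab) limits how much work a dropped connection can cost.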
-
```python
from sklearn.feature_extraction.text import CountVectorizer
corpus = ['ভৌতিক গল্প পড়তে চাইলে লাইক দেন','শয়তান সহজে মরেনা ওতো একটা মানুষ রুপী শয়তান','তুমি ছুয়ে দিলে মন']
vec = CountVectorizer()
x = v…
```
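If the problem is broken or missing Bangla tokens, a likely cause is tokenization. `CountVectorizer`'s default `token_pattern` is `r"(?u)\b\w\w+\b"`, and Python's `\w` does not match Bengali combining signs such as the hasanta (্) or vowel signs. As a result, words like গল্প can be split into fragments, or their pieces dropped. One workaround, assuming whitespace-separated text is acceptable, is to tokenize on whitespace. With scikit-learn that would be `CountVectorizer(tokenizer=str.split, token_pattern=None)`. The pure-Python sketch below shows the resulting vocabulary and counts without needing scikit-learn. `count_vectorize` and the other helpers are hypothetical names.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# scikit-learn's default CountVectorizer token_pattern, for comparison
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def regex_tokenize(doc: str) -> List[str]:
    """Mimic the default token extraction (may fragment Bangla words)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, doc)


def whitespace_tokenize(doc: str) -> List[str]:
    """Keep every whitespace-separated run of characters as one token."""
    return doc.split()


def count_vectorize(
    corpus: List[str], tokenize: Callable[[str], List[str]]
) -> Tuple[Dict[str, int], List[List[int]]]:
    """Build a sorted vocabulary and a dense document-term count matrix."""
    tokenized = [tokenize(doc) for doc in corpus]
    terms = sorted({t for doc in tokenized for t in doc})
    vocab = {term: i for i, term in enumerate(terms)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for term, n in Counter(doc).items():
            row[vocab[term]] = n
        matrix.append(row)
    return vocab, matrix


corpus = ['ভৌতিক গল্প পড়তে চাইলে লাইক দেন', 'শয়তান সহজে মরেনা ওতো একটা মানুষ রুপী শয়তান', 'তুমি ছুয়ে দিলে মন']
vocab, matrix = count_vectorize(corpus, whitespace_tokenize)
print(sorted(vocab))
print(regex_tokenize(corpus[0]))  # compare with the whitespace tokens
```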
-
Could you add a new section describing how to add support for a new language?