-
**Describe the bug**
A valid taxonomy is throwing errors instead of being used to train data.
**To Reproduce**
Steps to reproduce the behavior:
1. Clone taxonomy
2. Add your taxonomy data to on…
-
Following our discussion in https://github.com/sindresorhus/awesome/pull/626, I generated a list of repositories that have not been pushed to in the previous 3 months, with dates pulled from the GitHub API:
- [Cordova](https:/…
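
A minimal sketch of how such a staleness check could work, assuming "3 months" is approximated as 90 days and that each repository record carries the `full_name` and `pushed_at` fields returned by the GitHub REST API. The helper names `parse_github_time` and `stale_repos`, and the sample records, are hypothetical and not taken from the original script:

```python
from datetime import datetime, timedelta, timezone


def parse_github_time(stamp: str) -> datetime:
    """Parse a GitHub API timestamp such as '2016-01-31T12:00:00Z' as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_repos(repos, now, days=90):
    """Return the full names of repos whose last push is older than `days` days."""
    cutoff = now - timedelta(days=days)
    return [r["full_name"] for r in repos if parse_github_time(r["pushed_at"]) < cutoff]


# Hypothetical sample records shaped like GitHub repository objects.
sample = [
    {"full_name": "example/old-list", "pushed_at": "2016-01-01T00:00:00Z"},
    {"full_name": "example/fresh-list", "pushed_at": "2016-06-01T00:00:00Z"},
]
now = datetime(2016, 6, 15, tzinfo=timezone.utc)
stale = stale_repos(sample, now)
```

Passing `now` explicitly keeps the check reproducible; a real run would fetch the records from the API and use the current time.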
-
Datasets generally relate to a particular field of science, and also to its sub-fields. Users are used to looking for data corresponding to their speciality. To facilitate the search for data an…
buhem updated
4 years ago
-
These are the published projects that still have possibly broken lists of references (see issue #2137):
| Broken version | Older version |
| --- | --- |
| bionlp-workshop-2023-task-1a/2.0.0 | bionlp-workshop-2023-task-… |
-
Reading the paper on Multi-SimLex, I realized that there is some tradition to these datasets, although they are small and have nothing to do with historical linguistics or psychology: http://lcl.unirom…
-
Statistics:
1. Brief overview of datasets - different parts/subsets/files
2. Dataset sizes - sizes of the training, test and dev sets
3. Different classes of labels and their counts
4. Sample data (20 …
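
The split sizes and label counts from items 2 and 3 could be computed roughly as follows. This is only a sketch: it assumes each split is a list of `(text, label)` pairs, and the function name `dataset_statistics` and the toy data are hypothetical:

```python
from collections import Counter


def dataset_statistics(splits):
    """Return per-split sizes and overall label counts.

    `splits` maps a split name to a list of (text, label) pairs (assumed layout).
    """
    sizes = {name: len(rows) for name, rows in splits.items()}
    label_counts = Counter(label for rows in splits.values() for _, label in rows)
    return sizes, label_counts


# Toy data for illustration only.
toy = {
    "train": [("a", "pos"), ("b", "neg"), ("c", "pos")],
    "dev": [("d", "neg")],
    "test": [("e", "pos")],
}
sizes, label_counts = dataset_statistics(toy)
```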
-
I noticed weird scores while analyzing `WikiTitles/v3` for the `en-ru` language pair. It turned out that the direction of the downloaded dataset is the opposite of the language codes:
```
(base) admin…
-
Every Breath You Don't Take: Deepfake Speech Detection Using Breath
https://arxiv.org/abs/2404.15143
-
The conversion procedure, i.e. [this code](https://github.com/ArneBinder/pie-datasets/blob/main/dataset_builders/pie/argmicro/argmicro.py#L216-L260), should follow the description in the section "3. D…
-
Hi Maarten
Thank you for contributing BERTopic to the world.
I am an MA candidate in Linguistics and I am actually writing my thesis on BERTopic.
Programming-wise, I am at an early-intermediate st…