linguistics-dataset Search Results

instructlab/instructlab #2338

taxonomy diff validated taxonomy produces: Generating datase…

**Describe the bug** A valid taxonomy is throwing errors instead of being used to train data. **To Reproduce** Steps to reproduce the behavior: 1. Clone taxonomy 2. add your taxonomy data to on…

djkemmet updated 4 days ago

sindresorhus/awesome #628

No longer actively maintained repositories

Following our discussion in https://github.com/sindresorhus/awesome/pull/626, I generated a list of repositories with no pushes in the previous 3 months, with dates pulled from the GitHub API: - [Cordova](https:/…

chaconnewu updated 1 month ago

schemaorg/suggestions-questions-brainstorming #249

Considering adding a new property for dataset descriptions.

Datasets are generally related to a particular field of science, but also to sub-fields. Users tend to look for data corresponding to their speciality. To facilitate the research of data an…

buhem updated 4 years ago

MIT-LCP/physionet-build #2291

Published projects with scrambled reference lists

These are the published projects that still have possibly broken lists of references (see issue #2137): Broken version Older version bionlp-workshop-2023-task-1a/2.0.0 bionlp-workshop-2023-task-…

bemoody updated 3 days ago

concepticon/norare-data #134

Cross-Lingual Similarity Datasets

Reading the paper on multisimlex, I realized that there is some tradition to these datasets, although they are small and have nothing to do with historical linguistics or psychology: http://lcl.unirom…

LinguList updated 2 years ago

vymana/indic_nlp #11

Analyze the basic statistics of datasets - 3

Statistics: 1. Brief overview of datasets - different parts/subsets/files 2. Datasets size - size of training, test and dev sets 3. Different classes of labels and their counts 4. Sample data (20 …

matrixdecoded updated 3 years ago

Helsinki-NLP/OPUS #12

WikiTitles en-ru is ru-en

I noticed weird scores while analyzing `WikiTitles/v3` for `en-ru` language pair. It turned out that the direction of the downloaded dataset is the opposite of the language codes: ``` (base) admin…

eu9ene updated 5 months ago

tamlhp/deepfake-benchmark #4

Papers on Audio Deepfake Detection

Every Breath You Don't Take: Deepfake Speech Detection Using Breath https://arxiv.org/abs/2404.15143

tamlhp updated 3 weeks ago

ArneBinder/pie-datasets #99

`argmicro` converter for `TextDocumentWithLabeledSpansAndBin…

The conversion procedure, i.e. [this code](https://github.com/ArneBinder/pie-datasets/blob/main/dataset_builders/pie/argmicro/argmicro.py#L216-L260), should follow the description in the section "3. D…

ArneBinder updated 8 months ago

MaartenGr/BERTopic_evaluation #1

Installation help on Evaluation

Hi Maarten Thank you for contributing BERTopic to the world. I am an MA candidate in Linguistics and I am actually writing my thesis on BERTopic. Programmingwise, I am at an early-intermediate st…

dean-rahman updated 2 years ago

358 results for linguistics-dataset
