Speech Recognition datasets etc.:
https://ai.meta.com/blog/voxpopuli-the-largest-open-multilingual-speech-corpus-for-ai-translation-and-more/ https://arxiv.org/abs/2006.13979 https://ai.meta.com/blog/xls-r-self-supervised-speech-processing-for-128-languages/
Language Identification library: tested; use the small model (see the usage sketch after the links)
https://fasttext.cc/docs/en/language-identification.html https://huggingface.co/facebook/fasttext-language-identification
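A minimal usage sketch for the small fastText language-identification model. It assumes the compressed model file `lid.176.ftz` has been downloaded from the fastText page above (the larger `lid.176.bin` is used the same way); the sample sentence is just an illustration.

```python
import fasttext

# Load the small, compressed language-identification model (lid.176.ftz).
model = fasttext.load_model("lid.176.ftz")

# Predict the top-3 language labels with probabilities for a sample sentence.
labels, probs = model.predict("Това е изречение на български език.", k=3)
print(labels, probs)  # e.g. ('__label__bg', ...) with the corresponding confidence scores
```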
Common Crawl tools
https://github.com/facebookresearch/cc_net
Huge Dataset
https://github.com/togethercomputer/RedPajama-Data
...
https://arxiv.org/abs/2007.10310
Bulgarian POS-tagger and NER-tagger (applied): https://github.com/AMontgomerie/bulgarian-nlp
https://github.com/AMontgomerie/bulgarian-nlp/blob/master/examples/pos_example.ipynb https://github.com/AMontgomerie/bulgarian-nlp/blob/master/examples/text_annotator_example.ipynb
About the Named-entity tags: https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)
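A tiny hand-written illustration of the IOB (inside-outside-beginning) scheme from the link above; the tokens and tags are made up for illustration and are not output of the tagger.

```python
# B-XXX marks the first token of an entity of type XXX, I-XXX a continuation of the
# same entity, and O a token outside any entity.
tokens = ["Иван", "Петров", "живее", "в", "София", "."]
tags   = ["B-PER", "I-PER", "O",     "O", "B-LOC", "O"]

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```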
PHATGOOSE Repository
PHATGOOSE, which stands for Post-Hoc Adaptive Gating Over an Ocean of Specialized Experts, enables zero-shot generalization from specialized experts (e.g. PEFT modules) trained on diverse datasets by adaptively routing among them. It only requires an additional, inexpensive training step: a gate in front of each frozen PEFT module for its corresponding task.
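A toy sketch of the general idea (routing each input among frozen expert modules via per-expert gate vectors, combining the top-k). This is a simplified illustration, not the PHATGOOSE implementation; the class name, `top_k` and the dummy linear "experts" are all hypothetical.

```python
import torch
import torch.nn as nn

class GatedExpertRouter(nn.Module):
    """Toy sketch: route inputs among frozen expert modules using learned gate vectors."""

    def __init__(self, experts, hidden_dim, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(experts)  # frozen, separately trained specialist modules
        for p in self.experts.parameters():
            p.requires_grad_(False)
        # One gate vector per expert; only these would be trained in the extra, cheap step.
        self.gates = nn.Parameter(torch.randn(len(experts), hidden_dim))
        self.top_k = top_k

    def forward(self, x):  # x: (batch, hidden_dim)
        scores = x @ self.gates.t()                     # similarity of activations to each gate
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick the top-k experts per example
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for i in range(x.size(0)):
                expert = self.experts[int(idx[i, k])]
                out[i] += weights[i, k] * expert(x[i:i + 1]).squeeze(0)
        return out

# Usage with dummy linear "experts":
experts = [nn.Linear(16, 16) for _ in range(4)]
router = GatedExpertRouter(experts, hidden_dim=16, top_k=2)
print(router(torch.randn(3, 16)).shape)  # torch.Size([3, 16])
```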
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
https://github.com/stanfordnlp/pyvene https://arxiv.org/abs/2403.07809
Depth map monocular ... depth estimation ... synthetic data, real data ...
Depth Anything V2
Lihe Yang (HKU), Bingyi Kang (TikTok, project lead), Zilong Huang (TikTok), Zhen Zhao, Xiaogang Xu, Jiashi Feng (TikTok), Hengshuang Zhao (HKU, corresponding author)
https://depth-anything-v2.github.io/
Note, 4.1.2023: During this research effort I've been browsing, reviewing, revisiting and studying a huge number of articles and concepts, linked by association while browsing, to feed ideas etc. Ideally they would be stored in some special representation: a database, a semantic network etc.
So far I'm starting with one out of many hundreds or maybe a thousand items (so far), driven by general curiosity from that seed. Building such a tool is a research & development project of its own: an automatic analysis and learning assistant, a reading assistant and accelerator, a cognitive accelerator etc. There is an unpublished "in-house" experimental application, called [Research] Assistant, or ACS for short (Assistant C#), which is a playground and inspiration for ideas and developments in this direction of "Cognitive Acceleration". In a broader sense, though, any computer and any software is such a tool.
Various statistical similarity methods: https://en.wikipedia.org/wiki/Semantic_similarity
A blog on question answering, query understanding etc.: https://queryunderstanding.com/
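As a minimal, generic illustration of one common similarity measure from the page above (not tied to any specific library in this list), cosine similarity between two embedding vectors; the vectors below are made up.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example with made-up 4-dimensional "embeddings".
v1 = np.array([0.2, 0.7, 0.1, 0.0])
v2 = np.array([0.25, 0.6, 0.05, 0.1])
print(cosine_similarity(v1, v2))
```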