NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

How to get the data_process code in NeMo doc for common crawl? #7914

Closed SefaZeng closed 9 months ago

SefaZeng commented 9 months ago

Is your feature request related to a problem? Please describe.

I read the NeMo documentation on data processing here, and I want to use these scripts for data preprocessing, but I can't find a way to get them.

Describe the solution you'd like

A website or a GitHub URL where all of these scripts can be downloaded.

nithinraok commented 9 months ago

@SefaZeng there is an issue with the links; the code is not open sourced through NeMo. Currently, you can request access to the NeMo framework at https://developer.nvidia.com/nemo-framework and run the scripts from there.

ericharper commented 9 months ago

@SefaZeng closing this issue. Feel free to reopen it if you are unable to run the scripts after downloading the nemo-framework container.

SefaZeng commented 9 months ago

Thank you for your reply. Could you please tell me what the difference is between NeMo and NeMo-Framework?

nithinraok commented 9 months ago

(As of now) NeMo is an open-source conversational toolkit for training ASR, TTS, NLP, and speaker diarization models. NeMo framework is an open-access, container-based framework for building, customizing, and deploying LLM/multimodal models.