NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Can I use NeMo without a cuda gpu? #3544

Closed burgil closed 2 years ago

burgil commented 2 years ago

Is your feature request related to a problem? Please describe.

Can I use NeMo on a GPU that doesn't support CUDA?


As someone without access to an NVIDIA GPU, I always find it hard to work with AI-related projects. I understand that a GPU runs some tasks far faster than a CPU, but it would be really awesome if I could also experience this without having to buy an RTX card that costs twice my rent. I love NVIDIA no matter how much it costs, I just can't afford it yet, and I would still like to go into production with some of these cool new libraries like NeMo. Whether or not I need to pay fees, or can legally do it, doesn't matter; what matters is making stuff that matters, like bringing voice recognition into everybody's life in new ways, by as many people as we can.

To be extremely specific, I want to be able to use AI projects like NeMo on the client side for tasks like ASR (automatic speech recognition) without needing all, or even some, of a big GPU's power. I understand we need big GPUs to train the models, but as developers we just want to run speech recognition on the CPU, or in RAM, or whatever, with slower inference but the same level of accuracy as on NVIDIA GPUs; some way for us to use NeMo on a laptop, a phone browser, or a Windows PC, offline. Simply: speech recognition.

What I noticed with DeepSpeech, for example, is that the accuracy is super poor if you don't have CUDA, and what I ask is: why? Can't we do better? Can't we have it take a little longer but keep the same level of accuracy, which in speech recognition means the same (lower) WER? Why can't NVIDIA libraries work the same but take more time to execute on the CPU?

Describe the solution you'd like

I would like to be able to truly move the CUDA computations onto the CPU or into RAM, with some level of caching or whatever, so that it runs the same on computers with weak GPUs, without having to host it on a machine with NVIDIA GPUs attached. The whole point of training a large model down into a small "smart" file is that it should work without requiring something like a 64 GB GPU to run it, and we have proved that we can; if by now you think I don't know that, then you misunderstood me. The problem is that it works poorly. What I am claiming is that the same code running on an NVIDIA GPU will not only return the answer faster from a pretrained model, but will also have higher detection accuracy, and I think the math is messed up somewhere if that is true.

I want to be able to bring voice-driven apps to all mediums, shapes, sizes, and devices without paying a billion dollars for hosting, and ideally have them work fast enough on offline devices, just to prove there is no external server overhead. By saying it needs to work offline, I simply mean that it runs locally and is therefore faster and better in many ways. I think humans in 2022 deserve that, and developers are the key to bringing it about; please is all I have to say about that.

Provide a code snippet on how new APIs/changes would be used by others.

from nemo import ASR  # proposed API; this does not exist yet

while True:
    transcription = ASR(lang='auto')  # 'auto' language detection could be optional
    print(transcription)

The above example showcases the perfect-world API; lang='auto' could even be an optional parameter. The point is that, with something that simple, we could easily add speech recognition to our apps. Unless configured otherwise, it would find the default microphone, cache the models in RAM, listen for speech, detect silence, process voice chunks in real time, normalize them and strip silence, predict the text (optionally with a callback parameter for real-time output to another function or thread from the main loop), then run the result through a grammar corrector and a language model, and finally return the output so we can simply do: if the text contains "search blah", scroll to "blah". That alone is a very powerful feature for many applications. It could be the future we were all waiting for; imagine a world where you can talk to everything you get, buy, or own, ever since you open-sourced this awesome ASR tech, lol.

Describe alternatives you've considered

You name it, I've tried it: Vosk, CMU Sphinx, IBM Watson, Google Cloud, Amazon transcription, Python SpeechRecognition, DeepSpeech. Anyway, I basically googled "speech recognition" ten years ago, and every day since then I've followed every link I could find in the never-growing results.

I usually avoid anything AI-related and try to reinvent it on the CPU, since none of your libraries work on my low-spec PC and I can't afford to upgrade it yet.

Additional context

I really just want to be able to do speech recognition in English, at the level of accuracy that exists today, on my PC, which has 24 GB of RAM, an i5 CPU, and a 2 GB AMD GPU. I also want to compile it later with PyInstaller and send it around so other people can talk to their PC and enjoy it, and maybe some companies will even make your toaster or fridge talk one day using this tech, which is beyond me and you. This is good for the future of mankind, lol; a talking fridge is not really that good, but a talking PC is. (lol, please don't steal my idea)

Imagine a screenshot of a guy almost falling asleep near the TV; he says "turn off the movie" and it does, lol. By now that's a video, but you get the point; it's like a comic-book meme screenshot, lol.

okuchaiev commented 2 years ago

You can use some of the NeMo models without GPUs. Please refer to this minimal example I've just posted (and tested that it works on my laptop without a GPU): https://github.com/NVIDIA/NeMo/discussions/3553
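
A minimal sketch along the same lines, for readers who don't follow the link: it assumes nemo_toolkit[asr] is installed and that "sample.wav" is a placeholder for a local 16 kHz mono recording; the exact transcribe() signature varies between NeMo versions.

import torch
import nemo.collections.asr as nemo_asr

# Download a pretrained English CTC model from NGC and keep it on the CPU.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
asr_model = asr_model.to(torch.device("cpu")).eval()

# CPU inference is slower than GPU inference, but the weights (and therefore
# the accuracy) are the same.
with torch.no_grad():
    transcripts = asr_model.transcribe(["sample.wav"])  # placeholder audio path
print(transcripts)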

burgil commented 2 years ago

@okuchaiev Although your code snippet was extremely useful (and I also found out that Windows is not officially supported by the NeMo library), my question remains. I'm really curious why we can't have NVIDIA libraries work the same but take more time to execute on the CPU (by "work the same" I mean have the same accuracy). Or does it already work the same, and some condition just affected my tests (referring to most, if not all, libraries out there)? How much slower would it be if, in theory, we ran the final pretrained model on the CPU or other parts of the PC rather than on a CUDA-compatible GPU? Is it simply not supported by those libraries to do those computations on the CPU? Can I, in theory, do that somehow like you did, but without losing accuracy?
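
As an illustration of the point behind this question (a toy sketch of my own, not from the thread): in PyTorch, the pretrained weights are identical regardless of device, so CPU and GPU inference produce the same outputs up to floating-point noise; only the runtime differs. The model below is a made-up stand-in, not a NeMo model.

import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a pretrained acoustic model (made-up layer sizes).
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 29)).eval()
features = torch.randn(4, 80)

with torch.no_grad():
    cpu_out = model(features)

if torch.cuda.is_available():
    with torch.no_grad():
        gpu_out = model.to("cuda")(features.to("cuda")).cpu()
    # Same weights, same predictions (up to tiny floating-point differences);
    # the GPU is only faster, not more accurate.
    print(torch.allclose(cpu_out, gpu_out, atol=1e-5))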