MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.
https://mycroft.ai
Apache License 2.0

I have a question about the data collection #2244

Closed: Zeddesnetos closed this issue 5 years ago

Zeddesnetos commented 5 years ago

I really want to test and use your software, but after reading that data like the IP address and all voice recordings is collected, I find that questionable. I'm OK with the collection of my system information (like OS, hardware, etc.) to help develop Mycroft, but I can't imagine why it would help to have my IP address and all my voice recordings. I would be OK with collecting the voice after I say "Hey Mycroft", but all voice? Is that really needed? And why is my data even sent to third parties? Is that really needed too? And speaking of the IP address, why do you need to know my location? The time is okay, but why do you want to know where I live? "The website to which you go after leaving our Services": why is that needed too?

My question is: is there a way to disable collection of the irrelevant data?

Can someone help me understand why I need to give away ALL my information for this software?

Edit: This is not hate or criticism; these are just some questions I'm wondering about.

krisgesling commented 5 years ago

Hi there, all valid questions.

First, it's important to note that by default we don't keep any of your data. Unlike other voice assistants, you have to explicitly opt in to the Open Dataset before we retain any of it. We very much appreciate those who do, as this data is how we are able to improve the service; however, we also understand that many people are not willing to share it, and that is 100% OK.

I can assure you that Mycroft only starts recording after it detects the wake word, and from then on records a maximum of 10 seconds. By default, this recording is sent via Mycroft's servers to a Speech-To-Text (STT) service for transcription, and the text is returned to your device. Currently we use Google's STT, but we are continuing to work with Mozilla to improve their offering so that we can switch to it when its performance meets the needs of our Community. The response text is, by default, sent to Mycroft's Mimic2 server, because the less robotic-sounding voices require significantly more processing power than most people's devices have if the audio is to be produced in a reasonable time frame.
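To make the "wake word first, then at most 10 seconds" behaviour concrete, here is a minimal Python sketch. It is not Mycroft's listener code: the function `record_after_wake_word`, the chunk size, and the `is_wake_word` callback are hypothetical stand-ins. It only shows that audio heard before the wake word is discarded and that the kept recording is capped.

```python
from typing import Callable, Iterable, List, Optional


def record_after_wake_word(
    chunks: Iterable[bytes],
    is_wake_word: Callable[[bytes], bool],
    chunk_seconds: float = 0.1,   # hypothetical length of one audio chunk
    max_seconds: float = 10.0,    # cap described in the reply above
) -> Optional[List[bytes]]:
    """Drop audio until the wake word is detected, then keep at most max_seconds."""
    max_chunks = round(max_seconds / chunk_seconds)
    it = iter(chunks)

    # Phase 1: listen for the wake word; nothing heard here is kept.
    for chunk in it:
        if is_wake_word(chunk):
            break
    else:
        return None  # stream ended without a wake word, so nothing was recorded

    # Phase 2: record the utterance, stopping once the time cap is reached.
    recorded: List[bytes] = []
    for chunk in it:
        if len(recorded) >= max_chunks:
            break
        recorded.append(chunk)
    return recorded
```

For example, with five chunks of background chatter, a wake word, and 200 chunks of speech, only the first 100 speech chunks (10 seconds at 0.1 s each) are returned. The real listener also stops when you stop speaking; this sketch models only the hard cap.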

Both of these defaults can be changed to operate locally. Text-To-Speech can easily be done on the device using the British Male voice. Running Speech-To-Text locally is more difficult and requires reasonably good hardware to run something like Kaldi.
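As a rough illustration of switching TTS to on-device synthesis, the Python sketch below merges an override into a user-level `mycroft.conf`. The key names (`"tts"`, `"module": "mimic"`, `"mimic": {"voice": "ap"}`) are assumptions modelled on mycroft-core's default configuration, with `"ap"` assumed to be the British Male voice; check them against the `mycroft.conf` shipped with your version. The helpers `merge` and `write_user_config` are hypothetical, and the sketch assumes the existing file is plain JSON without comments.

```python
import json
from pathlib import Path

# Assumed override: use on-device Mimic instead of the remote Mimic2 server.
# Key names are modelled on mycroft-core's defaults and may differ by version.
LOCAL_TTS_OVERRIDE = {
    "tts": {
        "module": "mimic",
        "mimic": {"voice": "ap"},  # "ap" assumed to be the British Male voice
    }
}


def merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base (override wins on conflicts)."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def write_user_config(path: Path, override: dict) -> dict:
    """Merge override into the JSON config at path (created if missing) and save it."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = merge(existing, override)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2))
    return merged
```

Usage might look like `write_user_config(Path.home() / ".mycroft" / "mycroft.conf", LOCAL_TTS_OVERRIDE)` followed by restarting Mycroft; that path is an assumption about where user-level settings lived at the time. Merging rather than overwriting keeps any other settings already in the file.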

We need to use your IP address for our servers to communicate with your device. If you run a completely local setup using the Personal Backend, your device wouldn't need to connect to Mycroft's servers at all.

The only location data we use is what you enter into Home.mycroft.ai and this is kept at the city level. It is used to provide things like the weather and time.

In terms of the section in our privacy policy called "Other Information We Collect Automatically Through Our Services", these are all the things that web servers generally collect automatically, so we need to include them as well. This is data that any website you visit would likely collect.

Zeddesnetos commented 5 years ago

Thanks, krisgesling, that's all I wanted to know. Thank you :D