daswer123 / xtts-webui

Webui for using XTTS and for finetuning it
MIT License
549 stars, 108 forks

Need of Internet / dubious connections / spyware #101

Open IVIaV opened 5 days ago

IVIaV commented 5 days ago

I find it a bit strange that the internet is used all the time. Especially with so many IPs... Can someone explain why?

From the traffic I could see that, for example, Google is contacted (for its language model, I assume?)

Anyway, these are all the connections I see. The ones marked with “!” strike me as strange:

In addition, the following sites are also contacted when the browser is opened:

Don't take offense, but I'm being very careful here. In this project, various things still get installed by scripts after the initial setup... all in all, it is too confusing for me to work through. Perhaps the creator can shed some light on this?

[image]

AnonimusJack commented 4 days ago

First of all let me applaud you for your research here. Next, I'll put some of your concerns at ease and raise others.

HuggingFace with LFS: probably used for the models and related files.

GradioAPI: Gradio telemetry data, collected by Gradio (for what purpose? Most likely bug tracking or something, but who knows~).

httpbin.org: generally used for testing, though a thorough dive into the code may uncover other uses :eyes:

checkip.amazonaws.com: used by the Gradio telemetry API to get your IP.

geolocation.onetrust.com: probably more telemetry; finding out by whom will require a deep dive.

ip.taobao.com: the Chinese AWS. What is hosted there will be interesting, and it also requires a deep dive into the code.

I'll check these out and come back with insights. It seems the OP has been MIA for an entire quarter, so we probably won't receive input from him. But I'm curious, and before I spend time on my own tool instead of using this one, I'll make sure it's safe.

Thanks again for bringing this to my attention. Cheers~

IVIaV commented 3 days ago

Hey, thx for the information! :) I found out some more using Wireshark... But first a few comments and questions on what I know so far:

HuggingFace: But why do I need a connection to HuggingFace all the time if I have installed all the models 🤔

GradioAPI: Yeah, GradioAPI is strange... but no proof of misuse...

httpbin.org: No answer, but more information:

httpbin is a popular online service that provides a simple HTTP request & response service for testing and debugging HTTP libraries and clients. Some key things you can do with httpbin from proxiesapi.com

checkip.amazonaws.com: What does it need my IP for?

ip.taobao.com: I tracked the IP behind it. Seems to be a service from Alibaba.com ... but no clue what for :/
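
For the HuggingFace and Gradio traffic, both libraries document environment variables that switch off hub access and telemetry. The sketch below only sets them; it assumes all models are already cached locally and that the variables are set before `gradio` or `huggingface_hub` is imported. `apply_privacy_env` is a hypothetical helper name, and whether xtts-webui behaves correctly with these settings has not been verified here.

```python
import os

# Documented switches of Gradio and huggingface_hub.
# Assumption: they take effect only if set before those libraries are imported.
PRIVACY_ENV = {
    "GRADIO_ANALYTICS_ENABLED": "False",  # Gradio: disable usage analytics
    "HF_HUB_OFFLINE": "1",                # huggingface_hub: use only the local cache
    "HF_HUB_DISABLE_TELEMETRY": "1",      # huggingface_hub: disable telemetry
}


def apply_privacy_env(env=os.environ):
    """Hypothetical helper: write the switches into the given environment mapping."""
    for key, value in PRIVACY_ENV.items():
        env[key] = value
    return env


# Call this at the very top of the launch script, before any other imports.
apply_privacy_env()
```

With `HF_HUB_OFFLINE` set, a model that is missing from the local cache should fail to load instead of being downloaded, which also makes it easy to see what was not actually installed.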


New Information: I used Wireshark to look at all http connections (really only http, as there is too much other traffic on my machine). These companies are always contacted when the finetune or xtts webui starts (I'm in Germany, hence Akamai in DE)

Akamai Technologies Inc.   DE               --> "OCSP 544 Request"
EU Metro Frontend          IR               --> "OCSP 540 Request"
Edgecast Inc               US (California)  --> reported 3 times, but no evidence of malware... just an "OCSP 544 Request"
Amazon.com Inc.            US (Virginia)    --> probably the IP check mentioned by AnonimusJack: "OCSP 533 Request"
Google LLC                 US (Missouri)    --> "403 GET /success.txt?ipv4 HTTP/1.1" and "423 GET /success.txt?ipv6 HTTP/1.1" 🤷

What I find even more interesting is that while cloning a voice with my own model, I occasionally get a peer-to-peer connection to an IP in China:

182.245.121.172 China Telecom Beijing !!! I have no proof that this IP is directly related to the XTTS WebUI or TTS !!!! But perhaps someone has observed similar behavior?
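
One way to check whether a connection like this really originates from the Python process (and not from other software on the same machine) is to log outgoing connections from inside the interpreter. The following is only a sketch, assuming Python 3.8+ and that the hook is installed before the webui code runs. `log_network_events` and `seen` are hypothetical names, and connections opened by subprocesses or by native code that bypasses Python's `socket` module would not show up.

```python
import sys

seen = []  # hypothetical in-memory log of (event, target) tuples


def log_network_events(event, args):
    """Audit hook: record DNS lookups and connects made through Python's socket module."""
    if event == "socket.getaddrinfo":
        seen.append((event, args[0]))  # args[0] is the host being resolved
    elif event == "socket.connect":
        seen.append((event, args[1]))  # args[1] is the target address, e.g. (ip, port)


# Audit hooks cannot be removed once added, so use this in a debugging session only.
sys.addaudithook(log_network_events)
```

If 182.245.121.172 never appears in `seen` while the connection is visible in Wireshark, that would point away from the Python code itself, although it would not rule out a subprocess.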

AnonimusJack commented 3 days ago

I found something more. It seems most of the "problematic" API calls come from this library: https://github.com/uliontse/translators

class Region(Tse):
    def __init__(self):
        super().__init__()
        self.get_addr_url = 'https://geolocation.onetrust.com/cookieconsentpub/v1/geo/location'
        self.get_ip_url = 'https://httpbin.org/ip'
        self.ip_api_addr_url = 'http://ip-api.com/json'  # must http.
        self.ip_tb_add_url = 'https://ip.taobao.com/outGetIpInfo'
        self.default_region = os.environ.get('translators_default_region', None)

From translators/server.py line 322. Seems kinda safe. I should check the project; maybe it can be turned off when there's no need to detect a language or do any translation, for some peace of mind :grin:
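
The last line of that snippet reads a `translators_default_region` environment variable, so pinning it might avoid the geolocation lookups. This is an untested sketch: the snippet only shows that the variable is read, so both the assumption that a preset region skips the IP lookups and the value "EN" are guesses. `pin_translators_region` is a hypothetical helper name.

```python
import os


def pin_translators_region(region="EN", env=os.environ):
    """Hypothetical helper: preset the region variable that the Region class reads.

    Assumption: "EN" is an accepted value and a preset region skips the IP lookups.
    """
    env["translators_default_region"] = region
    return env


# Presumably this has to happen before `translators` is imported,
# since the variable is read when Region() is constructed.
pin_translators_region()
# import translators  # import only after the variable is set
```

Checking the traffic in Wireshark again afterwards would show whether the calls to geolocation.onetrust.com, httpbin.org, ip-api.com and ip.taobao.com actually stop.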