Closed: pjaol closed this issue 1 month ago
@pjaol Great issue / proposed solutions! I've also sent this to our legal counsel to help us navigate it and make sure we are complying with any requirements. I'm expecting to hear back from them in the next 7 - 15 days so we can prioritize any necessary work :)
Great @joaomdmoura - appreciate it.
A significant number of people in the linked issues, and apparently on the Discord, believe telemetry should be opt-in. That's your call, but in line with several other products, I encourage you to offer a similar option to disable all telemetry.
Chroma https://docs.trychroma.com/telemetry ANONYMIZED_TELEMETRY = FALSE
Langchain https://docs.smith.langchain.com/old/tracing/quick_start LANGCHAIN_TRACING_V2 = False
I'm scheduled to open a CVE on it next week, but will hold off until August 30th
Thanks for pointing it out. I'm waiting on legal to better understand the actual requirements given it's open source, so we do the right thing.
You can already disable telemetry with `OTEL_SDK_DISABLED=true`, so maybe we just need to document it better. :)
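A minimal sketch of how the flag could be set and checked from Python. It does not call CrewAI or OpenTelemetry; `otel_sdk_disabled` is a hypothetical helper, and the case-insensitive comparison against `"true"` is an assumption about how the OpenTelemetry SDK reads the variable, so verify it against the SDK version in use.

```python
import os


def otel_sdk_disabled(env=None):
    """Return True when OTEL_SDK_DISABLED is set to "true" in any letter case.

    Hypothetical helper: the case-insensitive match is an assumption about
    how the OpenTelemetry SDK interprets the variable.
    """
    env = os.environ if env is None else env
    return env.get("OTEL_SDK_DISABLED", "").strip().lower() == "true"


# Set the flag before the application (and any telemetry setup) starts.
os.environ["OTEL_SDK_DISABLED"] = "true"
print(otel_sdk_disabled())
```

Setting the variable in the shell or in the deployment environment has the same effect as setting it at the top of the entry-point script, as long as it happens before telemetry is initialized.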
Just so I understand: is the idea behind the CVE about disabling telemetry or #1177? I assume it's only #1177, but I want to double-check so I pass all the correct information along.
Thanks again, appreciate the work
The CVE would be both, although I know I'll be asked to probably split it into two.
As a client IP is defined as a personal identifier, I don't see any way around that other than having the ability to disable telemetry; without it, this is in active breach.
OTEL_SDK_DISABLED=True
I'll do some testing with that. It looks like it returns a NoOpTracer from OpenTelemetry, but that definitely needs to be clear and unambiguous in the documentation.
The base URL could contain any or all of the parts in https://user:password@secret_project.somewhere.com:12345/next_iphone_model_ai_v0.1
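To make the concern concrete, here is a small sketch that splits the example URL above into the components that would leak if the full string were sent as telemetry. It uses only the Python standard library; `sensitive_parts` is a hypothetical helper name, not part of CrewAI.

```python
from urllib.parse import urlsplit


def sensitive_parts(url):
    """Return the URL components that can carry credentials or project names."""
    parts = urlsplit(url)
    return {
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


example = "https://user:password@secret_project.somewhere.com:12345/next_iphone_model_ai_v0.1"
for name, value in sensitive_parts(example).items():
    print(f"{name}: {value}")
```

Every one of these fields is populated for the example URL, which is why sending the raw string is a problem even when the host itself is not an IP address.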
Appreciate you looking at this!
> As a client IP is defined as a personal identifier, I don't see any way around that other than having the ability to disable telemetry and is in active breach.
Once we remove the BASE_URL though, I think there won't be any IP being collected, right? So there would be no breach? I think all the other data points are generic. I will also ask counsel about this specifically and send them our docs with the list of collected data points, but thanks for going over it with me!
Oh, now that I think about it, people could add an IP into a model name even if it doesn't include the URL. But yup, given we already offer a way to disable it, I think it's just a matter of better docs.
I think there are a couple of issues:
1. Documenting the explicit ability to turn off telemetry. Currently the wording is "We don't offer a way to disable it now, but we will in the future." So obviously change that, providing `OTEL_SDK_DISABLED=true`.
2. The order of documenting what's collected. Right now, stating that we're not collecting private data unless..., followed by the data we are collecting, is definitely ambiguous. If you are using US counsel, they will tell you it's the implementer's responsibility to read the whole document and understand it. EU lawyers will tell you that's why the wording used is "unambiguous". It's fun being on those calls with specialized outside counsel.
Just my opinion, but a simple table of data collected by default and data shared optionally makes it explicit and clear, including things like the output and whether human input is included.
Collected by Default | Data | Reason |
---|---|---|
Yes | Version of CrewAI | Assessing the adoption rate of our latest version helps us understand user needs and guide our updates. |
Yes | Python Version | Identifying the Python versions our users operate with assists in prioritizing our support efforts for these versions. |
Yes | General OS Information | Details like the number of CPUs and the operating system type (macOS, Windows, Linux) enable us to focus our development on the most used operating systems. |
Yes | Number of Agents and Tasks in a Crew | Ensures our internal testing mirrors real-world scenarios, helping us guide users towards best practices. |
Yes | Crew Process Utilization | Understanding how crews are utilized aids in directing our development focus. |
Yes | Memory and Delegation Use by Agents | Insights into how these features are used help evaluate their effectiveness and future development. |
Yes | Task Execution Mode | Knowing whether tasks are executed in parallel or sequentially influences our emphasis on enhancing parallel execution capabilities. |
Yes | Language Model Utilization | Supports our goal to improve support for the most popular languages among our users. |
Yes | Roles of Agents within a Crew | Understanding the various roles agents play aids in crafting better tools, integrations, and examples. |
Yes | Tool Usage | Identifying which tools are most frequently used allows us to prioritize improvements in those areas. |
No | Goal (Opt-In) | Part of detailed crew and task execution data, enabling deeper insight into usage patterns. |
No | Backstory (Opt-In) | Part of detailed crew and task execution data, providing context for task execution and improving user experience. |
No | Context (Opt-In) | Part of detailed crew and task execution data, essential for understanding how tasks are set up and executed. |
No | Output (Opt-In) | Part of detailed crew and task execution data, offering insights into the final results of task execution. |
No | Human Input (Opt-In) | Captures whether human input was required during task execution, helping to improve human-agent interaction mechanisms. |
No | Agent Verbosity (Opt-In) | Indicates whether agents were set to verbose mode, providing insights into detailed logging and communication preferences. |
No | Max Iterations (Opt-In) | Records the maximum number of iterations allowed for agents, aiding in the analysis of task complexity and agent efficiency. |
No | Max RPM (Opt-In) | Tracks the maximum RPM (Requests Per Minute) settings, useful for understanding performance constraints and resource allocation. |
No | Tools Names (Opt-In) | Identifies the specific tools used by agents during tasks, helping to prioritize tool development and support. |
No | Tool Reuse (Opt-In) | Logs repeated usage of tools by agents, helping to identify potential areas for tool optimization or additional support. |
No | Task Description (Opt-In) | Logs the description of each task, providing context for understanding task objectives and expected outcomes. |
No | Expected Output (Opt-In) | Captures the expected output for tasks, useful for comparing against actual outcomes and measuring task success. |
No | Task Output (Opt-In) | Records the actual output of tasks, enabling the assessment of task completion and quality of results. |
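The default versus opt-in split in the table can be sketched as a simple allow-list filter. This is an illustration only, not CrewAI's implementation: the field names and `build_payload` are hypothetical, and `share_crew` is used here under the assumption that it is the flag that opts in to the detailed data.

```python
# Hypothetical field names standing in for rows of the table above.
DEFAULT_FIELDS = {"crewai_version", "python_version", "os_type", "agent_count"}
OPT_IN_FIELDS = {"goal", "backstory", "task_description", "task_output"}


def build_payload(data, share_crew=False):
    """Keep default fields always; keep opt-in fields only when share_crew is set.

    Anything not on either allow-list is dropped, so a new attribute is never
    sent by accident.
    """
    allowed = DEFAULT_FIELDS | (OPT_IN_FIELDS if share_crew else set())
    return {key: value for key, value in data.items() if key in allowed}


crew_data = {
    "crewai_version": "x.y.z",
    "agent_count": 3,
    "goal": "confidential goal text",
    "base_url": "https://example.invalid",
}
print(build_payload(crew_data))
print(build_payload(crew_data, share_crew=True))
```

An allow-list, rather than a block-list, means the documented table and the transmitted data cannot drift apart silently.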
Commits added; they are going out in the next version, which we will probably cut later today or over the weekend.
After clearing with legal we:
- Removed the `base_url` attribute.
- Confirmed that, without `base_url`, what we are collecting is not considered personal information under GDPR nor PII.
- Confirmed that data localization is not a problem due to the fact this is not considered personal information under GDPR.

This is great, really appreciate it! On the last bullet point:
> Confirmed that data localization is not a problem due to the fact this is not considered personal information under GDPR

As long as IPs are not retained, you should be clear on PII. Consumers just need to be informed about locality; that can be a note in the docs.
Docs updated, new version with no LLM URL cut :)
During a review of CrewAI, we identified issues regarding the telemetry data collection and transfer processes, which may not fully comply with GDPR requirements.
Telemetry Data Transfer:
Telemetry data is currently being collected and transferred to https://telemetry.crewai.com, a location outside the EU. While CrewAI has made an effort to categorize the telemetry data and provide transparency in their documentation, several potential issues under GDPR still need to be addressed.
Potential GDPR Concerns:
Example: Desktop Implementations: User interactions could potentially be linked to a user through IP addresses or other unique identifiers, making the data personal under GDPR, even if the share_crew flag is not enabled.
Issues Identified:
- Ambiguity in Documentation
- No Opt-Out Mechanism
- Transparency and Consent
Recommendations for Improvement:
- Data Review and Classification
- Explicit Consent Mechanism
- Transparency and Documentation
- Legal Safeguards for Data Transfer
Related to: #1177 #266 #372 #241