Open mfreeman451 opened 8 months ago
What kind of data do we have access to? The most suitable source would probably be data leaks from telecom companies.
I don't have access to anything special. We've developed a bot that listens to IRC channels in real time and writes the data to a message queue. Consumers process the queue and store the results in our graph database and vector index, which lets us run LLM queries that are grounded by the property graph. We're trying to keep everything modular and behind interfaces so that it is easy to add support for other mediums or raw logs. That could be call detail records or really any kind of structured data that's a record of a conversation, including e-mail, Usenet if that were still a thing, etc.
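To make the "modular and done through interfaces" idea concrete, here is a rough sketch of what a source interface could look like. This is not the project's actual code: `Message`, `MessageSource`, `RawIrcLogSource`, `publish_all`, and the log line format are all hypothetical names and assumptions made up for illustration, and a plain list stands in for the message queue.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Message:
    """Medium-agnostic record of one utterance (hypothetical schema)."""
    source: str          # e.g. "irc", "email", "cdr"
    channel: str         # channel, thread, or call identifier
    sender: str
    text: str
    timestamp: datetime


class MessageSource(Protocol):
    """Anything that can yield normalized messages."""
    def messages(self) -> Iterator[Message]: ...


class RawIrcLogSource:
    """Parses lines shaped like '2024-01-05T09:00:00+00:00 #chan <nick> text'.

    The line format is an assumption made for this example.
    """

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = lines

    def messages(self) -> Iterator[Message]:
        for line in self._lines:
            parts = line.rstrip("\n").split(" ", 3)
            if len(parts) != 4:
                continue  # skip malformed lines
            stamp, channel, nick, text = parts
            yield Message("irc", channel, nick.strip("<>"), text,
                          datetime.fromisoformat(stamp))


def publish_all(source: MessageSource, queue: list) -> int:
    """Stand-in for the producer side: append every message to a queue."""
    count = 0
    for msg in source.messages():
        queue.append(msg)
        count += 1
    return count
```

With something like this, supporting e-mail or call detail records would mean writing another class with a `messages()` method, while the consumers downstream only ever see `Message` records.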
We need to be able to infer relationships or summarize conversations, but in order to do that, we need to be able to determine when conversations start and stop. There might be several ways to do this; here are a few:
What happens if Alice asks Bob about something early in the morning and Bob doesn't respond until later in the day? We need to be able to look through or beyond time intervals to determine if chunks of conversations should be linked.
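One possible way to handle the Alice and Bob case, sketched under some assumptions: first split messages into chunks wherever the silence exceeds a threshold, then look back across chunks for replies that use the IRC `nick:` addressing convention. The names `Msg`, `split_on_gaps`, `addressee`, and `link_chunks`, along with the 15-minute and 24-hour thresholds, are hypothetical choices for this example, and the prefix heuristic will misfire on text like "yes, ..." when no nick list is checked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Msg:
    sender: str
    text: str
    at: datetime


def split_on_gaps(msgs, max_gap=timedelta(minutes=15)):
    """Start a new chunk whenever the silence between messages exceeds max_gap."""
    chunks = []
    for m in sorted(msgs, key=lambda m: m.at):
        if chunks and m.at - chunks[-1][-1].at <= max_gap:
            chunks[-1].append(m)
        else:
            chunks.append([m])
    return chunks


def addressee(text):
    """Return the nick from a leading 'nick:' or 'nick,' prefix, else None."""
    head = text.split(" ", 1)[0]
    if len(head) > 1 and head[-1] in ":,":
        return head[:-1].lower()
    return None


def link_chunks(chunks, lookback=timedelta(hours=24)):
    """Return (earlier, later) index pairs where the later chunk answers the earlier one.

    A link is made when someone addressed in the earlier chunk addresses
    the original asker back within the lookback window.
    """
    links = []
    for j, later in enumerate(chunks):
        for i in range(j):
            earlier = chunks[i]
            if later[0].at - earlier[-1].at > lookback:
                continue  # too old to count as the same conversation
            # (asker, target) pairs opened in the earlier chunk
            pending = {(m.sender.lower(), addressee(m.text))
                       for m in earlier if addressee(m.text)}
            if any((addressee(m.text), m.sender.lower()) in pending for m in later):
                links.append((i, j))
    return links
```

For example, if Alice writes `bob: does the consumer retry?` at 08:00, Carol says something unrelated at noon, and Bob answers `alice: yes, three times` at 17:00, `split_on_gaps` yields three chunks and `link_chunks` connects the first and the third. In the graph database, a link like this could become an edge between conversation nodes, though that part is not shown here.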