Closed by evgenydmitriev.
In GitLab by @ngans20 on Jul 24, 2019, 20:31
changed the description
In GitLab by @ngans20 on Jul 24, 2019, 20:34
changed the description
In GitLab by @anshlykov on Jul 24, 2019, 22:37
@ngans20
- Any Top Writer badges
Can you give an example?
NLP events (to go through our nlp module)
- Publications (with number of claps and number of comments)
- comments
We cannot integrate this data through the new NLP module right now. I think we can do it in another task later.
In GitLab by @ngans20 on Jul 24, 2019, 22:42
- Any Top Writer badges
@zfinzi told me to include this. Zach can you share thoughts?
We cannot integrate this data through the new NLP module right now. I think we can do it in another task later.
Okay, sounds good. I'll slim this one down to just Agent metadata then!
In GitLab by @ngans20 on Jul 24, 2019, 22:42
changed the description
In GitLab by @ngans20 on Jul 24, 2019, 22:45
changed the description
In GitLab by @zfinzi on Jul 25, 2019, 14:23
changed the description
In GitLab by @zfinzi on Jul 25, 2019, 14:25
changed the description
In GitLab by @anshlykov on Jul 29, 2019, 10:48
Zach can you share thoughts?
@zfinzi
In GitLab by @zfinzi on Jul 29, 2019, 13:22
@anshlykov @ngans20 sorry for the delayed response. It is a designation Medium gives to the most influential writers within a given topic, and it should be shown on a Medium account page. Here is a description.
I don't think it is the most important data point, but I see no harm in integrating this data if it can easily be pulled.
In GitLab by @anshlykov on Jul 29, 2019, 19:31
Now everything is clear, thank you. I will send this issue to my buddy.
In GitLab by @anshlykov on Jul 29, 2019, 19:32
changed title from Medium Source Integration to Medium Source Integration - $250
In GitLab by @dima.sazhin on Aug 6, 2019, 13:21
@ngans20 @zfinzi @anshlykov
1.
Lists of accounts following & followed by
Do you mean the list of ids?
2.
Publications (that the user can edit)
Are you going to do it in another issue or not? https://gitlab.com/IncaOutsourcing/bounty/issues/33#note_195496593
In GitLab by @dima.sazhin on Aug 6, 2019, 13:35
assigned to @dima.sazhin
In GitLab by @zfinzi on Aug 6, 2019, 18:12
@dima.sazhin Thank you for claiming this bounty! Here are some answers to your questions:
We're looking for a list of usernames.
For publications (this is related to the 3rd question), it would be great to know whether a given user is an editor or writer for a publication. The comment you referred to relates to article posts and comments (which will require a separate module).
We need both the blog and the people associated with it; specifically, it would be good to have information on the editors and writers of a blog (or publication).
In GitLab by @evgenydmitriev on Aug 6, 2019, 18:58
To clarify
We're looking for a list of usernames.
Unique IDs that cannot be changed are the priority in any data source; display usernames are useful but less important. Not sure how it works in Medium, though.
In GitLab by @dima.sazhin on Aug 8, 2019, 14:20
I will collect ids if you don't mind.
I want to clarify one thing. Are you sure you want to store all relations to other objects in the same object?
In GitLab by @ngans20 on Aug 8, 2019, 15:49
@zfinzi & @evgenydmitriev - you guys know better than me here
In GitLab by @evgenydmitriev on Aug 8, 2019, 15:57
I will collect ids if you don't mind
Yes, IDs are the priority. We still need readable names as an extra field though, especially if the user can change them over time.
Are you sure you want to store all relations to other objects in the same object?
I don't see any other options. Those relationships describe the object and are unique to it. In terms of additional processing and potential splitting into other objects, this will be done by other modules down the CDC pipeline. Feel free to suggest other approaches though.
The main purpose of a source component is to periodically collect and normalize all external information so other modules don't need to talk to the outside world.
In GitLab by @evgenydmitriev on Aug 12, 2019, 18:08
changed the description
In GitLab by @evgenydmitriev on Aug 12, 2019, 18:10
I modified the requirements to make things easier:
Lists of accounts following (we don't need the followers)
Also, not a hard requirement, but I agree with @dima.sazhin that sending agent connections in separate messages is a more scalable way of doing things.
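A minimal sketch of the two message shapes being compared, assuming a plain dict-based message format. The first embeds the following list inside the agent object, as described earlier in this thread. The second sends one message per connection. The field names sourcetype, content.type, content.source and content.target are borrowed from the relation events discussed later in the thread. The helper names and sample IDs/usernames are hypothetical.

```python
from typing import Any


def embedded_agent_message(agent_id: str, username: str,
                           following: list[dict[str, str]]) -> dict[str, Any]:
    """One message per agent; all connections live inside the agent object."""
    return {
        "sourcetype": "agent",
        "content": {
            "id": agent_id,
            "username": username,
            # Stable ID first, readable username kept as an extra field.
            "following": [{"id": f["id"], "username": f["username"]} for f in following],
        },
    }


def split_relation_messages(agent_id: str, username: str,
                            following: list[dict[str, str]]) -> list[dict[str, Any]]:
    """One small message per connection: N followed accounts -> N messages."""
    return [
        {
            "sourcetype": "relation",
            "content": {
                "type": "following",
                "source": {"id": agent_id, "username": username},
                "target": {"id": f["id"], "username": f["username"]},
            },
        }
        for f in following
    ]


followees = [{"id": "a1", "username": "alice"}, {"id": "b2", "username": "bob"}]
agent_msg = embedded_agent_message("u42", "carol", followees)
relation_msgs = split_relation_messages("u42", "carol", followees)
print(len(agent_msg["content"]["following"]), len(relation_msgs))  # 2 2
```

In the split form, each message stays small and bounded no matter how many accounts a user follows. That is one plausible reading of why it scales better.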
In GitLab by @anshlykov on Aug 26, 2019, 18:05
@ngans20 @zfinzi Test run of @dima.sazhin's work. Please check.
In GitLab by @ngans20 on Aug 26, 2019, 18:48
The sourcetype=agent events look great to me!
I will defer to @zfinzi for sourcetype=relation, but the structure looks fine to me. A couple of questions associated with the content.type field:
content.type = "un-followed" or "not following", or will there just stop being events with this "following" relation?
In GitLab by @zfinzi on Aug 26, 2019, 22:27
So far the event structure looks good to me.
@dima.sazhin Are you planning on also generating events from blog (publication) profiles as well? It would be good to know which medium agents are editors for relevant medium blogs.
Zach - Are there any other types of relations besides following that could also be used in these events?
In terms of events related to two Medium accounts, I can only think of the content.type as following. If we have data on blogs or publications, then there could be an additional content.type such as editor or subscriber connecting a publication and a Medium account.
In GitLab by @zfinzi on Aug 27, 2019, 04:33
I quickly threw together the event schema for this to standardize all events storing follower data. It can be found here.
Events of sourcetype=relation can be differentiated by the information_source object.
In GitLab by @zfinzi on Aug 27, 2019, 04:35
@dima.sazhin one quick clarification. In the sourcetype=relation event, the content.source.username is following the content.target.username. Is this correct?
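A minimal sketch of a sourcetype=relation event, read the way the question above describes it: content.source follows content.target. Only following is collected so far. editor and subscriber are the possible publication-to-account types raised earlier, so including them here is an assumption. The shape of information_source (a single name key) is also an assumption, since the real schema lives in the linked Stoplight document. The helper name relation_event is hypothetical.

```python
from typing import Any

# "following" is collected today; "editor" and "subscriber" are proposals from this thread.
RELATION_TYPES = {"following", "editor", "subscriber"}


def relation_event(source_username: str, target_username: str,
                   relation_type: str = "following") -> dict[str, Any]:
    """Build one relation event where `source` has `relation_type` toward `target`."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation_type!r}")
    return {
        "sourcetype": "relation",
        # Assumed shape: lets consumers tell Medium relations apart from other sources.
        "information_source": {"name": "medium"},
        "content": {
            "type": relation_type,
            "source": {"username": source_username},
            "target": {"username": target_username},
        },
    }


event = relation_event("carol", "alice")
c = event["content"]
print(c["source"]["username"], c["type"], c["target"]["username"])  # carol following alice
```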
In GitLab by @zfinzi on Aug 27, 2019, 04:59
@dima.sazhin can you restructure your sourcetype=agent events to follow the agent standard listed in Stoplight? You will need to follow the ABM Agent schema.
You do not need to include fields that do not apply to your source:
- revenues field and therefore it does not need to be added.
For fields that do overlap with the schema, you will need to rename them according to the ABM Agent structure:
- authorTags will need to be called tags
- aliases
- agent_type with a uniform value of person. If you are adding blogs/publications, they will also need an agent_type, but setting it to organisation is fine for now.
For all fields that are specific to Medium, I have created a specific object called ABM Medium Account, which will be passed into the agent_specific_attributes field:
- id needs to go under medium_id
- username stored in aliases and under agent_specific_attributes.username
Sorry for not providing this formatting sooner, let me know if you have any questions.
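A minimal sketch of the renaming rules above, applied to one raw Medium record. It assumes the raw record carries id, username and authorTags as named in this comment. The raw name key mapped to full_name is hypothetical. The top-level internal id is deliberately left out because how to compute it is still being discussed. The function name to_abm_agent is also hypothetical.

```python
from typing import Any


def to_abm_agent(raw: dict[str, Any], is_publication: bool = False) -> dict[str, Any]:
    """Reshape a raw Medium record into the ABM Agent layout described above."""
    return {
        # Common ABM Agent fields.
        "agent_type": "organisation" if is_publication else "person",
        "full_name": raw.get("name"),        # hypothetical raw key
        "tags": raw.get("authorTags", []),   # authorTags is renamed to tags
        "aliases": [raw["username"]],        # username is also kept as an alias
        # Medium-only fields go into the ABM Medium Account object.
        "agent_specific_attributes": {
            "medium_id": raw["id"],          # ID assigned by Medium
            "username": raw["username"],
        },
        # Fields that do not apply to Medium (e.g. revenues) are simply not emitted.
    }


agent = to_abm_agent({"id": "1a2b3c", "username": "carol", "name": "Carol C.",
                      "authorTags": ["crypto", "defi"]})
print(agent["agent_type"], agent["tags"], agent["agent_specific_attributes"]["medium_id"])
# person ['crypto', 'defi'] 1a2b3c
```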
In GitLab by @anshlykov on Aug 30, 2019, 16:41
At the moment we pull all following relations. If you need changes, we can implement them in the future; just describe what you need somewhere, for example by creating a discussion in Yupana.
In GitLab by @anshlykov on Aug 30, 2019, 16:42
Correct.
In GitLab by @anshlykov on Aug 30, 2019, 16:59
@zfinzi
Why are you creating an additional level of data nesting? I think the object agent_specific_attributes only complicates the work with the data.
Why rename the id to medium_id?
https://gitlab.com/IncaSec/nterminal/cdc/sources/github-source/merge_requests/30#note_210165294
You can create id as you have done so in the past, the concatenation of information_source.name and full_name.
If you want the id field to be calculated this way, we will have a few problems.
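The comment does not spell out the problems. Two of them follow directly from the scheme itself, assuming plain string concatenation. First, different accounts that share a full name collapse into one id. Second, readable names can change over time, as noted earlier in this thread, so a rename gives the same person a new id. The platform_id alternative keys on Medium's own ID instead. Its name and the "medium:" separator are hypothetical.

```python
def concatenated_id(source_name: str, full_name: str) -> str:
    """The scheme quoted above: information_source.name followed by full_name."""
    return source_name + full_name


def platform_id(source_name: str, medium_id: str) -> str:
    """Hypothetical alternative: key on the ID Medium assigns to the account."""
    return f"{source_name}:{medium_id}"


accounts = [
    {"medium_id": "1a2b3c", "full_name": "Alex Smith"},
    {"medium_id": "9z8y7x", "full_name": "Alex Smith"},  # a different person, same name
]

# Problem 1: distinct accounts with the same full name share one id.
ids = {concatenated_id("medium", a["full_name"]) for a in accounts}
print(len(ids))  # 1

# Problem 2: the same account gets a new id after a display-name change.
print(concatenated_id("medium", "Alex Smith") == concatenated_id("medium", "Alexander Smith"))  # False

# Keying on the platform ID keeps the two accounts apart.
stable = {platform_id("medium", a["medium_id"]) for a in accounts}
print(len(stable))  # 2
```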
In GitLab by @zfinzi on Aug 30, 2019, 17:13
Why are you creating an additional level of data nesting? I think the object agent_specific_attributes only complicates the work with the data.
To apply a standard for agent data. It makes it far easier for documenting and comparing events across sources.
Why rename the id to medium_id?
id and medium_id are different but could have the same names if medium_id is nested in agent_specific_attributes.
They are different values as well: id is internally generated, while medium_id is created by Medium. Both fields should be in the event.
If you want the id field to be calculated this way, we will have a few problems.
I agree that this is a bad method for doing this. I referred to previous documentation on this one to give Yulia clarity because I had not thought up an alternative yet.
In GitLab by @anshlykov on Aug 30, 2019, 17:47
You use data composition, and there is a second method: data inheritance. The documentation in Swagger supports both methods. I don't fully know how you're going to use the data, but at this point I'd rather inherit data. Maybe I'm wrong.
I propose these changes:
agent-medium, agent-twitter, agent-whatever ... agent-yupana). All of these types are based on one common type.
id is the user ID from the social media that we monitor.
agent-yupana: you will aggregate data from different sources as you want and will assign them an id unique to our system.
In GitLab by @zfinzi on Aug 30, 2019, 18:11
I agree with this breakdown, and this was the original idea. One agent event would only contain a single agent_specific_attribute, such as medium, twitter, or github.
Common or inherited fields are all the fields that exist outside of the agent_specific_attributes object; these include industry, tags, full_name, etc.
Then these can be brought together using the inherited values or aggregators such as Messari & ICOholder to create a unified agent-yupana event with all unique child fields.
If you think there is a better way to format this on Stoplight, I will rework the event structures.
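A minimal sketch of the two layouts, using Python dataclasses as a stand-in for the Swagger/Stoplight schemas. All class names are hypothetical. Both variants reuse one common base with full_name, industry and tags. They differ only in where the Medium-specific fields sit. In the first, they sit at the top level, as in an agent-medium type extending a common type. In the second, they sit one level down in agent_specific_attributes, as in the ABM Medium Account object.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AgentCommon:
    """Fields shared by every agent regardless of source (subset named in this thread)."""
    full_name: str
    industry: Optional[str]
    tags: list[str]


# Inheritance: an agent-medium type extends the common type with flat Medium fields.
@dataclass
class FlatMediumAgent(AgentCommon):
    medium_id: str
    username: str


# Composition: Medium fields live in one nested object.
@dataclass
class MediumAccount:
    medium_id: str
    username: str


@dataclass
class NestedMediumAgent(AgentCommon):
    agent_specific_attributes: MediumAccount


flat = FlatMediumAgent(full_name="Carol C.", industry=None, tags=["defi"],
                       medium_id="1a2b3c", username="carol")
nested = NestedMediumAgent(full_name="Carol C.", industry=None, tags=["defi"],
                           agent_specific_attributes=MediumAccount("1a2b3c", "carol"))

# asdict() recurses into nested dataclasses, giving a JSON-like event body.
print(asdict(flat)["medium_id"])                                 # 1a2b3c
print(asdict(nested)["agent_specific_attributes"]["medium_id"])  # 1a2b3c
```

The flat form is one lookup shorter per source-specific field. The nested form keeps the top level of every agent event identical across sources, which is the documentation and comparison benefit argued for in the thread.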
In GitLab by @anshlykov on Jan 7, 2020, 16:10
closed
In GitLab by @ngans20 on Jul 24, 2019, 20:31
Bounty Description
deployment::ready tag in your merge request.
Important Information/resources
Functionality
Agent Metadata
General rules
Background
Inca often uses "bounty projects" as introductory projects to vet potential employees or interns. These projects give interested individuals a chance to prove themselves, learn a bit about our company & products, and produce a useful result in the process. These projects are extremely independent and will require you to manage your own time and work process.
NTerminal is a data aggregation and analytics platform used for navigating the crypto-financial ecosystem. NTerminal's many data streams can be categorized into three general segments:
Resources
Don't hesitate to ask us questions by commenting in this issue or emailing us at bounty@incasec.com.