ResearchSoftwareInstitute / greendatatranslator

Green Team Data Translator Software Engineering and Development
BSD 3-Clause "New" or "Revised" License

Explore slack API to retrieve and mine team interaction data for team science visualization #84

Closed hyi closed 6 years ago

hyi commented 7 years ago

As suggested by @StanAhalt, I will explore the Slack web API to retrieve and mine team interaction data for the data translator project, to support some additional team science visualization. The target date for a preliminary Slack-based team science visualization is the Sept. 7 NCATS council meeting, where @StanAhalt is invited to give a talk. I don't think there is a milestone for this issue. @KCB13 @karafecho, feel free to assign one or chime in from a project management perspective.

hyi commented 7 years ago

Have retrieved reaction data from the Slack web API and created the following visualization: (screenshot: slackviz_1)

Next, I plan to explore the Slack API to retrieve threaded messages as another type of interaction data and add them to the visualization, possibly differentiated from the reaction interaction data. Last, if it has value, I can explore mining the messages to derive interactions based on message context, which would be exploratory.

hyi commented 7 years ago

Have retrieved threaded messages from the Slack web API and updated the visualization to include two communication types: threaded messages and reaction messages. See the screen capture below for the current visualization: slackviz_1 (https://user-images.githubusercontent.com/476302/28734875-e4e24914-73b0-11e7-8a48-d09cb257943d.png)

Green links represent threaded message communications, and gray links represent reaction message communications. These messages are retrieved from all public channels of the translator team in Slack, so they don't include private channels, group messages, or direct messages.

As you can see, there is not a lot of communication going on in Slack, so I am not sure whether it has value to show this visualization as part of team science. I could go further and mine all messages to discover potentially more interactions between them, but I don't think that would add much to the current visualization. I could look into private channels, group messages, and direct messages next, but I want to get feedback at this point on whether the current visualization would be useful for the Sept. 7 meeting before I spend more time exploring this further.
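
The link derivation described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual project code: it assumes each message is a dict shaped like the JSON that Slack's channel history returns, with `user`, `ts`, an optional `thread_ts` pointing at the thread's parent message, and an optional `reactions` list whose entries carry a `users` list. The function name `build_links` and the `(source, target, kind)` key layout are hypothetical.

```python
from collections import Counter
from typing import Iterable


def build_links(messages: Iterable[dict]) -> Counter:
    """Count directed (source, target, kind) interactions in one channel's history."""
    messages = list(messages)
    # Look up who wrote each message so replies can be tied to the thread starter.
    author_by_ts = {m["ts"]: m["user"] for m in messages if "user" in m}

    links = Counter()
    for m in messages:
        author = m.get("user")
        if author is None:
            continue  # skip bot or system messages that carry no user id

        # Reaction link: each reacting user -> the author of the message.
        for reaction in m.get("reactions", []):
            for reactor in reaction.get("users", []):
                if reactor != author:
                    links[(reactor, author, "reaction")] += 1

        # Thread link: the reply author -> the author of the parent message.
        parent_ts = m.get("thread_ts")
        if parent_ts and parent_ts != m["ts"]:
            owner = author_by_ts.get(parent_ts)
            if owner and owner != author:
                links[(author, owner, "thread")] += 1
    return links
```

Keeping the interaction kind in the key lets a front end color the two link types differently, as in the green and gray links above. Self-interactions (reacting to or replying to one's own message) are dropped here, which is a design choice of this sketch rather than something stated in the thread.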

StanAhalt commented 7 years ago

Hong, this is awesome!

-- Stanley Ahalt, Ph.D. Director, Renaissance Computing Institute Professor, Department of Computer Science, UNC-CH Assoc. Director of Biomedical Informatics Service, NC TraCS

StanAhalt commented 7 years ago

Hong,

Can we have a chat about this tomorrow? I want to try to understand what the possibilities are before you expend more effort.

Thanks!

Stan

hyi commented 7 years ago

I tried to use the Slack web API to retrieve messages from private channels or groups in the translator team, but got an empty list. I also tried to retrieve direct messages, but only got the direct messages from Slackbot to myself. So either we don't have private channel messages and direct messages in the translator team, or I don't have permission to read them. Looking into the public channels in our team, as I currently do, appears to be the only data source we can leverage.

I can further explore the two items below if you think it is a good idea to do so:

hyi commented 7 years ago

Have pulled all message data from slack and incorporated all data into visualization. Will fix a few bugs I found along the way, then email the team for feedback this afternoon or tomorrow.

hyi commented 7 years ago

Have removed singleton nodes from the visualization and adjusted the force-directed layout parameters so that the initial graph layout looks good when the site loads, without needing manual layout adjustment. Have also extracted keywords from all messages using a machine learning library. Will work on visualizing these keywords as word clouds in JavaScript next.
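
Removing singleton nodes amounts to keeping only the nodes that appear at either end of some link. A minimal sketch, assuming the graph data uses the `id` / `source` / `target` field names that are common in force-directed graph JSON (the actual field names in the project are not shown in this thread, and `drop_singletons` is a hypothetical helper):

```python
def drop_singletons(nodes, links):
    """Return only the nodes that are an endpoint of at least one link."""
    connected = set()
    for link in links:
        connected.add(link["source"])
        connected.add(link["target"])
    # Preserve the original node order so the layout input stays stable.
    return [node for node in nodes if node["id"] in connected]
```

For example, with nodes `a`, `b`, `c` and a single link from `a` to `b`, only `a` and `b` are kept.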

hyi commented 7 years ago

Have also looked into the Slack messages API and found that, although there is a ts field that stands for timestamp, it is not really a real time but rather an internal string (e.g., 1358546515.000008) used to order messages when paging through them or showing them in a list.

After giving this some more thought, I think showing a longitudinal message line would not add much value, given that a node or link can be clicked to see the messages posted by that person or the messages exchanged between two people, and given that a word cloud of keywords across all messages will also be visualized. There may be some interesting ways to visualize messages over time in our case, but given the deadline, I don't think I have time to investigate and create a value-add visualization for this. So I'll just focus on adding the word cloud visualization and making it work well with the graph visualization to get the team science message across.
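
Since ts arrives as a string, ordering messages by it is safest with a numeric comparison rather than a text comparison. A minimal sketch, assuming each message is a dict with a `ts` key (`sort_by_ts` is a hypothetical helper); `Decimal` is used instead of `float` so that the digits after the decimal point are compared exactly:

```python
from decimal import Decimal


def sort_by_ts(messages):
    """Order messages by their ts string, compared numerically rather than as text."""
    return sorted(messages, key=lambda m: Decimal(m["ts"]))
```

A plain string sort would place "999999999.000001" after "1358546515.000008" because it compares character by character; the numeric key avoids that.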

StanAhalt commented 7 years ago

Sounds good. Thanks!

hyi commented 7 years ago

Updated the visualization to address feedback: (1) used grayscale for word cloud to remove confusion with team colors; (2) added team member list link in the top paragraph above the visualization. The new URL for the visualization is: https://xdciviz.renci.org/translator/teamscience/slackviz/

hyi commented 7 years ago

Just chatted with Stan, and will set up a cron job to pull slack data every midnight and save data for each run so that we can look at data over time in the future.

hyi commented 7 years ago

Have set up the cron job to pull slack data every midnight, and now we have 11 time-stamped data snapshots. Will investigate visualizing data over time in interesting ways. May have to derive the difference between two consecutive time-stamped snapshots in order to visualize the difference.
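
Deriving the difference between two consecutive snapshots could look roughly like the sketch below. It assumes, purely for illustration, that each snapshot has been reduced to a dict mapping a link key such as `(source, target, kind)` to a message count; the real snapshot format is not described in this thread, and `diff_snapshots` is a hypothetical helper.

```python
def diff_snapshots(previous, current):
    """Compare two {link: count} snapshots and report what changed between them."""
    # Links that exist only in the newer snapshot.
    added = {k: v for k, v in current.items() if k not in previous}
    # Links that exist only in the older snapshot.
    removed = {k: v for k, v in previous.items() if k not in current}
    # Links present in both whose counts differ, stored as the signed change.
    changed = {
        k: current[k] - previous[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Running this over each pair of adjacent nightly snapshots would give a sequence of deltas that a time-based visualization could animate or summarize.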