Sadswefg / project_2_data_visualization


Peer Review - Group H #2

Open PhiSon1990 opened 2 months ago

PhiSon1990 commented 2 months ago

While sentiment analysis on social media data is not entirely new, the focus on regional sentiment trends within Vietnam appears relatively novel. However, it's essential to ensure that your approach offers unique insights or methodologies compared to existing sentiment analysis projects. For example, integrating sentiment analysis results from other social media platforms or incorporating machine learning models tailored to regional languages might provide deeper insights.

Questions for Clarification:

  1. How do you plan to address potential biases in sentiment analysis results, particularly concerning regional dialects or slang that might not be captured accurately by the VADER tool?

  2. Have you considered how to handle outliers or noise in the Reddit data, such as irrelevant posts or spam, to ensure the accuracy of sentiment analysis results?

Possible Feature Suggestions:

Temporal Analysis

Besides exploring sentiment variations across cities, consider adding a feature to analyze sentiment trends over time for individual cities. This feature would enable users to identify significant events or trends influencing regional sentiments.

Comparison Tools

Incorporate functionality to compare sentiment scores between cities or regions, allowing users to identify disparities and similarities in perceptions on specific topics.

ellynnhitran commented 2 months ago

Thank you for your insightful feedback and questions on our proposal. We appreciate your recommendations and have considered them carefully to further refine our project approach.

Addressing Questions:

- Addressing Biases in Sentiment Analysis: To address potential biases in sentiment analysis results, particularly those arising from regional dialects or slang, we will evaluate state-of-the-art (SOTA) sentiment analysis models that might provide better accuracy and a more nuanced understanding of sentiments. These models could include BERT-based techniques specifically fine-tuned for the Vietnamese language.

- Handling Outliers or Noise in Data: Concerning outliers or spam in the Reddit data, we will implement robust data cleaning processes. This includes filtering out non-relevant posts, using spam detection algorithms to remove spam content, and applying statistical methods to identify and handle outliers.
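As a rough illustration of the cleaning steps above, here is a minimal sketch using only the Python standard library. It is not the group's actual pipeline: the post structure (a dict with a `text` field), the keyword-based relevance check, and the use of word count as the outlier signal are all assumptions made for the example.

```python
import statistics


def is_relevant(text, keywords):
    """Return True if the text mentions at least one topic keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def clean_posts(posts, keywords):
    """Drop empty, duplicate, and off-topic posts (input order is preserved)."""
    seen = set()
    kept = []
    for post in posts:
        text = post["text"].strip()
        key = text.lower()  # case-insensitive duplicate check
        if not text or key in seen or not is_relevant(text, keywords):
            continue
        seen.add(key)
        kept.append(post)
    return kept


def length_outliers(posts, k=1.5):
    """Return posts whose word count falls outside the Tukey IQR fences.

    Needs at least two posts, because statistics.quantiles requires that.
    """
    lengths = [len(post["text"].split()) for post in posts]
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [p for p, n in zip(posts, lengths) if n < low or n > high]
```

Flagged posts are returned rather than deleted, so they can be inspected before deciding whether they are noise or simply unusual.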

Possible Feature Suggestions:

- Temporal Analysis: We find your suggestion to add a temporal analysis feature very valuable. We plan to incorporate this feature into our interactive dashboard, allowing users to visualize how sentiments in individual cities evolve over time. This will enable the identification of sentiment trends related to significant events or changes in public opinion, enhancing the depth of our analysis.

- Comparison Tools: We also plan to include functionality for comparing sentiment scores between cities or regions. This comparison tool will be part of the interactive dashboard, allowing users to easily identify and visualize disparities and similarities in perceptions across different topics and regions.
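The two features above could share one aggregation step. The following is a minimal sketch, assuming each record is a hypothetical `(city, date, score)` tuple where `score` is a per-post sentiment value (for example a VADER compound score); the function names and the monthly granularity are illustrative choices, not part of the proposal.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_sentiment(records):
    """Average sentiment per (city, "YYYY-MM") bucket."""
    buckets = defaultdict(list)
    for city, day, score in records:
        buckets[(city, day.strftime("%Y-%m"))].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def compare_cities(monthly, city_a, city_b):
    """Per-month difference (city_a minus city_b) for months both cities share."""
    scores_a = {month: s for (city, month), s in monthly.items() if city == city_a}
    scores_b = {month: s for (city, month), s in monthly.items() if city == city_b}
    shared = sorted(scores_a.keys() & scores_b.keys())
    return {month: scores_a[month] - scores_b[month] for month in shared}


# Hypothetical sample data, for illustration only.
records = [
    ("Hanoi", date(2024, 1, 5), 0.5),
    ("Hanoi", date(2024, 1, 20), 0.25),
    ("Da Nang", date(2024, 1, 9), -0.25),
    ("Hanoi", date(2024, 2, 1), 0.75),
]
monthly = monthly_sentiment(records)
gap = compare_cities(monthly, "Hanoi", "Da Nang")
```

Months where only one city has posts are left out of the comparison instead of being treated as zero, which avoids reading missing data as neutral sentiment.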

Moving forward, we will integrate these recommendations into our project and enhance our proposal to achieve better results.

tienvu95 commented 2 months ago

Thanks anh @PhiSon1990 for the feedback,

Regarding your group's response, a few things to note:

Vietnamese on the web has many variants: people may type with or without accents (tone marks), which makes this a difficult problem. People may also write in English or in a mix of both languages.

Spam detection in this context is also hard and needs extra attention. On Reddit, a post can be carefully written and highly detailed yet still be spam, especially if the writer wants to achieve a certain political effect. So you may need to define which posts count as spam.
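One common way to make accented and unaccented spellings comparable is to fold the tone marks away before matching. Below is a minimal sketch using Python's standard `unicodedata` module; the function name `fold_tone_marks` is hypothetical. Folding loses information (for example "má" and "mà" both become "ma"), so it is probably better suited to keyword matching and deduplication than to producing input for a sentiment model.

```python
import unicodedata


def fold_tone_marks(text):
    """Return text with Vietnamese diacritics removed, for matching purposes."""
    # "đ" and "Đ" have no Unicode decomposition, so map them by hand first.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (tone marks, circumflex, breve, horn).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)
```

With this, `fold_tone_marks("Hà Nội")` and `fold_tone_marks("Ha Noi")` give the same string, so posts typed either way can be grouped under the same city. Mixed Vietnamese and English text is not addressed by this step.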