Watts-Lab / team-process-map


Politeness_v2 Feature #197

Closed kumarnik1 closed 2 months ago

kumarnik1 commented 3 months ago

Pull Request Template: If you are merging in a feature or other major change, use this template to check your pull request!

Basic Info

What's this pull request about?

Politeness_v2 feature using SECR module

3 added files:

Added to calculate_chat_level_features as well

Feature Documentation

Did you document your feature? Make sure you do the following before you submit your pull request!

Code Basics

Testing

The location of my tests is here:

[PASTE LINK HERE]

If you check all the boxes above, then you're ready to merge!

xehu commented 3 months ago

@kumarnik1 thank you for the changes! The code now runs, but it looks like some of the assertions from the test_politeness document are failing:

test_feature_metrics.py:27: AssertionError
================================================= warnings summary =================================================
../../../../../anaconda3/envs/tpm_virtualenv/lib/python3.11/site-packages/pandas/core/dtypes/cast.py:1641
test_feature_metrics.py::test_conv_unit_equality[1-conversation_rows0]
test_feature_metrics.py::test_conv_unit_equality[2-conversation_rows1]
test_feature_metrics.py::test_conv_unit_equality[3-conversation_rows2]
test_feature_metrics.py::test_conv_unit_equality[4-conversation_rows3]
test_feature_metrics.py::test_conv_unit_equality[5-conversation_rows4]
test_feature_metrics.py::test_conv_unit_equality[6-conversation_rows5]
  /Users/xehu/anaconda3/envs/tpm_virtualenv/lib/python3.11/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
  See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
    return np.find_common_type(types, [])

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================= short test summary info ==============================================
FAILED test_feature_metrics.py::test_chat_unit_equality[row5] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row6] - assert 0 == 2.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row10] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row11] - assert 0 == 2.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row12] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row13] - KeyError: 'Acknowledgment'
FAILED test_feature_metrics.py::test_chat_unit_equality[row14] - KeyError: 'Acknowledgment'
FAILED test_feature_metrics.py::test_chat_unit_equality[row18] - KeyError: 'indirect_(greeting)'
FAILED test_feature_metrics.py::test_chat_unit_equality[row20] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row53] - KeyError: 'Factuality'
FAILED test_feature_metrics.py::test_chat_unit_equality[row54] - KeyError: 'Direct_question'
FAILED test_feature_metrics.py::test_chat_unit_equality[row55] - KeyError: 'Hasnegative'
FAILED test_feature_metrics.py::test_chat_unit_equality[row56] - KeyError: 'Hasnegative'
FAILED test_feature_metrics.py::test_chat_unit_equality[row84] - KeyError: 'Haspositive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row85] - KeyError: 'Haspositive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row86] - KeyError: 'Subjunctive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row87] - KeyError: 'Apologizing'
FAILED test_feature_metrics.py::test_chat_unit_equality[row90] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row91] - KeyError: 'Please_start'
FAILED test_feature_metrics.py::test_chat_unit_equality[row92] - KeyError: 'Hashedge'
FAILED test_feature_metrics.py::test_chat_unit_equality[row93] - KeyError: 'Hasnegative'
FAILED test_feature_metrics.py::test_chat_unit_equality[row94] - KeyError: 'Haspositive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row95] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row99] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row101] - KeyError: 'Apologizing'
FAILED test_feature_metrics.py::test_chat_unit_equality[row103] - KeyError: 'Direct_question'
FAILED test_feature_metrics.py::test_chat_unit_equality[row104] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row105] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row107] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row108] - assert 0 == 2.0
==================================== 30 failed, 85 passed, 7 warnings in 1.48s =====================================

Would it be possible to check on some of these test cases? I added your tests into testing/data/cleaned_data/test_chat_level.csv.

xehu commented 2 months ago

@kumarnik1 see Slack comment:

I am using the test string in https://github.com/bbevis/SECR to confirm the outputs of politeness V2:

I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don't think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.

Here is the output from SECR:

(SECR) xehu@WHA-ODD44VVQ-ML System % python3 feature_extraction.py
               Features Counts
0    Impersonal_Pronoun     12
1   First_Person_Single      5
2                Hedges      3
3              Negation      3
4          Subjectivity      3
5      Negative_Emotion      3
9             Reasoning      1
11            Agreement      1
10        Second_Person      1
37       Adverb_Limiter      1
8          Disagreement      1
6       Acknowledgement      1
7   First_Person_Plural      1
25               For_Me      0
36         WH_Questions      0
35      YesNo_Questions      0
34         Bare_Command      0
33    Truth_Intensifier      0
32              Apology      0
31           Ask_Agency      0
30           By_The_Way      0
29              Can_You      0
28    Conjunction_Start      0
27            Could_You      0
26         Filler_Pause      0
24              For_You      0
23         Formal_Title      0
22          Give_Agency      0
21          Affirmation      0
20            Gratitude      0
18                Hello      0
17       Informal_Title      0
16          Let_Me_Know      0
15             Swearing      0
14          Reassurance      0
13               Please      0
12     Positive_Emotion      0
19              Goodbye      0
38          Token_count    115

After testing the code on the Politeness V2 branch, I found 2 issues (one of which I was able to fix):

  1. I noticed that the columns were being sorted alphabetically while the features were being sorted by the Counts column, so each feature's value ended up going to a different column than the one it was supposed to go to. For example, because "Acknowledgement" was the first column, it always got the value of whatever feature had the highest Count. I resolved this by removing all calls to sort_values() (see the sketch after this list).
  2. There are different outputs on the branch depending on whether you call it on message (which is preprocessed to remove punctuation and make everything lowercase) versus message_original (which doesn't have any preprocessing). This is because, under the hood, SECR is doing more than just keyword searches --- it looks like it's also parsing the grammatical structure of the sentence and using the punctuation to do so. This means that a question with a question mark, e.g., "what are you doing?", is parsed as a question, but the same text without the question mark, e.g., "what are you doing", is NOT.
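
To make issue (1) concrete, here is a minimal, hypothetical sketch of the misalignment (the feature names are illustrative, not SECR's actual internals): once the feature counts are sorted by the Counts column but the destination columns stay alphabetical, each count lands under the wrong feature; dropping the sort keeps them aligned.

import pandas as pd

# Hypothetical reproduction of the column-misalignment bug in (1).
counts = pd.DataFrame({
    "Features": ["Acknowledgement", "Hedges", "Negation"],
    "Counts":   [1, 3, 2],
})

# Sorting by Counts reorders the rows...
by_count = counts.sort_values("Counts", ascending=False)

# ...so assigning the counts positionally to alphabetically sorted columns
# puts each value under the wrong feature: Acknowledgement gets 3 (the
# highest count) instead of its true count of 1.
wrong = pd.DataFrame([by_count["Counts"].to_list()],
                     columns=sorted(counts["Features"]))

# Without sort_values(), row order and column order stay in sync, and
# Acknowledgement correctly gets 1.
right = pd.DataFrame([counts["Counts"].to_list()],
                     columns=list(counts["Features"]))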

Here's the part I wasn't able to fix: despite running the code on message_original (NO preprocessing), I am not able to reproduce the outputs of SECR.

Here are the 3 test cases that are failing:

------TEST FAILED------
Testing Negation for message: I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don\'t think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Expected value: 3.0
Actual value: 2

------TEST FAILED------
Testing Subjectivity for message: I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don\'t think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Expected value: 3.0
Actual value: 2

------TEST FAILED------
Testing Disagreement for message: I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don\'t think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Expected value: 1.0
Actual value: 0

As you can see, the correct values are 3, 3, and 1, but we get 2, 2, and 0.

Weirdly, when I call it on message (the preprocessed version), the test cases do not fail...

Right now, in calculate_chat_level_features, I'm calling it on message_lower_with_punc (which removes capitalization but retains punctuation):

def calculate_politeness_v2(self) -> None:
    """
    Calculate politeness features from the SECR module.
    """
    # Append the SECR feature columns, computed on the lowercased but
    # punctuation-preserving message column, to the chat-level data.
    self.chat_data = pd.concat(
        [self.chat_data, get_politeness_v2(self.chat_data, 'message_lower_with_punc')],
        axis=1)
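
To narrow down issue (2), one option is a small debugging helper along these lines (a hypothetical sketch only; compare_politeness_columns is not part of the codebase, and it assumes the chat data has the three text columns mentioned above and reuses the project's get_politeness_v2). It runs the SECR features on each version of the text and stacks the outputs side by side, so punctuation-dependent differences are easy to spot:

import pandas as pd

def compare_politeness_columns(chat_data: pd.DataFrame) -> pd.DataFrame:
    # Run the SECR politeness features on each text column.
    text_columns = ["message", "message_lower_with_punc", "message_original"]
    results = {col: get_politeness_v2(chat_data, col) for col in text_columns}
    # Stack the per-column outputs under a top-level column key so rows where
    # the counts disagree (e.g., questions missing their "?") stand out.
    return pd.concat(results, axis=1)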

But yeah, I can't get the test cases to work and I find the inconsistency super weird. Are you able to get to the bottom of this?