Closed: kumarnik1 closed this 2 months ago
@kumarnik1 thank you for the changes! The code now runs, but it looks like some of the assertions from the test_politeness document are failing:
test_feature_metrics.py:27: AssertionError
================================================= warnings summary =================================================
../../../../../anaconda3/envs/tpm_virtualenv/lib/python3.11/site-packages/pandas/core/dtypes/cast.py:1641
test_feature_metrics.py::test_conv_unit_equality[1-conversation_rows0]
test_feature_metrics.py::test_conv_unit_equality[2-conversation_rows1]
test_feature_metrics.py::test_conv_unit_equality[3-conversation_rows2]
test_feature_metrics.py::test_conv_unit_equality[4-conversation_rows3]
test_feature_metrics.py::test_conv_unit_equality[5-conversation_rows4]
test_feature_metrics.py::test_conv_unit_equality[6-conversation_rows5]
/Users/xehu/anaconda3/envs/tpm_virtualenv/lib/python3.11/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated. Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information. (Deprecated NumPy 1.25)
return np.find_common_type(types, [])
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================= short test summary info ==============================================
FAILED test_feature_metrics.py::test_chat_unit_equality[row5] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row6] - assert 0 == 2.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row10] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row11] - assert 0 == 2.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row12] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row13] - KeyError: 'Acknowledgment'
FAILED test_feature_metrics.py::test_chat_unit_equality[row14] - KeyError: 'Acknowledgment'
FAILED test_feature_metrics.py::test_chat_unit_equality[row18] - KeyError: 'indirect_(greeting)'
FAILED test_feature_metrics.py::test_chat_unit_equality[row20] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row53] - KeyError: 'Factuality'
FAILED test_feature_metrics.py::test_chat_unit_equality[row54] - KeyError: 'Direct_question'
FAILED test_feature_metrics.py::test_chat_unit_equality[row55] - KeyError: 'Hasnegative'
FAILED test_feature_metrics.py::test_chat_unit_equality[row56] - KeyError: 'Hasnegative'
FAILED test_feature_metrics.py::test_chat_unit_equality[row84] - KeyError: 'Haspositive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row85] - KeyError: 'Haspositive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row86] - KeyError: 'Subjunctive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row87] - KeyError: 'Apologizing'
FAILED test_feature_metrics.py::test_chat_unit_equality[row90] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row91] - KeyError: 'Please_start'
FAILED test_feature_metrics.py::test_chat_unit_equality[row92] - KeyError: 'Hashedge'
FAILED test_feature_metrics.py::test_chat_unit_equality[row93] - KeyError: 'Hasnegative'
FAILED test_feature_metrics.py::test_chat_unit_equality[row94] - KeyError: 'Haspositive'
FAILED test_feature_metrics.py::test_chat_unit_equality[row95] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row99] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row101] - KeyError: 'Apologizing'
FAILED test_feature_metrics.py::test_chat_unit_equality[row103] - KeyError: 'Direct_question'
FAILED test_feature_metrics.py::test_chat_unit_equality[row104] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row105] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row107] - assert 0 == 1.0
FAILED test_feature_metrics.py::test_chat_unit_equality[row108] - assert 0 == 2.0
==================================== 30 failed, 85 passed, 7 warnings in 1.48s =====================================
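As an aside, the pandas-internal `np.find_common_type` deprecation warning above comes from pandas itself, not from project code, so it can be silenced at the pytest level rather than fixed in-repo. One option (a sketch, assuming a `pytest.ini` at the repository root):

```ini
[pytest]
filterwarnings =
    ignore:np.find_common_type is deprecated:DeprecationWarning
```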
Would it be possible to check on some of these test cases? I added your tests into testing/data/cleaned_data/test_chat_level.csv.
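For the KeyError failures specifically, one plausible cause worth ruling out is a column-name mismatch between the test fixture and the generated feature frame: note that the SECR output below prints `Acknowledgement`, while the failing test raises `KeyError: 'Acknowledgment'`. A minimal sketch of how to diagnose this (the frame and expected names here are hypothetical stand-ins):

```python
import pandas as pd

# Stand-in for the feature frame the extractor produces; only the column
# names matter for this check.
features = pd.DataFrame({"Acknowledgement": [1], "Negation": [3]})

# Hypothetical spellings the test fixture expects.
expected_cols = {"Acknowledgment", "Negation"}

# Diff the two sets instead of indexing blindly: any name in `missing`
# would raise KeyError the moment the test does features[name].
missing = expected_cols - set(features.columns)
print("missing columns:", missing)  # -> {'Acknowledgment'}
```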
@kumarnik1 see Slack comment:
I am using the test string in https://github.com/bbevis/SECR to confirm the outputs of politeness V2:
I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don't think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Here is the output from SECR:
(SECR) xehu@WHA-ODD44VVQ-ML System % python3 feature_extraction.py
Features Counts
0 Impersonal_Pronoun 12
1 First_Person_Single 5
2 Hedges 3
3 Negation 3
4 Subjectivity 3
5 Negative_Emotion 3
9 Reasoning 1
11 Agreement 1
10 Second_Person 1
37 Adverb_Limiter 1
8 Disagreement 1
6 Acknowledgement 1
7 First_Person_Plural 1
25 For_Me 0
36 WH_Questions 0
35 YesNo_Questions 0
34 Bare_Command 0
33 Truth_Intensifier 0
32 Apology 0
31 Ask_Agency 0
30 By_The_Way 0
29 Can_You 0
28 Conjunction_Start 0
27 Could_You 0
26 Filler_Pause 0
24 For_You 0
23 Formal_Title 0
22 Give_Agency 0
21 Affirmation 0
20 Gratitude 0
18 Hello 0
17 Informal_Title 0
16 Let_Me_Know 0
15 Swearing 0
14 Reassurance 0
13 Please 0
12 Positive_Emotion 0
19 Goodbye 0
38 Token_count 115
After testing the code on the Politeness V2 branch, I found 2 issues (one of which I was able to fix).
Here's the part I wasn't able to fix: despite running the code on message_original
(NO preprocessing), I am not able to reproduce the outputs of SECR.
Here are the 3 test cases that are failing:
------TEST FAILED------
Testing Negation for message: I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don't think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Expected value: 3.0
Actual value: 2
------TEST FAILED------
Testing Subjectivity for message: I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don't think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Expected value: 3.0
Actual value: 2
------TEST FAILED------
Testing Disagreement for message: I understand your perspective and agree that I would not want to have resentment in the workplace against women, as that would further compound the issue we are looking at. I do think that it is true that women are underrepresented in STEM careers and am a believer that something should be done to address this discrepancy, even if that is not implementing a priority for women in hiring decisions. While I don't think that companies should explicitly hire simply because of their gender, I do think that they should be mindful of the gender gap in STEM and look to address those issues through their hiring practices.
Expected value: 1.0
Actual value: 0
As you can see, the correct values are 3, 3, and 1, but we get 2, 2, and 0.
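One hypothesis worth checking (an assumption on my part, not confirmed): if the escaped apostrophe visible in the raw test string (`don\'t`) is stored as a literal backslash in message_original, a lexicon pattern for "don't" would silently stop matching, which would account for exactly one lost Negation hit. A minimal illustration, using a made-up pattern rather than SECR's actual lexicon:

```python
import re

# Illustrative negation pattern, NOT SECR's real lexicon.
pattern = re.compile(r"\b(not|never|don't)\b")

clean = "While I don't think that companies should explicitly hire"
raw = clean.replace("don't", r"don\'t")  # simulate a literal escaped apostrophe

print(len(pattern.findall(clean)))  # 1 match: "don't"
print(len(pattern.findall(raw)))    # 0: the backslash breaks the match
```

If this is the cause, preprocessing that strips or normalizes the backslash would explain why the preprocessed column passes while message_original does not.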
Weirdly, when I call it on message (the preprocessed version), the test cases do not fail...
Right now, in calculate_chat_level_features, I'm calling it on message_lower_with_punc (which removes capitalization but retains punctuation):
def calculate_politeness_v2(self) -> None:
    """
    This function calculates politeness features from the SECR module.
    """
    self.chat_data = pd.concat(
        [self.chat_data, get_politeness_v2(self.chat_data, 'message_lower_with_punc')],
        axis=1
    )
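One pitfall to rule out with this call: `pd.concat(axis=1)` aligns on the index, not on row position. If `get_politeness_v2` returns a frame with a fresh 0..n-1 index while `self.chat_data` carries a filtered or shuffled index, the joined feature columns fill with NaN, which could later surface as the 0-valued features in the `assert 0 == 1.0` failures. A small sketch with hypothetical frames:

```python
import pandas as pd

# chat_data with a non-default index (e.g. after filtering rows).
chat_data = pd.DataFrame({"message": ["hi there", "sounds good"]}, index=[5, 9])
# Feature frame with a default 0..n-1 index, as a fresh DataFrame would have.
features = pd.DataFrame({"Negation": [0, 1]})

# Indexes {5, 9} and {0, 1} don't overlap, so concat produces 4 rows of
# half-NaN data instead of 2 aligned rows.
misaligned = pd.concat([chat_data, features], axis=1)

# Resetting both indexes restores positional alignment.
aligned = pd.concat(
    [chat_data.reset_index(drop=True), features.reset_index(drop=True)],
    axis=1
)

print(len(misaligned))  # 4
print(len(aligned))     # 2
```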
But yeah, I can't get the test cases to work and I find the inconsistency super weird. Are you able to get to the bottom of this?
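To narrow it down, one debugging approach is to run the extractor on each text column and diff the resulting counts, so the divergence is isolated to specific features rather than whole rows. The two Series below are stand-ins for `get_politeness_v2` output on message_original vs. message, using the values from the failing cases above:

```python
import pandas as pd

# Stand-ins for per-message feature counts from the two text columns.
from_original = pd.Series({"Negation": 2, "Subjectivity": 2, "Disagreement": 0})
from_processed = pd.Series({"Negation": 3, "Subjectivity": 3, "Disagreement": 1})

# Nonzero entries pinpoint exactly which features the preprocessing changes.
diff = from_processed - from_original
print(diff[diff != 0])
```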
Pull Request Template: If you are merging in a feature or other major change, use this template to check your pull request!
Basic Info
What's this pull request about?
3 added files:
Added to calculate_chat_level_features as well
Feature Documentation
Did you document your feature? Make sure you do the following before you open your pull request!
Code Basics
- My feature is named my_feature, NOT myFeature (camel case).
- My feature is in a file named NAME_features.py, where NAME is the name of my feature.
- My feature is located in feature_engine/features.
Testing
The location of my tests is here:
If you check all the boxes above, then you are ready to merge!