Watts-Lab / team_comm_tools

An open-source Python library that turns multiparty conversational data into social-science backed features.
https://teamcommtools.seas.upenn.edu/
MIT License

Place all lexical features as a lexicon in the folder (so that the process can be parallelized) #96

Closed xehu closed 1 year ago

xehu commented 1 year ago

Currently, several of our features that use lexicons define those lexicons inline.

For example, hedge:

hedge_words = ["sort of", "kind of", "I guess", "I think", "a little", "maybe", "possibly", "probably"]

First person pronouns:

first_pronouns = ["i",'me','mine','myself','my','we','our','ours','ourselves','lets']

For stylistic purposes, it would be better (and cleaner) to store each of these as a file in the lexicons/ directory and read it in each time. That way, we do not have to edit each Python file whenever a lexicon changes.
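
A rough sketch of what this could look like, assuming one term per line in a plain-text file. The file name `hedge_words.txt` and the helpers `load_lexicon` and `count_lexicon_matches` are hypothetical names for illustration, not existing functions in the repo, and the demo writes to a temporary directory rather than the real lexicons/ folder:

```python
import re
import tempfile
from pathlib import Path


def load_lexicon(path):
    """Read a lexicon file with one term per line.

    Blank lines and lines starting with '#' are skipped; terms are lowercased.
    (Hypothetical helper; the file format is an assumption.)
    """
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term and not term.startswith("#"):
            terms.append(term.lower())
    return terms


def count_lexicon_matches(text, terms):
    """Count case-insensitive whole-word/phrase matches of any term in text."""
    if not terms:
        return 0
    # Try longer terms first so multi-word phrases win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return len(re.findall(pattern, text.lower()))


# Demo: write the hedge lexicon from above to a temporary file, then load it.
with tempfile.TemporaryDirectory() as tmp:
    lexicon_path = Path(tmp) / "hedge_words.txt"  # hypothetical file name
    lexicon_path.write_text(
        "\n".join(["sort of", "kind of", "I guess", "I think",
                   "a little", "maybe", "possibly", "probably"]),
        encoding="utf-8",
    )
    hedge_words = load_lexicon(lexicon_path)
    print(count_lexicon_matches("I think it is sort of fine, maybe", hedge_words))
```

With something like this, each feature file would only need to call the loader with its lexicon's path, and changing a lexicon would mean editing the text file only.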


Files that will need to be changed include:

xehu commented 1 year ago

Also, as we update each lexicon in the list, make sure to update https://github.com/Watts-Lab/team-process-map/blob/main/feature_engine/features/lexicons/readme_lexicons.md as well! This way we can keep track of the description for each one.