CharlotteJackson / DC_Crash_Bot


Test NLP libraries on existing scraped tweets #49

Open CharlotteJackson opened 3 years ago

CharlotteJackson commented 3 years ago

What is the Task

See whether existing NLP libraries can calculate positive/negative sentiment for given intersections or blocks in DC.

Why do we want to do this

As the first step toward scraping Twitter for other traffic-related data.

How can I get started?

How can we start this task?

Definition of Done

When we have an output dataset.

banjtheman commented 3 years ago

We can test on the tweets collected here: https://github.com/CharlotteJackson/DC_Crash_Bot/blob/master/data/AlertDCio.json

banjtheman commented 3 years ago

Tested TextBlob. It was not the best, but I was able to get a simple sentiment score and make a heatmap with the locations.
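A minimal sketch of what per-location scoring with TextBlob could look like, not the code that was actually run. The `location` and `text` keys are hypothetical field names (the real layout of AlertDCio.json may differ), and the heatmap step is left out.

```python
from collections import defaultdict
from statistics import mean


def textblob_polarity(text):
    """Return TextBlob's polarity for `text`, a float in [-1.0, 1.0]."""
    # Third-party dependency, imported lazily: pip install textblob
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def mean_polarity_by_location(tweets, scorer=textblob_polarity):
    """Average a per-tweet score for each location.

    `tweets` is an iterable of dicts with the hypothetical keys
    "location" and "text". `scorer` maps a string to a float, so another
    library can be swapped in without touching the aggregation.
    """
    scores = defaultdict(list)
    for tweet in tweets:
        scores[tweet["location"]].append(scorer(tweet["text"]))
    return {location: mean(values) for location, values in scores.items()}
```

Calling `mean_polarity_by_location(tweets)` on a list of such dicts returns one average score per location, which is the kind of value a heatmap could be colored by.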

banjtheman commented 3 years ago

Some thoughts from the meeting:

Looked at using VADER (NLTK).

Get more data from other sources; AlertDCio is not exhaustive.

Will Twitter only have negative data? Does no one tweet good stuff?

Maybe look at adjectives, i.e. how a place is described?
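To make the VADER idea and the "only bad data?" question concrete, here is a rough sketch assuming NLTK's VADER analyzer. The ±0.05 cutoff on the compound score is a commonly used convention rather than something decided in the meeting, and `label_counts` is a hypothetical helper name.

```python
from collections import Counter


def vader_compound(text):
    """Return VADER's compound score for `text`, a float in [-1.0, 1.0]."""
    # Third-party dependency, imported lazily: pip install nltk
    # The lexicon must be fetched once with nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def label_compound(compound, threshold=0.05):
    """Bucket a compound score into positive / negative / neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_counts(texts, scorer=vader_compound):
    """Count how many texts fall into each sentiment bucket."""
    return Counter(label_compound(scorer(text)) for text in texts)
```

Running `label_counts` over the collected tweets would show how skewed toward negative the data really is before we decide whether more sources are needed.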

banjtheman commented 3 years ago

Using instatext on some scraped text gave some decent results...

notes:

python try_model.py "I was hit while on my bike at dupont circle"
{'traffic_incident': 0.8903042674064636, 'non_traffic_incident': 0.1097157895565033}

python try_model.py "I was almost hit by a car while riding my bike"
{'traffic_incident': 1.0000100135803223, 'non_traffic_incident': 1.0000003385357559e-05}

python try_model.py "she was hit while in her car at logan circle"
{'traffic_incident': 0.9883227348327637, 'non_traffic_incident': 0.01169725600630045}

python try_model.py "I was walking home then saw a rainbow"
{'non_traffic_incident': 0.59267657995224, 'traffic_incident': 0.3998216390609741}

banjtheman commented 3 years ago

Some examples from the dataset that are not labeled yet:

python try_model.py "about an hour ago, when I was crossing K St. near the Connecticut Ave. side of the Farragut North station, Metrobus 7161 turned left in front of me as I was walking in the crosswalk with a walk sign. Please remind your bus drivers to yield to pedestrians"
{'non_traffic_incident': 0.935343861579895, 'traffic_incident': 0.06467615067958832}

python try_model.py "Who do I contact from @washingtondc or @councilofdc to install a better lighted crosswalk in my neighborhood? The intersection of Quincy St and New Hampshire Ave is dangerous, pedestrian safety is immensely needed before a deadly accident occurs."
{'non_traffic_incident': 0.9297100305557251, 'traffic_incident': 0.07030998915433884}

python try_model.py "Pedestrian dies after being hit by car in DC: - One person is dead and three others were injured after a crash in… http://t.co/MPfsLHViny,"
{'traffic_incident': 0.8606407642364502, 'non_traffic_incident': 0.13937927782535553}
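Since some of these scores are close calls or look wrong, one option is to flag low-margin predictions for manual labeling. This is a sketch only: it parses the dict that `try_model.py` prints, the `min_margin` value of 0.5 is an arbitrary assumption, and `top_label` is a hypothetical helper, not part of the model code.

```python
import ast


def top_label(output_line, min_margin=0.5):
    """Pick the winning label from one printed score dict.

    Returns (label, margin, confident), where margin is the gap between
    the two highest scores and confident is True when margin >= min_margin.
    """
    # literal_eval safely parses the dict literal printed by the script.
    scores = ast.literal_eval(output_line)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    margin = best_score - runner_up_score
    return best, margin, margin >= min_margin
```

With this cutoff the rainbow example would be flagged for review, while the Metrobus and crosswalk examples would not, because the model is confidently scoring them as `non_traffic_incident`. Those still need labeled data to catch.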