Open CharlotteJackson opened 3 years ago
Can test on the tweets collected here https://github.com/CharlotteJackson/DC_Crash_Bot/blob/master/data/AlertDCio.json
Tested TextBlob. It was not the best option, but it was enough to get a simple sentiment score and make a heatmap with the locations.
Some thoughts from the meeting:
Looked at using VADER (NLTK).
Get more data from other sources; the AlertDCio data is not exhaustive.
Will Twitter only have negative data? Does no one tweet good stuff?
Maybe look at adjectives: how is a place described?
Using instatext on some scraped text got some decent results...
notes:
python try_model.py "I was hit while on my bike at dupont circle"
{'traffic_incident': 0.8903042674064636, 'non_traffic_incident': 0.1097157895565033}
python try_model.py "I was almost hit by a car while riding my bike"
{'traffic_incident': 1.0000100135803223, 'non_traffic_incident': 1.0000003385357559e-05}
python try_model.py "she was hit while in her car at logan circle"
{'traffic_incident': 0.9883227348327637, 'non_traffic_incident': 0.01169725600630045}
python try_model.py "I was walking home then saw a rainbow"
{'non_traffic_incident': 0.59267657995224, 'traffic_incident': 0.3998216390609741}
Some examples from the dataset that are not labeled yet:
python try_model.py "about an hour ago, when I was crossing K St. near the Connecticut Ave. side of the Farragut North station, Metrobus 7161 turned left in front of me as I was walking in the crosswalk with a walk sign. Please remind your bus drivers to yield to pedestrians"
{'non_traffic_incident': 0.935343861579895, 'traffic_incident': 0.06467615067958832}
python try_model.py "Who do I contact from @washingtondc or @councilofdc to install a better lighted crosswalk in my neighborhood? The intersection of Quincy St and New Hampshire Ave is dangerous, pedestrian safety is immensely needed before a deadly accident occurs."
{'non_traffic_incident': 0.9297100305557251, 'traffic_incident': 0.07030998915433884}
python try_model.py "Pedestrian dies after being hit by car in DC: - One person is dead and three others were injured after a crash in… http://t.co/MPfsLHViny,"
{'traffic_incident': 0.8606407642364502, 'non_traffic_incident': 0.13937927782535553}
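A rough sketch of how the outputs above could be triaged: pick the highest-scoring label and flag low-confidence predictions for manual review. The `top_label` helper and the 0.75 cutoff are assumptions for illustration, not part of `try_model.py`; the two score dicts are copied from the notes above.

```python
def top_label(scores, min_confidence=0.75):
    """Return (label, probability, is_confident) for one prediction dict.

    min_confidence is an arbitrary illustrative cutoff, not a tuned value.
    """
    label, prob = max(scores.items(), key=lambda item: item[1])
    return label, prob, prob >= min_confidence


# Two outputs copied from the notes above.
rainbow = {'non_traffic_incident': 0.59267657995224,
           'traffic_incident': 0.3998216390609741}
dupont = {'traffic_incident': 0.8903042674064636,
          'non_traffic_incident': 0.1097157895565033}

for name, scores in [("rainbow", rainbow), ("dupont", dupont)]:
    label, prob, confident = top_label(scores)
    status = "ok" if confident else "needs manual review"
    print(f"{name}: {label} ({prob:.2f}) {status}")
```

A confidence cutoff only catches uncertain predictions. The Metrobus and crosswalk examples are scored as `non_traffic_incident` with high confidence, so cases like those would still need labeled data to catch.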
What is the Task
See if we can use existing NLP libraries to calculate negative/positive sentiment for given intersections/blocks in DC
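One possible shape for this, as a minimal sketch: score each tweet, then average the scores per location. The record keys `location` and `text` are hypothetical (the actual AlertDCio.json schema is not shown here), and `toy_score` is only a stand-in so the example runs without extra packages. In practice `score_text` could wrap something like TextBlob's `TextBlob(text).sentiment.polarity` or VADER's `polarity_scores(text)["compound"]`.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by_location(records, score_text):
    """Group records by location and average the sentiment score of their text."""
    scores = defaultdict(list)
    for record in records:
        # "location" and "text" are assumed field names, not the real schema.
        scores[record["location"]].append(score_text(record["text"]))
    return {location: mean(values) for location, values in scores.items()}


# Stand-in scorer: negative fraction of words found in a tiny word list.
NEGATIVE_WORDS = {"hit", "crash", "dangerous"}


def toy_score(text):
    words = text.lower().split()
    return -sum(word in NEGATIVE_WORDS for word in words) / max(len(words), 1)


sample = [
    {"location": "dupont circle", "text": "I was hit while on my bike"},
    {"location": "dupont circle", "text": "saw a rainbow"},
    {"location": "logan circle", "text": "she was hit while in her car"},
]

result = mean_sentiment_by_location(sample, toy_score)
for location, score in sorted(result.items()):
    print(f"{location}: {score:.3f}")
```

Passing the scorer in as a function keeps the aggregation independent of whichever library ends up being used.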
Why do we want to do this
As the first step toward scraping Twitter for other traffic-related data.
How can I get started?
How can we start this task?
Definition of Done
When we have an output dataset.