Open pghosh opened 7 years ago
I'm going to give this a try, anyone that wants to help is welcome.
I was able to create a .py scraper using Beautiful Soup to scrape the calls to action on risestronger.org. However, there are only 10 items. I will give resistancenearme.org/ a shot today.
@brucerowan Any update?
@crypdick Hi, sorry for the delayed response. We've made some good progress. I would check out this repository: https://github.com/brucerowan/indivisible/tree/scrap_websites/ingest/web_scraper
@crypdick If you are good at object-oriented programming, that would actually help me out a lot. Basically, it would help if you could work out how to apply the base_scraper class to the resistancenearme.py file. Message me @bruce_r on Slack.
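For anyone picking this up, here is a minimal sketch of the kind of structure meant here. It is not the code in the repository: the class names, the `parse`/`scrape` methods, and the assumption that each event title sits in an `<h2>` element are all hypothetical and only illustrate a shared base class with one site-specific subclass.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser


class BaseScraper(ABC):
    """Hypothetical base class: shared logic lives here, parsing is per site."""

    def __init__(self, url):
        self.url = url

    @abstractmethod
    def parse(self, html):
        """Return a list of dicts, one per scraped action."""

    def scrape(self, html):
        # Shared post-processing: drop rows whose text is empty.
        return [row for row in self.parse(html) if row.get("text")]


class _HeadingCollector(HTMLParser):
    """Collects the text inside every <h2> element (an assumed page layout)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headings[-1] += data


class ResistanceNearMeScraper(BaseScraper):
    """Site-specific subclass: only the parsing step is overridden."""

    def parse(self, html):
        collector = _HeadingCollector()
        collector.feed(html)
        collector.close()
        return [{"text": h.strip(), "source": self.url} for h in collector.headings]


scraper = ResistanceNearMeScraper("https://resistancenearme.org/")
rows = scraper.scrape("<h2>Town Hall</h2><p>details</p><h2> </h2><h2>Rally</h2>")
```

With this layout, adding another site would only mean writing a new subclass with its own `parse` method.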
We will build training data for classifying actions in the following ways:
scrape websites and create a CSV with
manually screened data from the emails
The dataset will be saved on data.world and used for tagging actions identified in emails.
Websites to start with: https://resistancenearme.org/ and www.risestronger.org
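One possible shape for the scraped CSV, as a rough sketch only. The column names `text` and `event_type` and the sample rows are assumptions made up for illustration, not the actual schema or real scraped data.

```python
import csv
import io

# Assumed column names; the real dataset may use different ones.
FIELDNAMES = ["text", "event_type"]


def write_training_csv(rows, handle):
    """Write one labeled example per row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow({name: row[name] for name in FIELDNAMES})


# Made-up placeholder rows, only to show the layout.
sample_rows = [
    {"text": "Meet your representative, bring questions", "event_type": "Town Hall"},
    {"text": "Gather downtown at noon", "event_type": "Rally"},
]

buffer = io.StringIO()
write_training_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead of a buffer would use `open(path, "w", newline="")`, which is what the `csv` module documentation recommends for file handles.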
Business value: This task serves as the first step toward auto-tagging action items. This is data collection. The goal is to create a labeled dataset that can be used to train classifiers to auto-tag actions. To start with, we should use 'event type' from resistancenearme as the tag. The scraping task should map each event text to its tag, one to one.
For email, we need to analyze the text to see if we can find a pattern that makes the text/action fall into a category. The idea is that if we can identify a pattern, we can write scripts to do the tagging. If not, we should spin up a task to manually go through emails and tag them. Some starting pointers are:
Check the email address; some organizations tend to organize certain kinds of tasks
See the verbs; they might actually include something like "rally"
These are just ideas, feel free to add what works and what does not work.
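As a starting point for the email pointers above, here is a rough sketch of rule-based tagging. The keyword table, the sender domain `example-townhalls.org`, and the tag names are all hypothetical placeholders; real rules would have to come from looking at the emails.

```python
import re

# Hypothetical keyword-to-tag rules; extend after inspecting real emails.
KEYWORD_TAGS = {
    "rally": "rally",
    "march": "march",
    "petition": "petition",
    "phone": "phone call",
}

# Hypothetical sender-domain rules for organizations that tend to run
# one kind of action.
SENDER_TAGS = {
    "example-townhalls.org": "town hall",
}


def tag_action(text, sender=""):
    """Return a tag for the action text, or None if no rule matches."""
    # First try the words in the text itself.
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in KEYWORD_TAGS:
            return KEYWORD_TAGS[word]
    # Fall back to the sender's email domain.
    domain = sender.rsplit("@", 1)[-1].lower()
    return SENDER_TAGS.get(domain)


tag_from_text = tag_action("Join the rally at city hall on Saturday")
tag_from_sender = tag_action("Meet your representative", "info@example-townhalls.org")
untagged = tag_action("Thanks for signing up")
```

Anything that comes back as `None` would go to the manual tagging task, so the rules can stay conservative.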