Data4Democracy / indivisible

Aggregating call to action sites into a single application.

Investigate open data sets #14

Open pghosh opened 7 years ago

pghosh commented 7 years ago

This task is to investigate open datasets to find models, or data for training models, that can identify action details from a given text.

restrellado commented 7 years ago

Hello! Are there particular datasets we should be starting with? I'd like to help if I can.

pghosh commented 7 years ago

tl;dr: No. This task is to see whether an open dataset or model already exists.

Details: we have a few (around 100 to 200) emails to start working with. Clearly, that is not enough data to train any model. The goal of this task is to see whether trained models are available, or to find a dataset that could help train models that identify event details from a paragraph. The end goal is in line with what an intelligent calendar does when you add a meeting, reminder, or event from a simple English sentence. I have a separate task to experiment with spaCy on the emails we have. This task is for finding an open model or dataset that can be leveraged to solve the problem. You can also ping me in Slack (@pg) if there are more questions!
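To make the target concrete, here is a minimal rule-based sketch of the kind of extraction described above, using only Python's `re` module. It is not the project's method: the cue lists, the sample sentence, and the `extract_event_cues` name are all assumptions made for illustration, and a trained model or a library such as spaCy would likely replace these patterns.

```python
import re

# Hypothetical cue lists for illustration only; real coverage would need to be far broader.
DAY_PATTERN = re.compile(
    r"\b(today|tomorrow|tonight|monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.IGNORECASE,
)
# Matches clock times such as "6 pm" or "6:30 pm".
TIME_PATTERN = re.compile(r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b", re.IGNORECASE)
# Words that hint at an event or a call to action.
ACTION_PATTERN = re.compile(
    r"\b(meeting|rally|call|march|town hall|protest)\b", re.IGNORECASE
)


def extract_event_cues(text):
    """Return day, time, and action words found in text (rule-based baseline)."""
    return {
        "days": [m.group(0).lower() for m in DAY_PATTERN.finditer(text)],
        "times": [m.group(0).lower() for m in TIME_PATTERN.finditer(text)],
        "actions": [m.group(0).lower() for m in ACTION_PATTERN.finditer(text)],
    }


# Made-up sample sentence, not taken from the project's emails.
sample = "Join the town hall meeting tomorrow at 6:30 pm to call your senator."
cues = extract_event_cues(sample)
print(cues)
```

A baseline like this could also serve as a rough filter for pulling candidate sentences out of a larger corpus before any labeling or model training.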

davidmudrauskas commented 6 years ago

Have you considered using the Enron email dataset? https://www.cs.cmu.edu/~./enron/

Taking a look at it, I'm finding snippets such as:

PS: Colleen is setting up a meeting tomorrow to discuss the direction for transport. Hopefully we'll know much better where that part stands at that