codefordenver / partner-finder

Using an open dataset of registered Colorado businesses to build a tool that manages outreach to potential CFD partners.
3 stars 14 forks

Clean socrata data #3

Open galbwe opened 3 years ago

galbwe commented 3 years ago
Nova791 commented 3 years ago

Clean Socrata Data

kaleeaswari commented 2 years ago

@galbwe Can I get this assigned?

galbwe commented 2 years ago

Hey @kaleeaswari, if you can focus on the Colorado Non-profits (CNP) data, that would be the most helpful. We currently aren't using the Socrata dataset because it was missing most of the relevant information for the app.

The files you will want to look at are scrape_CNP.py and tasks.py. A good start would be to filter out the records that are missing all of the fields needed for the table on the homepage. There are other things to do as well, like making sure strings are formatted consistently and removing records for nonprofits with obvious religious affiliations.

kaleeaswari commented 2 years ago

Mandatory fields to consider a lead: Name, Contact, Website, SocialMedia. Is that correct?

@galbwe

galbwe commented 2 years ago

I think they all have the Name field. If a lead is missing Contact, Website, Facebook, Twitter, Instagram, and LinkedIn, then it should be dropped.
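
As a rough illustration of that rule, here is a minimal sketch. It assumes each lead is a plain dict; the key names in CONTACT_FIELDS and both function names are hypothetical and may not match what scrape_CNP.py or tasks.py actually use. A lead is kept if at least one of the six fields holds a non-blank value.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical key names; the real keys produced by scrape_CNP.py may differ.
CONTACT_FIELDS = ("contact", "website", "facebook", "twitter", "instagram", "linkedin")


def has_contact_info(lead: Dict[str, Any]) -> bool:
    """Return True if at least one contact-style field is present and non-blank."""
    # Missing keys, None, and whitespace-only strings all count as empty.
    return any(str(lead.get(field) or "").strip() for field in CONTACT_FIELDS)


def drop_unreachable_leads(leads: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the leads that have some way to be contacted."""
    return [lead for lead in leads if has_contact_info(lead)]


if __name__ == "__main__":
    sample = [
        {"name": "A", "website": "https://a.example"},
        {"name": "B", "contact": "   ", "website": None},
        {"name": "C", "twitter": "@c"},
    ]
    for lead in drop_unreachable_leads(sample):
        print(lead["name"])
```

Treating whitespace-only strings as empty is a design choice here, on the assumption that scraped fields may contain stray spaces; adjust it if blank strings carry meaning in the real data.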