codeforamerica / brigade-information

Data about Code for America brigades and other civic tech organizations for the CfA API
MIT License

[hack for la] add projects_tag and remove projects_list_url #109

Closed: themightychris closed this 4 years ago

themightychris commented 4 years ago

Instead of #106

tdooner commented 4 years ago

Looks good to me. What are the semantics when both of these keys are included?

Depending on the semantics, is it possible to keep both for H4LA for backwards compatibility? (e.g. maybe the new scraper could use the new projects_tag as a preference if it is provided?)

Also, I should have mentioned this in the other PR, but it'd be great to add the field to the JSONSchema for the file: https://github.com/codeforamerica/brigade-information/blob/master/schema.json

themightychris commented 4 years ago

So the crawler logic right now is that:

If projects_list_url is provided, that's the brigade's authoritative projects list. In that case, projects_tag may also be set and will be passed through to the organization profile with any other fields.

If no projects_list_url is provided but projects_tag is, the crawler falls back to tag search as the source for the projects list.

There certainly can, and should, be brigades with both set. When both are available, the crawler will not use tag search to discover projects; it treats an explicit list as an effort to curate what gets indexed.
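A minimal Python sketch of the selection rule just described, assuming a brigade record is a plain dict. Only the `projects_list_url` and `projects_tag` keys come from this thread; the function name `choose_project_source`, its return shape, and the example values are hypothetical and are not the crawler's actual code.

```python
from typing import Optional, Tuple


def choose_project_source(brigade: dict) -> Optional[Tuple[str, str]]:
    """Pick where the index should discover a brigade's projects.

    Hypothetical helper illustrating the rule above, not the real crawler.
    """
    list_url = brigade.get("projects_list_url")
    tag = brigade.get("projects_tag")

    if list_url:
        # A published list is treated as curated and authoritative.
        # projects_tag, if set, is only passed through to the profile.
        return ("list", list_url)
    if tag:
        # No curated list: fall back to tag search.
        return ("tag", tag)
    # Neither field is set: no project source for this brigade.
    return None


# Example record; the tag value is made up for illustration.
print(choose_project_source({"name": "Hack for LA", "projects_tag": "hack-for-la"}))
```

Returning a `(kind, value)` pair keeps the decision separate from the fetching, so the list-download path and the tag-search path can be implemented and tested independently.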

tdooner commented 4 years ago

I'm still a bit confused. Does the projects_tag allow other GitHub organizations to associate projects with the brigade, or is it a filter on the full list of projects returned from the projects_list_url?

In the former case, there are two different sets of projects and it's up to the client to decide how to reconcile them for display. In the latter, the semantics are a lot cleaner IMHO.

I only ask because I think we've talked about this functionality for two different use cases: 1) allowing projects in different GitHub orgs to associate with the brigade, and 2) allowing brigade leaders to filter their projects list so that not all repos are considered active projects. Does this do one or both of these?

themightychris commented 4 years ago

> Does the projects_tag allow other GitHub organizations to associate projects with the brigade

Yes, projects_tag is a brigade saying "all projects affiliated with us should use this tag on GitHub to self-identify as such."

For the index, it will be how we find project repos in the absence of a curated projects list. When a curated projects list is published, that list is 100% of the brigade's projects. We haven't yet figured out what happens when a brigade no longer wants every project using its tag to show up in its projects list. If we want to keep this crowd-sourced alternative to maintaining a whitelist of brigade projects, I expect we'll eventually need some way for brigade captains to maintain a blacklist of tagged projects that the index should ignore. I don't think that would make sense to live here in the brigade-information repo, so it might need to go somewhere in the index's repo.
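Tag-based discovery with the kind of ignore list speculated about above could look roughly like this Python sketch. It assumes candidate repos have already been fetched as dicts with `full_name` and `topics` keys (for example from GitHub's repository search using a `topic:` qualifier). The `ignored` set stands in for the hypothetical captain-maintained blacklist, which does not exist yet, and the function name and sample data are likewise illustrative.

```python
from typing import Iterable, List, Set


def discover_tagged_projects(
    repos: Iterable[dict], projects_tag: str, ignored: Set[str]
) -> List[str]:
    """Return full names of repos carrying the brigade's tag, minus ignored ones.

    Hypothetical sketch: `repos` is assumed to be pre-fetched repo metadata,
    and `ignored` is a hypothetical brigade-maintained blacklist.
    """
    found = []
    for repo in repos:
        has_tag = projects_tag in repo.get("topics", [])
        if has_tag and repo["full_name"] not in ignored:
            found.append(repo["full_name"])
    return found


# Made-up sample data for illustration.
repos = [
    {"full_name": "org-a/food-map", "topics": ["hack-for-la", "civic-tech"]},
    {"full_name": "org-b/old-prototype", "topics": ["hack-for-la"]},
    {"full_name": "org-c/unrelated", "topics": ["python"]},
]
print(discover_tagged_projects(repos, "hack-for-la", {"org-b/old-prototype"}))
```

Here `org-a/food-map` is kept, `org-b/old-prototype` is dropped by the ignore list, and `org-c/unrelated` never matched the tag. Because the tag match alone admits any repo from any organization, the ignore list is the only lever a brigade would have over what appears, which is why the thread suggests it would eventually be needed.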

There's only one brigade trying to manage their index listings this way so far, but there's also a marketing/SEO argument that it's good practice to push brigades to tag up. The index team concluded that, independent of how the index's project discovery techniques evolve, an important next step is to get every brigade an officially documented tag somewhere, as soon as possible. Adding this field here seems to be the best way to give that data point an official home where future brigade captains can maintain it.

Declaring a projects_tag won't affect the indexer at all when a projects list is published. As an example of how the two might intersect, though, I could imagine someone having a bot go through all the projects in the index and open an issue on each GitHub-hosted one that doesn't carry its brigade's tag, suggesting they add it.
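The bot idea could start from a Python sketch like the one below, which only computes which indexed projects would get a suggestion and does not actually open issues. The project record shape (`brigade`, `code_url`, `topics`), the `brigade_tags` mapping, and the function name `tag_suggestions` are all assumptions for illustration, not the index's real data model.

```python
from typing import Dict, List, Tuple

GITHUB_PREFIX = "https://github.com/"


def tag_suggestions(
    index_projects: List[dict], brigade_tags: Dict[str, str]
) -> List[Tuple[str, str]]:
    """List (repo URL, tag) pairs for GitHub-hosted projects missing their brigade's tag.

    Hypothetical sketch: record fields and the brigade -> projects_tag
    mapping are assumed, not taken from the actual index.
    """
    suggestions = []
    for project in index_projects:
        tag = brigade_tags.get(project["brigade"])
        if tag is None:
            continue  # brigade has not declared a projects_tag
        if not project["code_url"].startswith(GITHUB_PREFIX):
            continue  # only GitHub-hosted repos can carry GitHub topics
        if tag not in project.get("topics", []):
            suggestions.append((project["code_url"], tag))
    return suggestions


# Made-up sample data for illustration.
projects = [
    {"brigade": "Hack for LA", "code_url": "https://github.com/org-a/food-map", "topics": ["hack-for-la"]},
    {"brigade": "Hack for LA", "code_url": "https://github.com/org-b/bus-stops", "topics": []},
    {"brigade": "Hack for LA", "code_url": "https://gitlab.com/org-c/parks", "topics": []},
    {"brigade": "Other Brigade", "code_url": "https://github.com/org-d/tool", "topics": []},
]
for url, tag in tag_suggestions(projects, {"Hack for LA": "hack-for-la"}):
    print(f"Suggest adding topic '{tag}' to {url}")
```

Keeping the check separate from any issue-filing step makes it easy to review the suggestions before a bot posts anything.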

tdooner commented 4 years ago

Sounds good, thanks for the reply.