Open dphoria opened 2 years ago
Here is a very good document prepared by @AZIZXlaouiti . https://docs.google.com/document/d/1vFXOAFsGK5AOCbfcvt-LwFR71IUiTkubwa8EcBwgbHQ
cc @Shak2000
I have hit a roadblock. I can't quite come up with a clean way to get events for a given period of time. So far, to me, the main candidates for information sources are
I have yet to figure out how to query any of the above sources with a period of time as parameter(s). 1 and 2 can be used to retrieve the most recent N meetings.
3, an API, seemed like a good choice going in, but I am not so warm to it now. It is a good resource for information about specific bills, laws, etc. To me, it is almost useless for getting agendas and similar information about meetings.
What about this? https://dccouncil.us/events/2022-01/
You can fill in any year and month and then find the day in the calendar?
Oh man why didn't I think of this approach! I even saw that calendar before LOL. Yes I think this is, at least to me, the best route I've seen thus far. :+1: Awesome.
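For reference, a minimal sketch of the calendar idea above: build one monthly calendar URL per month in the requested period, following the https://dccouncil.us/events/2022-01/ pattern. The function name month_urls and the EVENTS_URL constant are made up for illustration. Fetching and parsing each calendar page is not shown.

```python
from datetime import date
from typing import List

# URL pattern taken from the example link in this thread (year-month).
EVENTS_URL = "https://dccouncil.us/events/{year:04d}-{month:02d}/"


def month_urls(start: date, end: date) -> List[str]:
    """Return one calendar URL for each month from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be earlier than start")

    urls = []
    year, month = start.year, start.month
    # Compare (year, month) tuples so the loop crosses year boundaries cleanly.
    while (year, month) <= (end.year, end.month):
        urls.append(EVENTS_URL.format(year=year, month=month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return urls
```

Each returned page would still need to be scraped for the individual day and event links.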
Finally got around to making a first draft. Just getting the minimal now. https://gist.github.com/dphoria/7bea514b1a201f33ade2cf8c8d9fa707 Made a stand-alone file for now for easier development and testing.
import washington_dc
from datetime import datetime
washington_dc.get_events_on_date(datetime(2022, 2, 1))
[
EventIngestionModel(
body=Body(name='Committee of the Whole', is_active=True, start_datetime=None, description=None, end_datetime=None, external_source_id=None),
sessions=[
Session(
session_datetime=datetime.datetime(2022, 2, 1, 12, 0),
video_uri='http://archive-media.granicus.com:443/OnDemand/dc/dc_2bc5049c-4415-4cbe-a069-35623328a371.mp4',
session_index=0,
caption_uri='https://dc.granicus.com/TranscriptViewer.php?view_id=4&clip_id=7039',
external_source_id=None,
),
],
event_minutes_items=None,
agenda_uri='https://dccouncil.us/wp-content/uploads/2022/01/2.1.22-COW-Agenda_ADDITIONAL-1.pdf',
minutes_uri=None,
static_thumbnail_uri=None,
hover_thumbnail_uri=None,
external_source_id=None,
),
EventIngestionModel(
body=Body(name='City Council', is_active=True, start_datetime=None, description=None, end_datetime=None, external_source_id=None),
sessions=[
Session(
session_datetime=datetime.datetime(2022, 2, 1, 13, 0),
video_uri='http://archive-media.granicus.com:443/OnDemand/dc/dc_dc26ab8b-ac05-48cd-968e-94ba67282a87.mp4',
session_index=0,
caption_uri='https://dc.granicus.com/TranscriptViewer.php?view_id=3&clip_id=7040',
external_source_id=None,
),
],
event_minutes_items=None,
agenda_uri='https://dccouncil.us/wp-content/uploads/2021/12/February-1-2022-Legislative-Meeting-2.pdf',
minutes_uri=None,
static_thumbnail_uri=None,
hover_thumbnail_uri=None,
external_source_id=None,
),
]
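Since the request is for events over a period of time, here is a rough sketch of how a per-day function like get_events_on_date could be wrapped to cover a date range. The wrapper name get_events_in_range and its signature are assumptions for illustration, not part of the gist. The per-day function is passed in as a parameter so the sketch does not depend on the washington_dc module.

```python
from datetime import datetime, timedelta
from typing import Callable, List, TypeVar

T = TypeVar("T")


def get_events_in_range(
    from_dt: datetime,
    to_dt: datetime,
    get_events_on_date: Callable[[datetime], List[T]],
) -> List[T]:
    """Collect events for every calendar day from from_dt to to_dt, inclusive."""
    events: List[T] = []
    # Start at midnight so the first day is included whatever the time of from_dt.
    day = from_dt.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= to_dt:
        events.extend(get_events_on_date(day))
        day += timedelta(days=1)
    return events
```

With the draft above, this might be called as get_events_in_range(datetime(2022, 1, 1), datetime(2022, 2, 1), washington_dc.get_events_on_date). Note that it returns every event on the first and last day, so filtering by exact session time would be a separate step.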
The foremost question in my head is the best way to get votes. I think it is https://lims.dccouncil.us/ (API help at https://lims.dccouncil.us/api/help/index.html), using information parsed from an event page like https://dccouncil.us/event/legislative-meeting-86/
What is highly disappointing is that I thought DC used to have an event's minute items listed in the lower left table on their video player. That seems to be no longer the case?
e.g. On http://dc.granicus.com/ViewPublisher.php?view_id=3, click on any "Video" link on the right. The popup is largely empty with just the video. That used to have a lot of useful information we could have used to get EventMinutesItem, etc.
@dphoria I did notice that, along with the absence of the PDF document. Sometimes captions aren't available either.
Nice job!!
Can't comment on the PDF document, but I wouldn't worry if the captions are only optionally available. Seattle has captions for roughly 95% of meetings. If captions aren't available we will fall back to Google. No worries.
Excited to see this progress!!
Any luck in adding to the scraper, @AZIZXlaouiti ? I've been working on other issues recently; probably will be for another couple more weeks. After that I may be able to hop back on this if necessary. Anyway just wanted to check in.
@dphoria I had some busy weeks (family and interview related), so I wasn't as active as I wanted to be, but I will resume the work this week. My apologies.
@dphoria I managed to get the event_minutes added. I parsed the PDF from agenda_uri and managed to get all the legislation numbers. After that I'll have to use the LIMS API to get the votes, vote status, and persons.
https://gist.github.com/AZIZXlaouiti/b3b0ccab24a1fbd0586fb8756fc85c1c
[
EventIngestionModel(
body=Body(name='Committee of the Whole', is_active=True, start_datetime=None, description=None, end_datetime=None, external_source_id=None),
sessions=[
Session(
session_datetime=datetime.datetime(2022, 2, 1, 12, 0),
video_uri='http://archive-media.granicus.com:443/OnDemand/dc/dc_2bc5049c-4415-4cbe-a069-35623328a371.mp4',
session_index=0,
caption_uri='https://dc.granicus.com/TranscriptViewer.php?view_id=4&clip_id=7039',
external_source_id=None,
),
],
event_minutes_items=[
EventMinutesItem(
minutes_item=MinutesItem(name='Bill 24-117', description=None, external_source_id=None),
index=None,
matter=Matter(name='B24-0117', matter_type=None, title='Armstead Barnett Way Designation Act of 2021', result_status=None, sponsors=None, external_source_id=None),
supporting_files=None,
decision=None,
votes=None,
),
],
agenda_uri='https://dccouncil.us/wp-content/uploads/2022/01/2.1.22-COW-Agenda_ADDITIONAL-1.pdf',
minutes_uri=None,
static_thumbnail_uri=None,
hover_thumbnail_uri=None,
external_source_id=None,
),
EventIngestionModel(
body=Body(name='City Council', is_active=True, start_datetime=None, description=None, end_datetime=None, external_source_id=None),
sessions=[
Session(
session_datetime=datetime.datetime(2022, 2, 1, 13, 0),
video_uri='http://archive-media.granicus.com:443/OnDemand/dc/dc_dc26ab8b-ac05-48cd-968e-94ba67282a87.mp4',
session_index=0,
caption_uri='https://dc.granicus.com/TranscriptViewer.php?view_id=3&clip_id=7040',
external_source_id=None,
),
],
event_minutes_items=[
EventMinutesItem(
minutes_item=MinutesItem(name='CER 24-125', description=None, external_source_id=None),
index=None,
matter=Matter(name='CER24-0125', matter_type=None, title='Beverly Odoms-Johnson Posthumous Recognition Ceremonial Resolution of 2022', result_status=None, sponsors=None, external_source_id=None),
supporting_files=None,
decision=None,
votes=None,
),
],
agenda_uri='https://dccouncil.us/wp-content/uploads/2021/12/February-1-2022-Legislative-Meeting-2.pdf',
minutes_uri=None,
static_thumbnail_uri=None,
hover_thumbnail_uri=None,
external_source_id=None,
),
]
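To illustrate the step from agenda text to matter names seen in this output ('Bill 24-117' becoming 'B24-0117', 'CER 24-125' becoming 'CER24-0125'), here is a small sketch. It is not the code from the gist. The function name find_legislation_numbers, the regex, and the prefix table are assumptions, and only the two label styles visible in the sample output are handled. It expects plain text already extracted from the agenda PDF.

```python
import re
from typing import List, Tuple

# Assumed mapping from the label printed in the agenda to the matter-name prefix.
# Only the two styles that appear in the sample output are covered.
_PREFIXES = {"Bill": "B", "CER": "CER"}
_PATTERN = re.compile(r"\b(Bill|CER)\s+(\d+)-(\d+)\b")


def find_legislation_numbers(agenda_text: str) -> List[Tuple[str, str]]:
    """Return (minutes item name, matter name) pairs found in the agenda text."""
    found = []
    for label, period, number in _PATTERN.findall(agenda_text):
        item_name = f"{label} {period}-{number}"
        # Zero-pad the trailing number to four digits, e.g. 117 -> 0117.
        matter_name = f"{_PREFIXES[label]}{period}-{int(number):04d}"
        found.append((item_name, matter_name))
    return found
```

The matter name could then serve as the lookup key for votes, assuming the LIMS API accepts identifiers in that form.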
No absolutely no need for any apologies. :smile: I was just curious.
Feature Description
A clear and concise description of the feature you're requesting.
Provide a file in cdp_scrapers/instances/ like cdp_scrapers/instances/dc.py or something similar that provides a function that implements an API to return Washington, DC city council meetings as List[EventIngestionModel] for a period of time, e.g.
Use Case
Please provide a use case to help us understand your request in context.
The above file and API would be used in deploying a CDP instance for Washington, DC.