datamade / nyc-council-councilmatic

NYC Council version of Councilmatic
MIT License

First search result for taxi (which we encourage the user to try out) is a bill that has been deleted from Legistar #87

Closed fgregg closed 5 years ago

fgregg commented 6 years ago

https://nyc-council-staging.datamade.us/search/?q=taxi&page=1

https://nyc-council-staging.datamade.us/legislation/t-2017-6878/

reginafcompton commented 6 years ago

Weird.

So, this bill does not exist on the Legistar UI, but it does exist in the web API: https://webapi.legistar.com/v1/nyc/matters/58413?token=....

Here's the OCD API for reference.

@hancush - do we want to scrape data not available in the Legistar user interface?

hancush commented 6 years ago

I'd like to know why it's not in Legistar. (I searched for the identifier and didn't come up with anything.)

But in the meantime, it looks like this is another instance of Legistar returning a 200 when it shouldn't.

In [1]: import requests

In [2]: r = requests.get('http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=3289669&GUID=718D3F80-59AB-4D69-B3B0-C832B0A506E8')

In [3]: r.status_code
Out[3]: 200

In [4]: r.text
Out[4]: 'Invalid parameters!'

I think we should add a condition to _check_errors in python-legistar that raises a ScrapeError when response.text is "Invalid parameters!" so we can skip these bills.
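A minimal sketch of that check, for illustration only. The real `_check_errors` signature in python-legistar is not shown in this thread, so `check_errors` and the `ScrapeError` class below are hypothetical stand-ins, and the response is faked so the example runs without hitting Legistar.

```python
from types import SimpleNamespace


class ScrapeError(Exception):
    """Stand-in for the scraper's error type (hypothetical definition)."""


def check_errors(response):
    """Raise ScrapeError when Legistar serves an error page with HTTP 200."""
    # Legistar answers some dead detail URLs with a 200 and this exact body,
    # so the status code alone is not enough to detect the failure.
    if response.text.strip() == 'Invalid parameters!':
        raise ScrapeError('Invalid parameters: {}'.format(response.url))
    return response


# Fake responses standing in for requests.Response objects (assumption:
# only the .text and .url attributes matter for this check).
bad = SimpleNamespace(
    status_code=200,
    text='Invalid parameters!',
    url='http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=3289669',
)
good = SimpleNamespace(status_code=200, text='<html>...</html>', url=bad.url)

try:
    check_errors(bad)
    skipped = False
except ScrapeError:
    # The caller can skip the bill here instead of importing a dead link.
    skipped = True
```

With a check like this, the scraper could catch the error per bill and move on rather than storing a bill whose source page no longer exists.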

hancush commented 6 years ago

It looks like a version of that bill does exist in Legistar – do we have this version in the OCD API?

hancush commented 6 years ago

So, this actually seems like a case of a duplicate bill. There is an updated version of this bill in Legistar, the OCD API, and Councilmatic.

The bill was inserted again rather than updated because the identifier is slightly different – "T 2017-6878" vs. "Res 1762-2017". Unfortunately, we don't have a mechanism for deleting old information when this happens.

Seems like we want to avoid situations like this. Should we be checking bills for the same API source URL, perhaps? (The matter ID is consistent across versions here.) Alternatively, or additionally, perhaps we should check Legistar source URLs to see if they're active?
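As a rough illustration of the first idea, here is a sketch that groups bills by the matter ID in their API source URL and flags any matter that appears under more than one identifier. The dictionary shape, the `api_source` key, and the third bill are assumptions made up for the example; this is not how pupa or the scraper actually stores bills.

```python
import re
from collections import defaultdict

# Matches the numeric matter ID in URLs such as
# https://webapi.legistar.com/v1/nyc/matters/58413
MATTER_ID_PATTERN = re.compile(r'/matters/(\d+)')


def matter_id(api_url):
    """Return the matter ID from a Legistar web API URL, or None."""
    match = MATTER_ID_PATTERN.search(api_url)
    return match.group(1) if match else None


def find_duplicates(bills):
    """Map each matter ID to its identifiers when there is more than one."""
    by_matter = defaultdict(list)
    for bill in bills:
        mid = matter_id(bill['api_source'])
        if mid is not None:
            by_matter[mid].append(bill['identifier'])
    return {mid: ids for mid, ids in by_matter.items() if len(ids) > 1}


# Hypothetical records: the first two mirror the duplicate in this issue,
# the third is invented to show a bill that is not flagged.
bills = [
    {'identifier': 'T 2017-6878',
     'api_source': 'https://webapi.legistar.com/v1/nyc/matters/58413'},
    {'identifier': 'Res 1762-2017',
     'api_source': 'https://webapi.legistar.com/v1/nyc/matters/58413'},
    {'identifier': 'Example 0000-0000',
     'api_source': 'https://webapi.legistar.com/v1/nyc/matters/1'},
]

duplicates = find_duplicates(bills)
```

A check along these lines could run after each scrape to surface bills that were inserted a second time under a new identifier, leaving the decision about which record to delete to a person.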

reginafcompton commented 6 years ago

Related to: https://github.com/opencivicdata/pupa/issues/295

To close this issue, let's simply suggest another search query in the input bar....

reginafcompton commented 6 years ago

I removed the duplicate bill from the OCD API and Councilmatic database (i.e., the bill with id "ocd-bill/afef2cb7-2b8d-4ce9-916b-34725ffa47f4", which duplicated this bill).

jeancochrane commented 6 years ago

Since the missing bill has been removed from the database, I think this issue has been fixed -- is that right @reginafcompton?

reginafcompton commented 6 years ago

Not yet! We actually need to change this: https://github.com/datamade/nyc-council-councilmatic/blob/e6cf504a1ccef474affdb9432866014414baa5ad/councilmatic/settings_jurisdiction.py#L72

(The conversation above discloses that, in addition to the taxi bill, the suggested resolution is not in Legistar or our databases: Resolution 815-2015.) Let's just suggest a bill that people can find, "Introduction 2018-0327". It will also be a nice test of the relevance search.