ShelterApp / AddResources

http://shelterapp.org/

Parsing through BC Food Banks, Washington DC Homeless Shelters #44

Closed: Shak2000 closed this 3 years ago

Shak2000 commented 3 years ago

I parsed through the data set on food banks in British Columbia and the data set on homeless shelters in Washington DC. I also fixed a minor bug in utils.py that would occur when the collection only contained duplicates (see line 27 of utils.py).

ShelterApp commented 3 years ago

When I tried to run the Washington DC script, it failed. Have you tried running it?

Well, it's still giving me problems: it runs for 5 minutes and then fails with the following error: bson.errors.InvalidDocument: cannot encode object: 20032, of type: <class 'numpy.int64'>. Below is the stack trace:

Traceback (most recent call last):
  File "c:/Users/madha/IdeaProjects/AddResources/AddResources/Washington_DC_Shelters/Washington_DC_Shelters.py", line 70, in <module>
    dc_shelters_scraper.main_scraper(client)
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\shared_code\base_scraper.py", line 144, in main_scraper
    dc = locate_potential_duplicate(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\shared_code\utils.py", line 126, in locate_potential_duplicate
    dupe_candidate = coll.find_one(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\collection.py", line 1319, in find_one
    for result in cursor.limit(-1):
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\cursor.py", line 1207, in next
    if len(self.__data) or self._refresh():
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\cursor.py", line 1124, in _refresh
    self.__send_message(q)
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\cursor.py", line 999, in __send_message
    response = client._run_operation_with_response(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\mongo_client.py", line 1368, in _run_operation_with_response  
    return self._retryable_read(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\mongo_client.py", line 1471, in _retryable_read
    return func(session, server, sock_info, slave_ok)
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\mongo_client.py", line 1360, in _cmd
    return server.run_operation_with_response(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\server.py", line 101, in run_operation_with_response
    message = operation.get_message(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\message.py", line 343, in get_message
    request_id, msg, size, _ = _op_msg(
  File "c:\Users\madha\IdeaProjects\AddResources\AddResources\.venv\lib\site-packages\pymongo\message.py", line 714, in _op_msg
    return _op_msg_uncompressed(
bson.errors.InvalidDocument: cannot encode object: 20032, of type: <class 'numpy.int64'>
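The last line of the trace says the BSON encoder was handed a NumPy scalar (numpy.int64 holding 20032, presumably a ZIP code read from the DC data set) inside the filter that locate_potential_duplicate passes to coll.find_one. BSON only encodes built-in Python types, so one possible fix is to convert NumPy values to native ones before querying. The sketch below assumes the filter is a plain dict; `to_bson_safe` is a hypothetical helper, not part of the repository, and where to call it (on the filter in utils.py, or on each record right after it is loaded) is an assumption.

```python
import numpy as np


def to_bson_safe(value):
    """Recursively replace NumPy values with built-in Python types.

    Hypothetical helper: dicts, lists and tuples are walked recursively,
    NumPy arrays become lists, and NumPy scalars are unwrapped with .item().
    Everything else is returned unchanged.
    """
    if isinstance(value, dict):
        return {key: to_bson_safe(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        # BSON stores both as arrays, so a list is a safe common form.
        return [to_bson_safe(val) for val in value]
    if isinstance(value, np.ndarray):
        # tolist() already yields Python scalars; recurse for object arrays.
        return [to_bson_safe(val) for val in value.tolist()]
    if isinstance(value, np.generic):
        # np.int64 -> int, np.float64 -> float, np.bool_ -> bool, np.str_ -> str
        return value.item()
    return value


# Hypothetical usage before the failing query:
#   dupe_candidate = coll.find_one(to_bson_safe(query))
```

Converting at the point where the data is read (for example int(row_zip) for a single field) would also work; a recursive helper just avoids hunting down every NumPy-typed field individually.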

You are not seeing it because you are running it on a fresh shelter database that doesn't have the services collection, which is what the scraper tries to partially match against. You can copy the services collection into your MongoDB database and try running the script again.
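To reproduce the failure locally, the services collection has to exist in the local database. A minimal sketch of copying it in batches, assuming both sides are pymongo Collection objects (which provide find() and insert_many()) and that the target collection starts empty, so existing _id values do not collide. `copy_collection` and its batch_size parameter are hypothetical, not part of the repository.

```python
def copy_collection(src_coll, dst_coll, batch_size=1000):
    """Copy every document from src_coll into dst_coll.

    Hypothetical helper. Works with pymongo Collection objects or anything
    exposing the same find() / insert_many() methods. Documents keep their
    _id, so dst_coll should be empty to avoid duplicate-key errors.
    Returns the number of documents copied.
    """
    batch = []
    copied = 0
    for doc in src_coll.find({}):
        batch.append(doc)
        if len(batch) >= batch_size:
            # Insert a full batch and start a fresh list.
            dst_coll.insert_many(batch)
            copied += len(batch)
            batch = []
    if batch:
        # Insert whatever is left over after the loop.
        dst_coll.insert_many(batch)
        copied += len(batch)
    return copied
```

With pymongo, each argument would be something like MongoClient(uri)[db_name]["services"], where the URIs and database name are placeholders for your own setup. The mongodump and mongorestore command-line tools can do the same copy without any Python.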