amosproj / amos2023ws06-sales-lead-qualifier

MIT License

Pipeline execution freezes while checking hash tables when using the 100k dataset #212

Closed: luccalb closed this issue 8 months ago

luccalb commented 8 months ago

While the pipeline was being rerun to add the sentiment analysis for the full dataset, execution froze during the hash table check. Cancelling the process manually produced the following traceback. It ends inside an S3 `put_object` call made by `save_lookup_table`, which `hash_check` invokes from the per-lead `DataFrame.apply` in `analyze_emails.py`, so the problem may be the number of requests sent to AWS:

Traceback (most recent call last):
  File "D:\MasterInf\AMOS\src\main.py", line 40, in <module>
    DEMOS[choice]()
  File "D:\MasterInf\AMOS\src\demo\demos.py", line 211, in pipeline_demo
    pipeline.run()
  File "D:\MasterInf\AMOS\src\bdc\pipeline.py", line 50, in run
    step_df = step.run()
  File "D:\MasterInf\AMOS\src\bdc\steps\analyze_emails.py", line 119, in run
    self.df[["domain", "email_valid"]] = self.df.apply(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\pandas\core\frame.py", line 9423, in apply
    return op.apply().__finalize__(self, method="apply")
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\pandas\core\apply.py", line 678, in apply
    return self.apply_standard()
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\pandas\core\apply.py", line 798, in apply_standard
    results, res_index = self.apply_series_generator()
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\pandas\core\apply.py", line 814, in apply_series_generator
    results[i] = self.f(v)
  File "D:\MasterInf\AMOS\src\bdc\steps\analyze_emails.py", line 120, in <lambda>
    lambda lead: get_lead_hash_generator().hash_check(
  File "D:\MasterInf\AMOS\src\bdc\steps\helpers\generate_hash_leads.py", line 80, in hash_check
    get_database().save_lookup_table(lookup_table, step_name)
  File "D:\MasterInf\AMOS\src\database\leads\s3_repository.py", line 249, in save_lookup_table
    self._save_to_s3(csv_buffer.getvalue(), bucket, key)
  File "D:\MasterInf\AMOS\src\database\leads\s3_repository.py", line 134, in _save_to_s3
    s3.put_object(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\client.py", line 989, in _make_api_call
    http, parsed_response = self._make_request(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\client.py", line 1015, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\endpoint.py", line 199, in _send_request
    success_response, exception = self._get_response(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\endpoint.py", line 241, in _get_response
    success_response, exception = self._do_get_response(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\endpoint.py", line 281, in _do_get_response
    http_response = self._send(request)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\endpoint.py", line 377, in _send
    return self.http_session.send(request)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\httpsession.py", line 464, in send
    urllib_response = conn.urlopen(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\urllib3\connectionpool.py", line 791, in urlopen
    response = self._make_request(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\urllib3\connectionpool.py", line 497, in _make_request
    conn.request(
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\awsrequest.py", line 96, in request
    rval = super().request(method, url, body, headers, *args, **kwargs)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\urllib3\connection.py", line 409, in request
    self.send(chunk)
  File "C:\Users\dev\.virtualenvs\AMOS-6PSujsUl\lib\site-packages\botocore\awsrequest.py", line 223, in send
    return super().send(str)
  File "C:\Python310\lib\http\client.py", line 999, in send
    self.sock.sendall(data)
  File "C:\Python310\lib\ssl.py", line 1237, in sendall
    v = self.send(byte_view[count:])
  File "C:\Python310\lib\ssl.py", line 1206, in send
    return self._sslobj.write(data)
KeyboardInterrupt
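The call chain above runs `hash_check` once per lead and reaches `save_lookup_table`, which uploads the lookup table as CSV. A minimal sketch of why that pattern scales badly, and of the usual alternative of saving once per batch, follows. It is plain Python so it runs without pandas or AWS credentials. `CountingRepository`, `lead_hash` and `check_leads` are hypothetical names, not the project's code. The repository only counts calls where the real one would call `put_object`, and the sketch assumes that the real step saves after every lead and that every lead is new.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CountingRepository:
    """Hypothetical stand-in for the S3 repository: counts uploads instead of calling put_object."""
    uploads: int = 0
    rows_uploaded: int = 0
    stored: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def save_lookup_table(self, table: Dict[str, str], step_name: str) -> None:
        # Assumption: like the real method, each call writes the whole table,
        # so the cost of one call grows with the table size.
        self.uploads += 1
        self.rows_uploaded += len(table)
        self.stored[step_name] = dict(table)


def lead_hash(lead: Dict[str, str]) -> str:
    # Hypothetical key: SHA-256 over the lead's email address.
    return hashlib.sha256(lead["email"].encode("utf-8")).hexdigest()


def check_leads(
    leads: List[Dict[str, str]],
    repo: CountingRepository,
    step_name: str,
    compute: Callable[[Dict[str, str]], str],
    save_every_row: bool,
) -> List[str]:
    table: Dict[str, str] = {}
    results: List[str] = []
    for lead in leads:
        key = lead_hash(lead)
        if key not in table:  # cache miss: run the expensive step once
            table[key] = compute(lead)
        results.append(table[key])
        if save_every_row:
            repo.save_lookup_table(table, step_name)  # one upload per lead
    if not save_every_row:
        repo.save_lookup_table(table, step_name)  # one upload for the whole batch
    return results


def email_domain(lead: Dict[str, str]) -> str:
    return lead["email"].split("@")[1]


leads = [{"email": f"user{i}@example.com"} for i in range(1000)]
per_row, batched = CountingRepository(), CountingRepository()
check_leads(leads, per_row, "analyze_emails", email_domain, save_every_row=True)
check_leads(leads, batched, "analyze_emails", email_domain, save_every_row=False)

print(per_row.uploads, per_row.rows_uploaded)  # 1000 500500
print(batched.uploads, batched.rows_uploaded)  # 1 1000
```

Under these assumptions, saving after every lead costs n uploads and n(n+1)/2 rows written in total. At 100k leads that is about 5 × 10^9 rows, compared with 100,000 rows for a single save at the end of the step.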
luccalb commented 8 months ago

I'm not sure if this necessarily needs to be fixed for the demo day, as the demo dataset will likely be much smaller than 100k entries.

Tims777 commented 8 months ago

Closing this as not planned, since it will not impact our demo or the handover.