Tims777 closed this issue 9 months ago.
The issue with regionalatlas seems to be related to the hashtables, as the error only occurs on the first run (when no hashtables exist yet).
The address scraping step was a very early experiment and is not "production ready". It was never meant to end up in the final pipeline, as we get the address from Google. I'll create a special demo pipeline config for the BDC.
The GPT errors look worse than they are: they just mean that the data was not present in the cache files. I adjusted the error logging.
Good to know. However, the error message is still appearing:
Running sentiment analysis on reviews: 78%|████████████████████████████████████████████████████████████ | 928/1190 [04:11<02:55, 1.49it/s]
2024-02-05 22:20:13,197 | ERROR | s3_repository.py:212 | Error loading review from S3 with id ChIJFzDNMdC_xkcRmSfqst9o1g8. Error: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
This seems to be S3-specific; I'll check again.
It's just ungraceful error handling: when a Google place has no reviews, we don't save any to S3. The sentiment analyzer assumes where the review file should be but can't find it. In that case, the sentiment score will be None.
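A minimal sketch of more graceful handling for the sentiment step. It assumes reviews are stored as one JSON file per place under a key like `reviews/<place_id>.json`. The key layout, the function names `load_reviews`, `is_no_such_key` and `sentiment_score`, and passing `get_object` in as a parameter are all hypothetical illustrations, not the actual `s3_repository.py` API. With boto3, `get_object` on a missing key raises a botocore `ClientError` (or its `NoSuchKey` subclass) whose `response["Error"]["Code"]` is `"NoSuchKey"`. The sketch treats that case as "no reviews", yields a sentiment score of None, and lets every other S3 error propagate.

```python
import json
import logging
from statistics import mean
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


def is_no_such_key(err: Exception) -> bool:
    """Return True if err looks like botocore's ClientError for a missing S3 key."""
    response = getattr(err, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "NoSuchKey"


def load_reviews(get_object: Callable[..., dict], bucket: str, place_id: str) -> Optional[list]:
    """Load the stored reviews of a place, or None if none were ever saved.

    get_object should behave like boto3's S3 client.get_object; the key
    layout below is a hypothetical placeholder.
    """
    key = f"reviews/{place_id}.json"
    try:
        obj = get_object(Bucket=bucket, Key=key)
    except Exception as err:
        if is_no_such_key(err):
            # Places without reviews have no file in S3: expected, not an error.
            log.info("No reviews stored for place %s", place_id)
            return None
        raise  # any other failure (permissions, network, ...) is a real error
    return json.loads(obj["Body"].read())


def sentiment_score(reviews: Optional[list], score_review: Callable[[Any], float]) -> Optional[float]:
    """Average per-review sentiment, or None when a place has no reviews."""
    if not reviews:
        return None
    return mean([score_review(review) for review in reviews])
```

With boto3 this could be called as `load_reviews(boto3.client("s3").get_object, bucket_name, place_id)`, where `bucket_name` is a placeholder. Logging the missing key at INFO instead of ERROR keeps the expected "no reviews" case out of the error output.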
When running the pipeline run_all_steps.json (but with forceRefresh set to false everywhere), several errors occur in the different steps. These need to be fixed, or the affected pipeline steps should be taken out.
List of errors
Ordered by severity:
1. | ERROR | pipeline.py:57 | Step Regional_Atlas failed! Columns must be same length as key
2. | ERROR | s3_repository.py:212 | Error loading review from S3 with id ChIJkdTnnsMzs1IRlCF2m6bKYsU. Error: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist. (might indicate a problem with the Google step)
3. Getting addresses from custom domains...: 47%|████████████████████████▏ | 241/518 [17:00<19:32, 4.23s/it]
Note
The current pipeline run_all_steps.json should be changed to have forceRefresh: false set everywhere. The current configuration can optionally be copied to a new pipeline config, force_refresh_all_steps.json.
Acceptance Criteria
run_all_steps.json with forceRefresh set to false everywhere
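A minimal sketch of the config change described in the note. It assumes run_all_steps.json is plain JSON and that forceRefresh may appear at any nesting level. The schema is not shown here, so the sketch walks the whole document instead of assuming one layout. The helper names `disable_force_refresh` and `update_pipeline_config` are hypothetical. If a backup path is given, the untouched file is first copied there, for example as force_refresh_all_steps.json.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Optional


def disable_force_refresh(node: Any) -> int:
    """Set every "forceRefresh" key to False in place; return how many were set."""
    count = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "forceRefresh":
                node[key] = False  # replacing a value does not resize the dict
                count += 1
            else:
                count += disable_force_refresh(value)
    elif isinstance(node, list):
        for item in node:
            count += disable_force_refresh(item)
    return count


def update_pipeline_config(config_path: Path, backup_path: Optional[Path] = None) -> int:
    """Rewrite config_path with forceRefresh disabled everywhere.

    If backup_path is given, the untouched file is copied there first,
    e.g. to keep the current behaviour as force_refresh_all_steps.json.
    """
    if backup_path is not None:
        shutil.copyfile(config_path, backup_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    changed = disable_force_refresh(config)
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return changed
```

Usage, with placeholder paths relative to wherever the pipeline configs live: `update_pipeline_config(Path("run_all_steps.json"), Path("force_refresh_all_steps.json"))`. Because the file is re-serialized with `indent=4`, the diff may also contain whitespace changes.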