Description:
The dr10574-ho-avh-weekly job has been failing consistently for the past few weeks. Investigation traced the failure to the fetcher, which raises the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 1786: invalid start byte
Steps to Reproduce:
Run the dr10574-ho-avh-weekly job.
The job fails with the UnicodeDecodeError.
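A minimal way to reproduce the error outside the job, assuming the file contains a raw 0xb0 byte as the error message suggests. The sample bytes and the helper name first_decode_error are hypothetical. Only bytes.decode and UnicodeDecodeError are standard Python.

```python
def first_decode_error(data: bytes, encoding: str = "utf-8"):
    """Return (reason, offending byte, offset) for the first decode error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.reason, data[exc.start], exc.start
    return None


# Hypothetical sample: a degree sign written as the single byte 0xb0.
sample = b"Unadjusted latitude: 42\xb035'S."
print(first_decode_error(sample))  # ('invalid start byte', 176, 23)
```

0xb0 is 10110000 in binary, which is a UTF-8 continuation byte. It cannot begin a character, so the decoder reports "invalid start byte".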
Testing:
Local Execution: The code was run locally against S3 and completed without errors. It also ran without errors on Databox.
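Data Examination: Inspecting the data revealed a problem with certain characters in the latitude and longitude values. The characters shown as � in the following record are most likely the 0xb0 byte from the error. 0xb0 is the degree sign (°) in Latin-1 and Windows-1252, but it is not a valid UTF-8 start byte: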
Latitude and longitude automatically calculated from verbatim grid reference. Unadjusted latitude: 42�35'S. Unadjusted longitude: 147�52'E.,-42.583333,147.866119,EPSG:4202,50,0.000278,AMG 55 571100 5287900
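The report does not confirm the source file's encoding. If it is Latin-1 or Windows-1252 (cp1252), the byte 0xb0 is the degree sign (°). The sketch below uses a hypothetical byte sample modeled on the record above. It shows that such bytes decode cleanly as cp1252 or Latin-1. It also shows that decoding them as UTF-8 with errors='replace' yields U+FFFD, which renders as the � seen in the record.

```python
# Hypothetical bytes modeled on the record above, with 0xb0 for each degree sign.
raw = b"Unadjusted latitude: 42\xb035'S. Unadjusted longitude: 147\xb052'E."

as_cp1252 = raw.decode("cp1252")   # 0xb0 -> U+00B0 DEGREE SIGN
as_latin1 = raw.decode("latin-1")  # same mapping for 0xb0
as_utf8_replaced = raw.decode("utf-8", errors="replace")  # 0xb0 -> U+FFFD

print(as_cp1252)         # Unadjusted latitude: 42°35'S. Unadjusted longitude: 147°52'E.
print(as_utf8_replaced)  # Unadjusted latitude: 42�35'S. Unadjusted longitude: 147�52'E.
```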
Suggested Solution:
Preingestion needs a strategy for handling data that is not UTF-8 encoded. Potential solutions include:
Identifying and handling non-UTF-8 encoded characters.
Implementing error handling strategies such as errors='ignore' or errors='replace' in the file reading process.
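The sketch below combines both ideas. It tries UTF-8 first, falls back to cp1252, and only then decodes with errors='replace'. The function name decode_with_fallback and the choice of cp1252 as the fallback are assumptions, since this report does not show the fetcher's reading code. The function works on bytes, so it applies equally to a local file (open(path, "rb").read()) or to the downloaded contents of an S3 object.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that succeeds (hypothetical helper).

    Returns (text, encoding_used) so the caller can log when a fallback was needed.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: keep the job running, marking undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (replace)"


text, used = decode_with_fallback(b"42\xb035'S")
print(used, text)  # cp1252 42°35'S
```

For this data, errors='replace' is safer than errors='ignore'. With ignore, the byte is silently dropped, so "42°35'S" becomes "4235'S". With replace, a visible U+FFFD marker stays in the text. cp1252 is a reasonable fallback because Python's cp1252 codec rejects a few undefined bytes such as 0x81, which keeps the last-resort branch reachable. Latin-1, by contrast, decodes every byte.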