Describe the bug
A user may see an error like the one below when using the Python package, sometimes because of limited network reachability. It is hard to tell which connection (host:port) failed: it could be a dependency download, a remote filesystem access, etc.
2024-08-14 12:36:41,809 INFO spark_rapids_tools.argparser: ...applying argument case: Jar Argument
2024-08-14 12:36:41,809 INFO spark_rapids_tools.argparser: ...applying argument case: Jar Argument
2024-08-14 12:36:41,812 INFO rapids.tools.qualification: Using Spark RAPIDS user tools version 24.08.0
2024-08-14 12:36:41,812 INFO rapids.tools.qualification: ******* [Initialization]: Starting *******
2024-08-14 12:36:41,878 INFO rapids.tools.qualification.ctxt: Start connecting to the platform
2024-08-14 12:36:41,879 WARNING rapids.tools.cmd_driver: Environment report: Platform region is not set.
2024-08-14 12:36:41,880 INFO rapids.tools.qualification: ======= [Initialization]: Finished =======
2024-08-14 12:36:41,880 INFO rapids.tools.qualification: ******* [Connecting to Execution Cluster]: Starting *******
2024-08-14 12:36:41,880 INFO rapids.tools.qualification: Qualification requires no execution cluster. Skipping phase
2024-08-14 12:36:41,880 INFO rapids.tools.qualification: ======= [Connecting to Execution Cluster]: Finished =======
2024-08-14 12:36:41,880 INFO rapids.tools.qualification: ******* [Process-Arguments]: Starting *******
2024-08-14 12:36:41,880 DEBUG rapids.tools.qualification: Processing Output Arguments
2024-08-14 12:36:41,880 DEBUG rapids.tools.qualification: Root directory of local storage is set as: /home/lpidapar
2024-08-14 12:36:41,880 INFO rapids.tools.qualification.ctxt: Local workdir root folder is set as /home/myuser/qual_20240814123641_ab3321bA
2024-08-14 12:36:41,881 INFO rapids.tools.qualification.ctxt: Dependencies are generated locally in local disk as: /home/myuser/qual_20240814123641_ab3321bA/work_dir
2024-08-14 12:36:41,881 INFO rapids.tools.qualification.ctxt: Local output folder is set as: /home/myuser/qual_20240814123641_ab3321bA
2024-08-14 12:36:41,881 INFO rapids.tools.qualification: Qualification tool processing the arguments
2024-08-14 12:36:41,897 ERROR root: Qualification. Raised an error in phase [Process-Arguments]
Traceback (most recent call last):
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/urllib/request.py", line 1342, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 950, in send
    self.connect()
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 1417, in connect
    super().connect()
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/http/client.py", line 921, in connect
    self.sock = self._create_connection(
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/socket.py", line 822, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/home/myuser/condapv/envs/spark_rapids/lib/python3.9/socket.py", line 953, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Steps/Code to reproduce bug
Use the Python tools 24.8.0 on a node with no internet access.
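If an air-gapped node is not at hand, the failure can probably be approximated locally by making DNS resolution fail. The sketch below is an assumption about how to simulate an offline node, not part of the tools: it patches `socket.getaddrinfo` (the call at the bottom of the traceback) with `unittest.mock` and then calls `urllib.request.urlopen`, the module at the top of the traceback. `simulate_offline` and `reproduce` are hypothetical helpers, and the Maven URL is only an example target.

```python
import socket
import urllib.error
import urllib.request
from contextlib import contextmanager
from unittest import mock


@contextmanager
def simulate_offline():
    """Hypothetical helper: make every DNS lookup fail, as on a node with no internet access."""
    # socket.create_connection() resolves `getaddrinfo` from the socket module at
    # call time, so patching the module attribute also affects http.client/urllib.
    failure = socket.gaierror(socket.EAI_NONAME, "Name or service not known")
    with mock.patch("socket.getaddrinfo", side_effect=failure):
        yield


def reproduce(url: str) -> str:
    """Hypothetical helper: return the message a caller sees when `url` cannot be resolved."""
    with simulate_offline():
        try:
            urllib.request.urlopen(url, timeout=5)
        except urllib.error.URLError as err:
            return str(err)
    return "request unexpectedly succeeded"


if __name__ == "__main__":
    # The message carries the errno text but not the host or port that failed.
    print(reproduce("https://repo1.maven.org/maven2/"))
```

On Linux this should print something like `<urlopen error [Errno -2] Name or service not known>`, which, like the log above, names neither the host nor the port.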
Expected behavior
Improve error logging so that the failing host:port and the service being contacted are reported.
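One possible shape for this is a thin wrapper around the HTTP call that re-raises network failures with the endpoint and the purpose of the request in the message. This is a minimal sketch, assuming the download code goes through `urllib.request` as the traceback suggests; `EndpointError`, `describe_endpoint`, `open_url` and the `service` strings are hypothetical names, not existing APIs of the tools. The `opener` parameter exists only so the function can be exercised without a network.

```python
import logging
import urllib.error
import urllib.request
from typing import Any, Callable
from urllib.parse import urlsplit

logger = logging.getLogger(__name__)


class EndpointError(ConnectionError):
    """Hypothetical error type whose message records which endpoint and service failed."""


def describe_endpoint(url: str) -> str:
    """Return 'host:port' for a URL, filling in the default port for http/https."""
    parts = urlsplit(url)
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme)
    return f"{parts.hostname}:{port}" if port else str(parts.hostname)


def open_url(url: str, service: str, timeout: float = 30.0,
             opener: Callable[..., Any] = urllib.request.urlopen) -> Any:
    """Open `url`; on failure, log and re-raise with the endpoint and service named."""
    endpoint = describe_endpoint(url)
    try:
        return opener(url, timeout=timeout)
    except OSError as err:  # URLError, socket.gaierror and socket timeouts all derive from OSError
        message = f"Failed to reach {endpoint} while {service} (url={url}): {err}"
        logger.error(message)
        raise EndpointError(message) from err
```

With this wrapper, the failure in the log above would read along the lines of `Failed to reach repo1.maven.org:443 while downloading dependencies (...): <urlopen error [Errno -2] Name or service not known>`. Chaining with `from err` keeps the original traceback available for debugging.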
Environment details
Hadoop cluster
@tgravescs #1292 logs a message when downloading the Tools JAR. As a follow-up, we should probably log a message when downloading any resource or issuing any HTTP request.
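For "any HTTP request", one option that does not require touching every call site is a `urllib.request` pre-processing handler: `OpenerDirector` calls a handler's `http_request`/`https_request` method on each outgoing request before it is sent. The sketch below assumes the tools use `urllib.request.urlopen` for their downloads; `RequestLoggingHandler` and `install_request_logging` are hypothetical names, and requests made through other libraries would not be covered.

```python
import logging
import urllib.request
from urllib.parse import urlsplit

logger = logging.getLogger(__name__)


class RequestLoggingHandler(urllib.request.BaseHandler):
    """Hypothetical urllib handler that logs the endpoint of every outgoing HTTP(S) request."""

    def _log(self, req: urllib.request.Request) -> urllib.request.Request:
        parts = urlsplit(req.full_url)
        port = parts.port or {"http": 80, "https": 443}.get(parts.scheme)
        logger.info("%s %s (endpoint %s:%s)", req.get_method(), req.full_url, parts.hostname, port)
        return req  # pre-processors must return the (possibly modified) request

    # OpenerDirector looks up "<scheme>_request" on each handler before sending.
    http_request = _log
    https_request = _log


def install_request_logging() -> None:
    """Make later urllib.request.urlopen() calls go through the logging handler."""
    urllib.request.install_opener(urllib.request.build_opener(RequestLoggingHandler()))
```

Because the endpoint is logged before the connection is attempted, the last such line before an error like the one above would identify the host:port that could not be reached, even if the exception itself does not.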