AMP-SCZ / utility

Storehouse for all utility scripts
Apache License 2.0

We need to write a program to retry REDCap disconnected files/subjects #127

Open tashrifbillah opened 1 month ago

tashrifbillah commented 1 month ago
urllib3.exceptions.MaxRetryError

```
CP00102_daily_activity_and_saliva_sample_collection.csv
Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/data/predict1/miniconda3/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='redcap.partners.org', port=443): Max retries exceeded with url: /redcap/api/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/utility/rpms_to_redcap.py", line 387, in <module>
    r = requests.post('https://redcap.partners.org/redcap/api/', data= fields)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='redcap.partners.org', port=443): Max retries exceeded with url: /redcap/api/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))
```
urllib3.exceptions.ProtocolError

```
CP00102_missing_data.csv
Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/utility/rpms_to_redcap.py", line 387, in <module>
    r = requests.post('https://redcap.partners.org/redcap/api/', data= fields)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 547, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
```
urllib3.exceptions.NewConnectionError

```
CP00102_family_interview_for_genetic_studies_figs.csv.flat
Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/data/predict1/miniconda3/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='redcap.partners.org', port=443): Max retries exceeded with url: /redcap/api/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/predict1/utility/rpms_to_redcap.py", line 387, in <module>
    r = requests.post('https://redcap.partners.org/redcap/api/', data= fields)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='redcap.partners.org', port=443): Max retries exceeded with url: /redcap/api/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))
```

The easiest solution may be to collect the names of the files whose upload failed and retry only those.

tashrifbillah commented 1 month ago

grep -B34 urllib3.exceptions.MaxRetryError /tmp/errors.txt | grep .csv

This gives only csv file names.

tashrifbillah commented 1 month ago

cd /data/predict1/utility/bsub/
../parse_redcap_error.py "*err" | grep -B34 urllib3.exceptions.MaxRetryError | grep .csv

This gives only csv file names.
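
The fixed `-B34` context only works when the traceback has exactly that many lines, which is not the case for the ProtocolError trace above. A minimal Python sketch of the same extraction follows. It assumes that each form name is printed on its own line before its traceback (as in the logs quoted here) and that form names start with two capital letters and five digits, as in `CP00102` and `ME01326`. The function name `failed_forms` and the regular expression are illustrative, not part of the repository.

```python
import re

# Any of the error names reported in this issue marks a failed upload.
ERROR_NAMES = (
    "urllib3.exceptions.MaxRetryError",
    "urllib3.exceptions.ProtocolError",
    "requests.exceptions.ConnectionError",
)

# Assumed naming scheme: CP00102_missing_data.csv, optionally ending in ".flat".
FORM_RE = re.compile(r"[A-Z]{2}\d{5}_\w+\.csv(?:\.flat)?")


def failed_forms(log_text):
    """Return, in order and without duplicates, the forms followed by an error."""
    failed = []
    current = None
    for line in log_text.splitlines():
        match = FORM_RE.search(line)
        if match:
            # A new form name starts a new section of the log.
            current = match.group(0)
        elif current and any(name in line for name in ERROR_NAMES):
            if current not in failed:
                failed.append(current)
    return failed
```

Because it tracks the most recent form name instead of counting lines, the sketch does not depend on the length of the traceback.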

tashrifbillah commented 1 month ago

Put this within rpms_to_redcap.sh:

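FORCE=1
# ~/failed.csv lists the failed form file names, separated by whitespace
for form in $(cat ~/failed.csv)
do
  # The first 7 characters of the file name are the subject ID; the first 2 are the site code
  subject=${form:0:7}
  site=${form:0:2}
  # Enter the subject's surveys directory; skip the form if it does not exist
  pushd "/data/predict1/data_from_nda/Prescient/PHOENIX/PROTECTED/Prescient${site}/raw/${subject}/surveys/" > /dev/null || continue
  echo "$form"
  /data/predict1/utility/rpms_to_redcap.py "$form" "$redcap_dict" "$API_TOKEN" "$FORCE"
  popd > /dev/null
  # Pause between uploads
  sleep 10
done
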
tashrifbillah commented 1 month ago

Another idea is to catch this error around the request and set the form's hash to zero.
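
A rough sketch of that idea follows. It assumes that the stored hash decides whether a form is uploaded again, so that a zeroed hash can never match the real file and the form is picked up on the next run. The helper `upload_or_reset_hash`, the `upload` callable, and the dictionary layout of `hashes` are hypothetical and do not reflect how `rpms_to_redcap.py` actually stores hashes.

```python
def upload_or_reset_hash(form, upload, hashes, exceptions=(OSError,)):
    """Try to upload one form; on a connection error, zero its stored hash.

    upload: hypothetical callable that uploads the given form name.
    hashes: hypothetical dict mapping form name -> stored hash string.
    exceptions: errors treated as a lost connection. The default OSError
        also covers requests.exceptions.ConnectionError, which derives
        from it.
    Returns True if the upload succeeded, False otherwise.
    """
    try:
        upload(form)
    except exceptions:
        # Assumption: a hash of "0" never matches a real file hash.
        hashes[form] = "0"
        return False
    return True
```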

tashrifbillah commented 1 month ago

Special characters are making it hard to streamline:

^[[0;31m ME01326_speech_sampling_run_sheet.csv ^[[0m
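
The `^[[0;31m` and `^[[0m` fragments are ANSI colour codes (an escape character followed by `[...m`). If the names have to be cleaned after the fact, a small sketch like the following could strip them. The function name `clean_form_name` is illustrative.

```python
import re

# Matches ANSI colour/style sequences such as ESC[0;31m and ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def clean_form_name(line):
    """Remove ANSI colour codes and surrounding whitespace from a log line."""
    return ANSI_SGR.sub("", line).strip()
```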

tashrifbillah commented 3 weeks ago

As an improvement, we removed special characters around $form: https://github.com/AMP-SCZ/utility/blob/4f55fb8284b626989c8390082d074bb1e186271a/rpms_to_redcap.lsf#L46

tashrifbillah commented 3 weeks ago

We will actually have to rerun the whole pipeline for those selected cases. Two ideas:

(i)

  1. set upload=1 for those subjects in date_shift database
  2. and set upload=1 for those forms in {subject}_hashes.csv

(ii) or, write an rpms_records.txt and rerun the whole RPMS pipeline for upload, clean, down shift


(ii) seems like a big task. So we should just follow (i) and wait for the next run.

tashrifbillah commented 3 weeks ago

We decided to simply retry the upload after a 180-second interval.

References:

- https://stackoverflow.com/questions/15431044/can-i-set-max-retries-for-requests-request
- https://requests.readthedocs.io/en/latest/user/advanced/#transport-adapters
- https://requests.readthedocs.io/en/latest/api/
- https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry
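
A minimal sketch of such a fixed-interval retry, not necessarily the code that was deployed: the helper name `retry_call`, the default of three attempts, and the injectable `sleep` parameter are assumptions made for illustration. Only the 180-second wait comes from this thread.

```python
import time


def retry_call(func, exceptions, attempts=3, wait=180, sleep=time.sleep):
    """Call func() until it succeeds or the attempts are used up.

    exceptions: exception class (or tuple of classes) that triggers a retry.
    wait: seconds to pause between attempts (180 in this issue).
    sleep: injectable so that the pause can be skipped in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(wait)
```

In `rpms_to_redcap.py` the call might then look like `r = retry_call(lambda: requests.post(url, data=fields), requests.exceptions.ConnectionError)`, where `url` stands for the REDCap API address. The transport-adapter approach in the links above is an alternative, but its backoff grows between attempts rather than staying at a fixed interval.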

tashrifbillah commented 3 weeks ago

If this scheme is successful, we should deploy it in import_records_all.py too.