AMP-SCZ / utility

Storehouse for all utility scripts
Apache License 2.0

Write a program to replace all non-English characters with a space #114

Status: Open. tashrifbillah opened this issue 3 months ago.

tashrifbillah commented 3 months ago

This is the error message the NDA validation tool (NDATools) spits out:

Validating files...
  0%|          | 0/1 [00:00<?, ?it/s]Exception in thread Thread-2:
Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Validation.py", line 486, in run
    response = post_request(self.api_scope, data, timeout=self.validation_timeout, headers = {'content-type':'text/csv'}, auth=self.auth)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Utils.py", line 281, in post_request
    return _send_prepared_request(req.prepare(), timeout=timeout, deserialize_handler=deserialize_handler, error_handler=error_handler)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Utils.py", line 244, in _retry
    tmp = func(*args, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Utils.py", line 267, in _send_prepared_request
    tmp = session.send(prepped, timeout=timeout)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 1327, in _send_request
    body = _encode(body, 'body')
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 166, in _encode
    raise UnicodeEncodeError(
UnicodeEncodeError: 'latin-1' codec can't encode character '\u02bc' in position 389181: Body ('ʼ') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
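The last line explains the failure. When a request body is a `str`, Python's `http.client` encodes it as Latin-1 (ISO-8859-1), so any character above U+00FF, such as `ʼ` (U+02BC), raises `UnicodeEncodeError`. The traceback only gives a character offset into the request body, which is hard to map back to a file. Below is a minimal sketch for listing the offending characters by file, line and column, assuming the CSV files are UTF-8; `find_non_latin1` and `report` are hypothetical helper names, not part of NDATools or this repository.

```python
import sys
import unicodedata


def find_non_latin1(lines):
    """Yield (line_no, col_no, char) for each character Latin-1 cannot encode.

    Latin-1 covers code points U+0000..U+00FF, so anything above U+00FF
    (e.g. U+02BC, CJK ideographs, Hangul) would make http.client raise
    UnicodeEncodeError when the text is sent as a str request body.
    """
    for line_no, line in enumerate(lines, start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:
                yield line_no, col_no, ch


def report(path):
    # Assumption: the CSV files are stored as UTF-8.
    with open(path, encoding="utf-8") as f:
        for line_no, col_no, ch in find_non_latin1(f):
            name = unicodedata.name(ch, "UNKNOWN")
            print(f"{path}:{line_no}:{col_no}: {ch} U+{ord(ch):04X} {name}")


if __name__ == "__main__":
    # Usage: python find_non_latin1.py file1.csv [file2.csv ...]
    for path in sys.argv[1:]:
        report(path)
```

For `socdem01.csv`, each hit would be printed with its code point and Unicode name, e.g. `U+02BC MODIFIER LETTER APOSTROPHE`.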

The affected files and the non-English characters they contain are as follows:

    /data/predict1/to_nda/nda-submissions/tbi01_Prescient_screening.csv
    消炎藥 (Chinese for "anti-inflammatory drug")

    /data/predict1/to_nda/nda-submissions/network_combined/socdem01.csv
    ʼ (U+02BC, MODIFIER LETTER APOSTROPHE)

    /data/predict1/to_nda/nda-submissions/ampscz_pps01_Pronet.csv
    Many lines; search for 경기도 (Gyeonggi-do, a province of South Korea) and scroll down the same column

    /data/predict1/to_nda/nda-submissions/vitas01_Prescient_baseline.csv
    科興 (Chinese name of Sinovac, the vaccine manufacturer)
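A minimal sketch of the program the issue title asks for, assuming UTF-8 input and interpreting "non-English" as "outside 7-bit ASCII". `replace_non_ascii` and `clean_file` are hypothetical names. Each offending character is replaced with a single space, and the result is written to a new file so the original stays untouched.

```python
import re
import sys

# Any character outside 7-bit ASCII (code points above U+007F).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def replace_non_ascii(text, replacement=" "):
    """Return text with every non-ASCII character replaced by `replacement`."""
    return NON_ASCII.sub(replacement, text)


def clean_file(src, dst):
    # Assumes UTF-8 input; newline="" preserves the original line endings.
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(replace_non_ascii(text))


if __name__ == "__main__":
    # Usage: python replace_non_ascii.py input.csv output.csv
    clean_file(sys.argv[1], sys.argv[2])
```

Working on the raw text instead of parsing it with `csv` is safe here because commas, quotes and line breaks are all ASCII, so the substitution cannot change the CSV structure. The ASCII filter is stricter than the Latin-1 limit in the traceback: it also blanks accented letters such as `é`, which Latin-1 can encode. Changing the pattern to `[^\x00-\xFF]` would keep those and remove only the characters `http.client` rejects.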