genophenoenvo / terraref-datasets

Repository for code and small datasets derived from the TERRA REF program
MIT License

Convert one python notebook to script #109

Closed: MagicMilly closed this issue 3 years ago

MagicMilly commented 4 years ago

Starting with weather_data_cleaning.ipynb, we'll convert it to a .py script, using jupyter nbconvert or PyCharm, as a first step toward improving and automating the workflow.
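For reference, the usual command-line route is jupyter nbconvert --to script weather_data_cleaning.ipynb. To show roughly what that conversion does, here is a simplified, stdlib-only stand-in (not nbconvert itself). It assumes the notebook uses nbformat version 4, where each cell has a cell_type and a source, and it only concatenates code cells, turning markdown cells into comments; outputs and IPython magics are not handled. The function name notebook_to_script is hypothetical.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path: str, py_path: str) -> str:
    """Write the code cells of a v4 .ipynb file to a .py file (simplified sketch).

    Markdown cells are kept as '#' comments; outputs, metadata and
    IPython magics are NOT handled (nbconvert does much more).
    """
    nb = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        # In nbformat v4, "source" may be a single string or a list of lines.
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        if cell.get("cell_type") == "code":
            chunks.append(text.rstrip() + "\n")
        elif cell.get("cell_type") == "markdown":
            chunks.append("".join("# " + line + "\n" for line in text.splitlines()))
    script = "\n".join(chunks)
    Path(py_path).write_text(script, encoding="utf-8")
    return script
```

Once the script exists, further cleanup (removing exploratory cells, wrapping steps in functions) is easier to do in the .py file than in the notebook.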

MagicMilly commented 4 years ago

The weather data cleaning script is now running for me and produces the correct output CSV files.

KristinaRiemer commented 4 years ago

Okay, I just tried running this script. All of the lines that read in the CSVs from a URL (e.g., lines 30 and 89) return this error:

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1051)>

I was able to get around this with the verify=False approach, but I imagine there's a good security-related reason not to do it this way?
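The likely reason: verify=False (in requests), like an unverified SSL context in urllib, turns certificate checking off completely, so the script would also accept a forged certificate from anyone sitting between it and the server. A minimal sketch that keeps verification on is below. It assumes the fix is to give Python a CA bundle that trusts the server's chain; the cafile argument is a hypothetical placeholder (for example, the bundle returned by certifi.where(), if certifi is installed). It downloads through urllib, which, judging by the urllib.error.URLError above, is also what pd.read_csv was using here.

```python
import ssl
import urllib.request
from typing import Optional


def make_verified_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Return a context that still verifies certificates and hostnames.

    cafile is a hypothetical path to a PEM CA bundle that trusts the
    server's chain; None means "use the default system trust store".
    """
    return ssl.create_default_context(cafile=cafile)


def fetch_bytes(url: str, cafile: Optional[str] = None, timeout: float = 30.0) -> bytes:
    """Download url with certificate verification left ON (unlike verify=False)."""
    ctx = make_verified_context(cafile)
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read()
```

With pandas, the bytes can then be parsed as df = pd.read_csv(io.BytesIO(fetch_bytes(url))). If the error persists with the default trust store, the certificate chain itself (or a proxy re-signing traffic) is the thing to investigate, rather than switching verification off.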

MagicMilly commented 4 years ago

Thank you for trying it. I created public links for all of the weather data files on CyVerse, but there must still be some permission issues. I will look into it.

KristinaRiemer commented 4 years ago

I'm able to download the files directly by pasting the URLs into a browser, so I'm not sure it's a permissions issue?

MagicMilly commented 3 years ago

@KristinaRiemer could you try this test script and see if you get the same error? You may have to pip install wget first.

KristinaRiemer commented 3 years ago

I got the same SSL certificate error after running the wget.download line. This is the entire error + traceback, if that's useful at all?

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1392, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1051)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<input>", line 6, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1051)>
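
The traceback shows that the outer urllib.error.URLError wraps an ssl.SSLCertVerificationError, i.e. a TLS trust failure during the handshake, before any file is requested. A server-side permissions problem would typically surface instead as urllib.error.HTTPError with a status such as 401 or 403. A small helper (the name describe_download_error and its category labels are hypothetical) can make that distinction explicit when testing a download:

```python
import ssl
import urllib.error


def describe_download_error(exc: Exception) -> str:
    """Classify a urllib download failure (sketch; labels are illustrative)."""
    # HTTPError subclasses URLError, so check it first: the server did answer.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "permission"
        return f"http {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # For TLS failures, .reason holds the underlying ssl exception.
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            return "certificate"
        return "network"
    return "other"
```

Wrapping the download in try/except urllib.error.URLError and printing describe_download_error(e) would report "certificate" for the error above.
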

MagicMilly commented 3 years ago

One more script for you to try, @KristinaRiemer. Thank you for your help!

KristinaRiemer commented 3 years ago

That worked with one minor file path change! Line 11 of that script needed to be df = pd.read_csv('data/clemson_rh_2014.csv'). I think the data folder at the top level of the repo and the data subfolder inside src got mixed up here.
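One way to avoid mixing up the top-level data folder and src/data is to build paths from the script's own location rather than from the current working directory. A sketch, assuming the script lives in src/ and should read from the repository-level data/ folder (that layout and the helper name data_path are assumptions):

```python
from pathlib import Path
from typing import Optional


def data_path(filename: str, script_file: Optional[str] = None) -> Path:
    """Return <repo>/data/<filename>, assuming the script sits in <repo>/src/.

    Resolving from the script's location (not the working directory) gives
    the same result no matter where the script is launched from.
    """
    if script_file is None:
        script_file = __file__  # only defined when run as a file, not in a notebook
    repo_root = Path(script_file).resolve().parent.parent
    return repo_root / "data" / filename
```

Line 11 would then become df = pd.read_csv(data_path('clemson_rh_2014.csv')), which works whether the script is started from the repo root or from src.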

You should be able to modify weather_data_cleaning.py to use this download approach now, and this code could actually go into a function that runs on each of the four URLs.
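A sketch of that refactor: one function that downloads each URL into a local folder with certificate verification left on, and returns the local paths so each file can then be read with pd.read_csv. The four CyVerse links are not given in this thread, so WEATHER_URLS is an empty placeholder, and download_all is a hypothetical name.

```python
import ssl
import urllib.request
from pathlib import Path
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def download_all(urls: Iterable[str], dest_dir: str,
                 cafile: Optional[str] = None, timeout: float = 60.0) -> List[Path]:
    """Download each URL into dest_dir and return the local file paths.

    Certificate verification stays enabled; cafile is an optional
    (hypothetical) CA bundle path if the default trust store is not enough.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        # Name each local file after the last component of the URL path.
        name = Path(urlparse(url).path).name
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        target = out_dir / name
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            target.write_bytes(resp.read())
        paths.append(target)
    return paths


# Placeholder: substitute the four public CyVerse links here.
WEATHER_URLS: List[str] = []
```

The cleaning steps could then loop over the result, e.g. frames = [pd.read_csv(p) for p in download_all(WEATHER_URLS, "data")]. Downloading once to disk also means later runs can skip the network entirely if the files are already present.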

MagicMilly commented 3 years ago

Follow-up ticket to complete weather cleaning script #113