LinkedEarth / pylipd

Development repository for Python LiPD utilities
https://pylipd.readthedocs.io/en/latest/
Apache License 2.0

LiPD.load() should be able to load from a URL #8

Closed: khider closed this issue 1 year ago

khider commented 1 year ago

Right now, trying to load from a URL returns the following error:

d.load('https://lipdverse.org/data/LCf20b99dfe8d78840ca60dfb1f832b9ec/1_0_1//Nunalleq.Ledger.2018.lpd')
File https://lipdverse.org/data/LCf20b99dfe8d78840ca60dfb1f832b9ec/1_0_1//Nunalleq.Ledger.2018.lpd does not exist
Loading 0 LiPD files
Conversion to RDF done..
Loading RDFs into graph
Traceback (most recent call last):

  Cell In[7], line 1
    d.load('https://lipdverse.org/data/LCf20b99dfe8d78840ca60dfb1f832b9ec/1_0_1//Nunalleq.Ledger.2018.lpd')

  File ~/Documents/GitHub/pylipd/src/pylipd/lipd.py:94 in load
    rdffile = filemap[lipdfile]

KeyError: 'https://lipdverse.org/data/LCf20b99dfe8d78840ca60dfb1f832b9ec/1_0_1//Nunalleq.Ledger.2018.lpd'

LiPD.load() needs to accept both local file paths and URLs.
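A minimal sketch of how a loader could accept both local paths and URLs. It assumes that a remote file is downloaded to a temporary `.lpd` file and then handled like a local one. `is_url` and `resolve_lipd_path` are hypothetical helpers, not part of the pylipd API, and this is not necessarily how the linked fix works. Checking existence up front also replaces the confusing `KeyError` above with a clear error.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source: str) -> bool:
    """Return True if `source` is an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def resolve_lipd_path(source: str) -> str:
    """Hypothetical helper: return a local path for `source`.

    URLs are downloaded to a temporary .lpd file so the rest of the
    pipeline can treat remote and local files the same way.
    """
    if is_url(source):
        with urlopen(source) as resp:
            data = resp.read()
        fd, tmp_path = tempfile.mkstemp(suffix=".lpd")
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return tmp_path
    if not os.path.exists(source):
        # Fail here with a clear message instead of a later KeyError on filemap
        raise FileNotFoundError(f"File {source} does not exist")
    return source
```

Downloading first keeps a single code path for conversion. The trade-off is that the caller should delete the temporary file once it has been converted.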

varunratnakar commented 1 year ago

Fixed in https://github.com/LinkedEarth/pylipd/commit/aa6c90f94ba70996e5f1fbb753df7a0ed2f6cbf1

khider commented 1 year ago

When loading from a GitHub URL, I get the following error:

RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/deborahkhider/Documents/GitHub/pylipd/pylipd/multi_processing.py", line 28, in convert_to_pickle
    raise e
  File "/Users/deborahkhider/Documents/GitHub/pylipd/pylipd/multi_processing.py", line 25, in convert_to_pickle
    converter.convert(lipdfile, tofile, type="pickle")
  File "/Users/deborahkhider/Documents/GitHub/pylipd/pylipd/lipd_to_rdf.py", line 77, in convert
    self._unzip_lipd_file(lipdpath, tmpdir)
  File "/Users/deborahkhider/Documents/GitHub/pylipd/pylipd/lipd_to_rdf.py", line 96, in _unzip_lipd_file
    resp = urlopen(lipdfile)
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/http/client.py", line 1293, in _send_request
    self.putrequest(method, url, **skips)
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/http/client.py", line 1131, in putrequest
    self._output(self._encode_request(request))
  File "/Users/deborahkhider/opt/anaconda3/envs/pylipd/lib/python3.10/http/client.py", line 1211, in _encode_request
    return request.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 71: ordinal not in range(128)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In[2], line 13
    d.load(url_github)

  File ~/Documents/GitHub/pylipd/pylipd/lipd.py:144 in load
    multi_convert_to_pickle(filemap, collection_id)

  File ~/Documents/GitHub/pylipd/pylipd/multi_processing.py:34 in multi_convert_to_pickle
    pool.starmap(convert_to_pickle, args, chunksize=1)

  File ~/opt/anaconda3/envs/pylipd/lib/python3.10/multiprocessing/pool.py:375 in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()

  File ~/opt/anaconda3/envs/pylipd/lib/python3.10/multiprocessing/pool.py:774 in get
    raise self._value

UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 71: ordinal not in range(128)

To reproduce:

from pylipd.lipd import LiPD

if __name__ == "__main__":  # works without the guard, but may be slow
    d = LiPD()

    # add dataset from URL
    url_github = 'https://github.com/LinkedEarth/Pyleoclim_util/blob/master/example_data/Arc-LakeNataujärvi.Ojala.2005.lpd?raw=true'
    d.load(url_github)

    ids = d.get_all_dataset_ids()
    ts_list = d.get_timeseries(ids)

varunratnakar commented 1 year ago

Hmm, the issue here is that the URI contains a non-ASCII character (ä).
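The traceback shows that `http.client` encodes the request line with `request.encode('ascii')`. A raw `ä` in the path therefore cannot be sent. One common fix is to percent-encode the path and query with `urllib.parse.quote` before calling `urlopen`. The sketch below assumes the scheme and host are already ASCII. `quote_url` is a hypothetical helper, and this is not necessarily what the linked commit does.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of `url`.

    '/', '=', '&', '+' and '%' are kept as-is, so URL structure and any
    percent-escapes that are already present are preserved.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url_github = ("https://github.com/LinkedEarth/Pyleoclim_util/blob/master/"
              "example_data/Arc-LakeNataujärvi.Ojala.2005.lpd?raw=true")
print(quote_url(url_github))
# → https://github.com/LinkedEarth/Pyleoclim_util/blob/master/example_data/Arc-LakeNatauj%C3%A4rvi.Ojala.2005.lpd?raw=true
```

`quote` uses UTF-8 by default, so `ä` becomes `%C3%A4`. Keeping `%` in the safe set makes the helper idempotent, which means calling it on an already-quoted URL does not double-encode it.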

varunratnakar commented 1 year ago

Fixed in https://github.com/LinkedEarth/pylipd/commit/cead8c7a97fe882cf60712d68284a5b749c94a60

However, there is another issue: this dataset seems to have multiple proxies for some variables. Is that valid? For now, I'm just using the first proxy when there is a list.
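The workaround described above keeps the first proxy when a variable lists several. It can be sketched as a small normalization step that also warns, so the dropped proxies are not lost silently. `first_if_list` is a hypothetical helper. The variable records below use made-up placeholder names for illustration and do not reflect pylipd's internal data model.

```python
import warnings
from typing import Any, Optional


def first_if_list(value: Any) -> Optional[Any]:
    """Return the first element if `value` is a list, otherwise `value` itself.

    An empty list yields None; a list with several entries triggers a warning.
    """
    if isinstance(value, list):
        if len(value) > 1:
            warnings.warn(f"multiple proxies {value!r}; using the first")
        return value[0] if value else None
    return value


# Hypothetical variable metadata: one single-valued and one multi-valued entry
variables = [
    {"name": "var1", "proxy": "proxy_a"},
    {"name": "var2", "proxy": ["proxy_b", "proxy_c"]},
]
for var in variables:
    var["proxy"] = first_if_list(var["proxy"])
```

For a composite record, keeping the full list may be more faithful than truncating it.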

khider commented 1 year ago

It can happen if the record is a composite.