Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

dataset.download() Unsupported Linux distribution #1003

Closed Radeju closed 4 years ago

Radeju commented 4 years ago

I am trying to download an AzureML dataset on Ubuntu 20.04 using the azureml.core library. However, when I run it I get the following error:

  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 169, in attemp_get_deps
    blob_deps_to_file()
  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 161, in blob_deps_to_file
    blob = request.urlopen(deps_url, context=ssl_context)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "setup/get_datasets.py", line 27, in <module>
    dataset.download(target_path=f'{path}/../.datasets/{dataset_name}', overwrite=True)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
    return func(*args, **kwargs)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/file_dataset.py", line 123, in download
    for p in self._to_path(activity='download.to_path')]
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/file_dataset.py", line 98, in _to_path
    dataflow, portable_path = _add_portable_path_column(self._dataflow)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
    return func(*args, **kwargs)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 203, in _dataflow
    dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py", line 136, in _set_auth_type
    get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 18, in get_engine_api
    _engine_api = EngineAPI()
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 55, in __init__
    self._message_channel = launch_engine()
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py", line 300, in launch_engine
    dependencies_path = runtime.ensure_dependencies()
  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 181, in ensure_dependencies
    if not attemp_get_deps():
  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 175, in attemp_get_deps
    raise NotImplementedError('Unsupported Linux distribution {0} {1}.{2}'.format(dist, version[0], version[1]))
NotImplementedError: Unsupported Linux distribution ubuntu 20.04
The terminal process terminated with exit code: 1

Are you planning to support Ubuntu 20.04? Is there a roadmap? I found this issue from 6 months ago and would really appreciate hearing whether anything has changed since then.

Right now I am using the workaround from here to make it work.

Warm regards

YutongTie-MSFT commented 4 years ago

@Radeju Thanks for the feedback! We are currently investigating and will update you shortly.

MayMSFT commented 4 years ago

Thanks for the feedback. We have recorded it and added it as a feature request on our roadmap.

YutongTie-MSFT commented 4 years ago

@MayMSFT Hi May, are we good to close this, or do you want me to keep it open? Thanks.

xkszltl commented 4 years ago

Hi! I'm planning to switch our pipeline from 18.04 to 20.04 soon as well, and it looks like this may be a blocking issue. Do we have a timeline for the fix?

Based on the log, it seems the distro version is checked against a whitelist. IMHO this is a bad design that will probably affect a lot of less popular distros like Arch or Mint.

MayMSFT commented 4 years ago

Unfortunately, it depends on legal approval. @tot0 to share more details

tot0 commented 4 years ago

@xkszltl Hi, unfortunately I don't have a concrete timeline for official support of new Linux distros. The legal processes involved in distributing open source packages, so that Datasets normally 'just works', require care and aren't moving as fast as we'd hope.

Datasets will only return saying 'Unsupported Distro' if the required dependencies for .NET Core 2.1 are not present on default library paths AND a pre-prepared dependency set doesn't exist. We are working on improving the error message to link out to the official .NET Core documentation on how to install the correct dependencies for supported distributions.
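As a rough illustration of the rule described above (not the actual dotnetcore2 source), the error is only raised when both checks fail. The function name, its parameters, and the return labels below are hypothetical and exist only for this sketch.

```python
def resolve_dependencies(missing_system_libs, prepared_set_available, dist, version):
    """Hypothetical sketch of the decision described above; not the dotnetcore2 code.

    missing_system_libs: required .NET Core 2.1 libraries NOT found on default library paths
    prepared_set_available: True if a pre-prepared dependency set exists for this distro
    """
    if not missing_system_libs:
        # Everything .NET Core 2.1 needs is already installed system-wide.
        return "system"
    if prepared_set_available:
        # Fall back to the pre-prepared dependency set.
        return "prepared"
    # Both conditions failed: only now is the distro reported as unsupported.
    raise NotImplementedError(
        "Unsupported Linux distribution {0} {1}".format(dist, version)
    )
```

Under this reading, installing the missing system libraries ahead of time is enough to avoid the error, even on a distro with no pre-prepared dependency set.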

@xkszltl Would you be able to try the first command here to install .NET Core's dependencies for Ubuntu 20.04 and see if you're able to use dataset.download()? https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

xkszltl commented 4 years ago

Of course, if it is just a matter of installing .NET, that's totally fine for us. Actually, we will do that regardless of whether we use AML Datasets.

Is 2.1 an exact or a minimum requirement? Can we use a later version, namely 2.2 or 3+?

tot0 commented 4 years ago

Currently Datasets requires .NET Core 2.1

YutongTie-MSFT commented 4 years ago

@Radeju We will now proceed to close this thread. If there are further questions regarding this matter, please respond here and tag @YutongTie-MSFT, and we will gladly continue the discussion.

corticalstack commented 4 years ago

Getting the same issue trying to use "from azureml.opendatasets import Diabetes", with the error "Unsupported Linux distribution ubuntu 20.04". Tried the steps suggested by @tot0, but they didn't resolve it: https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

@YutongTie-MSFT

corticalstack commented 4 years ago

Had this error again trying to access my own dataset in a storage account blob; the error is below. The code is being run as a local Jupyter notebook on Ubuntu 20.04. It is the "day1-part4-data" notebook: https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/get-started-day1/day1-part4-data.ipynb

which fails on the line: dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

HTTPError Traceback (most recent call last)
~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in attemp_get_deps()
    198 try:
--> 199 blob_deps_to_file()
    200 success = True

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in blob_deps_to_file()
    190 ssl_context = ssl.create_default_context(cafile=cafile)
--> 191 blob = request.urlopen(deps_url, context=ssl_context)
    192 with open(deps_tar_path, 'wb') as f:

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221 opener = _opener
--> 222 return opener.open(url, data, timeout)
    223

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
    532

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in http_response(self, request, response)
    639 if not (200 <= code < 300):
--> 640 response = self.parent.error(
    641 'http', request, response, code, msg, hdrs)

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in error(self, proto, *args)
    568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
    570

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
    503 if result is not None:

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650

HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

NotImplementedError Traceback (most recent call last)
in
----> 1 dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
    124 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
    125 try:
--> 126 return func(*args, **kwargs)
    127 except Exception as e:
    128 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/data/dataset_factory.py in from_files(path, validate)
    702 from azureml.data import FileDataset
    703
--> 704 dataflow = dataprep().api.dataflow.Dataflow._path_to_get_files_block(_validate_and_normalize_path(path))
    705 if validate:
    706 _validate_has_data(dataflow, 'Cannot load any data from the specified path. '

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/dataflow.py in _path_to_get_files_block(path, archive_options)
   2423 try:
   2424 if _is_datapath(path) or _is_datapaths(path):
-> 2425 return datastore_to_dataflow(path)
   2426 except ImportError:
   2427 pass

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in datastore_to_dataflow(data_source, query_timeout)
     25 datastore_values = []
     26 for source in data_source:
---> 27 datastore, datastore_value = get_datastore_value(source)
     28 if not _is_fs_datastore(datastore):
     29 raise NotSupportedDatastoreTypeError(datastore)

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in get_datastore_value(data_source)
     78
     79 workspace = datastore.workspace
---> 80 _set_auth_type(workspace)
     81 return (datastore, DatastoreValue(
     82 subscription=workspace.subscription_id,

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in _set_auth_type(workspace)
    141 get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.SERVICEPRINCIPAL, json.dumps(auth)))
    142 else:
--> 143 get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))
    144
    145

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py in get_engine_api()
     17 global _engine_api
     18 if not _engine_api:
---> 19 _engine_api = EngineAPI()
     20
     21 from .._dataset_resolver import register_dataset_resolver

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py in __init__(self)
     66 pass
     67
---> 68 self._message_channel = launch_engine()
     69 connect_to_requests_channel()
     70

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py in launch_engine()
    331 engine_path = _get_engine_path()
    332 try:
--> 333 dependencies_path = runtime.ensure_dependencies()
    334 except Exception as e:
    335 _LoggerFactory.trace(log, 'Failed to ensure dependencies' + str(e))

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in ensure_dependencies()
    211 return success
    212
--> 213 if not attemp_get_deps():
    214 # Failed accessing blob, likely an interrupted connection. Try again once more.
    215 if not attemp_get_deps():

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in attemp_get_deps()
    205 err_msg = 'Unsupported Linux distribution {0} {1}.{2}'.format(dist, version[0], version[1])
    206 log_event('ensure_dependencies', error=err_msg, missing_pkgs=list(missing_pkgs))
--> 207 raise NotImplementedError(err_msg)
    208 except Exception as e:
    209 logger.debug("Exception when accessing blob: " + str(e))

NotImplementedError: Unsupported Linux distribution ubuntu 20.04

tot0 commented 4 years ago

Hi @corticalstack, could you try running the below python snippet in your Ubuntu 20.04 environment?

from dotnetcore2 import runtime
runtime._enable_debug_logging()
runtime.ensure_dependencies()

This should reveal which dependencies are missing for Datasets.

When installing .NET Core 2.1 ahead of time, did you install dotnet-runtime-3.1 or dotnet-runtime-2.1?
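If it helps to capture the outcome rather than read a traceback, the same check can be wrapped in a small helper. This is only a sketch: `check_dependencies` is a hypothetical name, and passing the `runtime` module in as a parameter is an assumption made so the helper can also be tried with a stand-in object. Only `runtime.ensure_dependencies()` and `runtime._enable_debug_logging()` come from the snippet above.

```python
def check_dependencies(runtime):
    """Hypothetical wrapper around the check above.

    `runtime` is expected to be the `dotnetcore2.runtime` module (or any object
    exposing `ensure_dependencies()`). Returns a (status, detail) tuple instead
    of letting NotImplementedError propagate.
    """
    try:
        result = runtime.ensure_dependencies()
    except NotImplementedError as err:
        # Raised with a message such as "Unsupported Linux distribution ubuntu 20.04".
        return ("unsupported", str(err))
    # `result` is whatever ensure_dependencies() returned, e.g. a dependency path.
    return ("ok", result)


# Intended use on the affected machine (requires dotnetcore2 to be installed):
#   from dotnetcore2 import runtime
#   runtime._enable_debug_logging()
#   print(check_dependencies(runtime))
```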

Cheers.

corticalstack commented 4 years ago

@tot0 Regarding .NET Core 2.1: I believe I installed 3.1, as per: https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

Within a Jupyter notebook I added the 3 lines as requested, then executed:

dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

And got what seems like multiple errors trying to log in DEBUG mode:

DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False
DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False
DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False
DEBUG - Created a static thread pool for ServiceContext class
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Performing instance discovery: ...
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Performing static instance discovery
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Authority validated via static instance discovery
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - TokenRequest:Getting token from cache with refresh if necessary.
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:finding with query keys: {'_clientId': '...', 'userId': '...'}
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Looking for potential cache entries: {'_clientId': '...', 'userId': '...'}
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Found 2 potential entries.
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Resource specific token found.
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Returning token from cache lookup, AccessTokenId: b'ji5H/ccIOfhlbO6LhVa6SPJm1T+uGkOaz40LghSXBzc=', RefreshTokenId: b'WKAoyST6eg+Go79SJMjKcyHKHQ1z1tWx146fEyzlv8M='
DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False

tot0 commented 4 years ago

@corticalstack Hmmm, those RunContext debug logs make sense for the Dataset calls, but they shouldn't have appeared during the runtime.ensure_dependencies() call. What version of dotnetcore2 is installed in your environment? Would it be possible to see just the outcome of running the 3 lines I shared, without the from_files call? Thanks!

Unfortunately the .NET Core docs don't have any specific 2.1 advice anymore. The package dotnet-runtime-2.1 does exist though and I recommend installing that instead of dotnet-runtime-3.1.

corticalstack commented 4 years ago

@tot0 version installed is 2.1.15 of dotnetcore2

The only Jupyter output from the 3 lines you shared is as follows: '/home/jp/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/bin/deps'

Thanks

tot0 commented 4 years ago

OK, so if runtime.ensure_dependencies() returns a path like the one you shared, that means all the dependencies exist locally for .NET Core to run. dotnetcore2==2.1.17 is the newest version; it upgraded the underlying .NET Core runtime to support the newer OpenSSL versions installed on newer Linux distros (Ubuntu 20 included). It has not yet added full support for all the dependencies required on Ubuntu 20 (so the pre-install steps via apt-get are still required), but using the newer version of dotnetcore2 should enable Datasets to run on Ubuntu 20.
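For anyone wanting to check this before running Datasets, a minimal sketch of a version check follows. The helper names are hypothetical, and the comparison assumes plain dotted numeric versions such as 2.1.15; `importlib.metadata` is part of the standard library from Python 3.8.

```python
from importlib import metadata

# Version named above as the one that supports newer OpenSSL releases.
MINIMUM = (2, 1, 17)


def parse_version(text):
    """Turn '2.1.15' into (2, 1, 15); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed, minimum=MINIMUM):
    """True if the installed version string is older than `minimum`."""
    return parse_version(installed) < minimum


def installed_dotnetcore2_version():
    """Return the installed dotnetcore2 version string, or None if it is not installed."""
    try:
        return metadata.version("dotnetcore2")
    except metadata.PackageNotFoundError:
        return None
```

If `needs_upgrade(installed_dotnetcore2_version())` is true, upgrading the package with pip is the step suggested above; the apt-get pre-install steps are still needed either way.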

corticalstack commented 4 years ago

@tot0 I pip-uninstalled dotnetcore2 2.1.15 and installed the latest version; all good. Thanks!

smougel commented 3 years ago

from dotnetcore2 import runtime
runtime._enable_debug_logging()
runtime.ensure_dependencies()

NotImplementedError: Unsupported Linux distribution ubuntu 20.10

    pip install dotnetcore2
    Collecting dotnetcore2
      Using cached dotnetcore2-2.1.19-py3-none-manylinux1_x86_64.whl (28.7 MB)
    Requirement already satisfied: distro>=1.2.0 in ./.conda/envs/p8/lib/python3.8/site-packages (from dotnetcore2) (1.5.0)
    Installing collected packages: dotnetcore2
    Successfully installed dotnetcore2-2.1.19

Any idea?

smougel commented 3 years ago

Issue solved

    sudo apt install dotnet-runtime-2.1
    The following packages have unmet dependencies:
     dotnet-runtime-deps-2.1 : Depends: libicu but it is not installable or
                                        libicu66 but it is not installable or
                                        libicu65 but it is not installable or
                                        libicu63 but it is not installable or
                                        libicu60 but it is not installable or
                                        libicu57 but it is not installable or
                                        libicu55 but it is not installable or
                                        libicu52 but it is not installable
    E: Unable to correct problems, you have held broken packages.

1) Install libicu

    wget http://ftp.us.debian.org/debian/pool/main/i/icu/libicu63_63.2-3_amd64.deb
    sudo dpkg -i libicu63_63.2-3_amd64.deb

2) sudo apt install dotnet-runtime-2.1

I don't know if there is a better way to do this.