mh-hassan18 opened this issue 4 months ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.
Hi @mh-hassan18 - Thanks for the detailed info. We'll take a look asap!
@mh-hassan18 - So far I am unable to reproduce the problem; using the same code, I am able to fetch the data. A few things to recheck:

In the first method, your URI format should look like the below:

```python
uri = "azureml://subscriptions/sub_guid (xxxxxxx)/resourcegroups/achauxxxx/workspaces/workspace-name/datastores/datastore-name/paths/filename"
df = pd.read_csv(uri)
df
```

In the second method, print `data_asset.path` and confirm it matches the same path as above (`azureml://subscriptions/sub_guid (xxxxxxx)/resourcegroups/achauxxxx/workspaces/workspace-name/datastores/datastore-name/paths/filename`):

```python
ml_client = MLClient(credential, subscription_id, resource_group, workspace_name=workspace_name)
data_asset = ml_client.data.get("one_data_set", version="1")
df1 = pd.read_table(data_asset.path)
```
@achauhan-scc
Thank you for your response.
I confirm that my URI is exactly in the format you specified (I am using the same "copy usage code" from the portal UI, as described above).
In the second method when I print data_asset.path it matches with the above uri.
But still I am getting the same issue as I described in my original question.
@achauhan-scc
Here is another update. Earlier I was using the Python 3.10 - SDK V2 kernel. I tried the same code after switching to the Python 3.8 - AzureML kernel, and it works fine for the datastore I created for the Files section of my Fabric lakehouse.
But with the same Python 3.8 - AzureML kernel, when loading anything from the datastore I created for the Tables section of my Fabric lakehouse, I get the following error:
"ValueError: No objects to concatenate"
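For context, pandas raises this exact error when `pd.concat` receives an empty sequence, which is typically what happens when a path pattern matches no readable files. A minimal reproduction, independent of the datastore:

```python
import pandas as pd

# pd.concat with nothing to combine raises the same ValueError seen above
try:
    pd.concat([])
except ValueError as err:
    print(err)  # → No objects to concatenate
```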
So to summarize:
Note: In both kernels, I am using azure-ai-ml: 1.18.0
@mh-hassan18 - needs to update the versions to azureml-dataprep-rslex = 2.19.2 and azureml-dataprep = 4.12.1, as the versions they are using (rslex 2.18.3 and dataprep 4.11.3) do not contain the changes needed to handle OneLake datastores when materializing them into a pandas dataframe.
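To double-check that a kernel meets those minimums, a small version probe can help. This is a hedged sketch: the package names and minimum versions come from the comment above, while the helper names are mine.

```python
from importlib import metadata

# Minimum versions from the comment above; older releases lack the
# OneLake datastore handling when materializing into a pandas dataframe.
REQUIRED = {
    "azureml-dataprep-rslex": (2, 19, 2),
    "azureml-dataprep": (4, 12, 1),
}

def parse_version(text: str) -> tuple:
    """Turn a simple dotted version string into a comparable tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())

def check_versions() -> dict:
    """Report whether each required package is installed and new enough."""
    report = {}
    for pkg, minimum in REQUIRED.items():
        try:
            installed = parse_version(metadata.version(pkg))
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
            continue
        wanted = ".".join(map(str, minimum))
        report[pkg] = "ok" if installed >= minimum else f"upgrade to >= {wanted}"
    return report

print(check_versions())
```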
Hi @achauhan-scc thank you for the support.
I updated the versions of the above two libraries and everything worked fine with the Python 3.10 - SDK V2 kernel.
But I have one more issue. Surprisingly, when I tested the same script today with the Python 3.8 - AzureML kernel, it is not working, even though it worked fine with this kernel the other day, as described in my last comment. When executing the following script with the Python 3.8 - AzureML kernel:
```python
import pandas as pd
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
uri = "uri_here"
df = pd.read_csv(uri)
df
```
I am getting the following error:
ImportError: Unable to load filesystem from EntryPoint(name='azureml', value='azureml.fsspec.AzureMachineLearningFileSystem', group='fsspec.specs')
I am getting the same issue as above with the following code as well (again with the Python 3.8 - AzureML kernel):
```python
import pandas as pd
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
data_asset = ml_client.data.get("FABRIC_BRAZIL_PROMOTION_2", version="1")
print(data_asset.path)
df = pd.read_csv(data_asset.path)
df
```
Note: I have not changed anything in the Python 3.8 - AzureML kernel. Library versions in this kernel: azureml-dataprep = 4.12.1, azureml-dataprep-rslex = 2.19.2, azure-ai-ml = 1.18.0.
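One way to investigate that ImportError is to list which filesystems are actually registered under fsspec's entry-point group. This is a diagnostic sketch, not a fix; the helper name is mine, and the fallback branch is needed because `entry_points(group=...)` only exists on Python 3.10+:

```python
from importlib import metadata

def list_fsspec_specs():
    """Return (name, value) pairs registered under the 'fsspec.specs' group."""
    try:
        eps = metadata.entry_points(group="fsspec.specs")  # Python 3.10+
    except TypeError:
        # Python 3.8/3.9: entry_points() returns a dict of group -> entries
        eps = metadata.entry_points().get("fsspec.specs", [])
    return [(ep.name, ep.value) for ep in eps]

# a healthy kernel should include
# ('azureml', 'azureml.fsspec.AzureMachineLearningFileSystem')
print(list_fsspec_specs())
```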
@achauhan-scc Any update on this?
Problem Description
I have created a OneLake datastore with AzureML following this documentation. Specifically, I created a datastore for the Files section of my lakehouse in Fabric. The datastore was created successfully, and I am able to see all the data in AzureML through the datastore's browse mode, as you can see below:
Now when I try to load a CSV file from the datastore using the code already provided in the portal (datastore browse -> csv file -> copy usage code), the code does not work.
Here is the usage code:
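The actual usage code is not reproduced here; as a stand-in, the following is a hedged sketch of the pattern the portal generates, where every placeholder name is mine:

```python
def onelake_uri(sub_id: str, resource_group: str, workspace: str,
                datastore: str, path: str) -> str:
    """Build the azureml:// datastore URI that the portal's usage code embeds."""
    return (
        f"azureml://subscriptions/{sub_id}/resourcegroups/{resource_group}/"
        f"workspaces/{workspace}/datastores/{datastore}/paths/{path}"
    )

# Reading then follows the usual pattern (requires workspace credentials):
# import pandas as pd
# df = pd.read_csv(onelake_uri("<sub-id>", "<rg>", "<ws>", "<datastore>", "file.csv"))
```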
The above code does not work and gives the following error:
I even tried registering the CSV file as a data asset from the portal and then used the consume code from the registered asset, but that also did not work.
Here is where I registered the CSV file as a data asset.
Here is where I copied the consume code of the registered data asset.
Here is the code that I copied from the "Consume" tab of the registered data asset:
But the above code also does not work and gives the following error:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
We should be able to load the data from a OneLake datastore by using the "consume/usage" code provided in the portal. This works for other types of datastores and should work for OneLake datastores as well.
Additional context
We also created a datastore for the Tables section of our lakehouse in Fabric. The datastore was created successfully and we were able to browse the data from the portal, but for consuming or loading data we faced the same issues as with the Files-section datastore. The documentation says: "At this time, Machine Learning supports connection to Microsoft Fabric lakehouse artifacts in "Files" folder that include folders or files and Amazon S3 shortcuts." But we were able to create a datastore for the Tables section as well, so is the Tables section also supported now, or not?