Closed (chamolag closed this issue 1 year ago)
@chamolag Thanks for your feedback! We will investigate and update as appropriate.
@chamolag It seems like you are looking for a way to handle recently uploaded files in Azure Data Lake Storage and implement an incremental load approach based on their last modified date. I understand that the Microsoft Spark Utilities library has limited support for file attributes, and you have noticed a new attribute added to the file properties recently, but it returns only a numeric digit which you couldn't convert into date format. You have also referred to a Databricks feature that allows fetching file last modified date.
I would like to inform you that Microsoft Fabric does not have functionality similar to the Databricks feature you mentioned. However, you can use Azure Data Factory to copy data from Azure Data Lake Storage to Azure Synapse Analytics and use the "modifiedDate" column to filter the recently uploaded files. You can also use the "Get Metadata" activity in Azure Data Factory to get the metadata of the files in Azure Data Lake Storage, including the last modified date, and store it in a variable. Then, you can use this variable to filter the files based on their last modified date.
If there are any further questions regarding the documentation, please tag me in your reply and we will be happy to continue the conversation.
Thank you for your quick reply. Yes, I am utilizing Azure Data Factory to address this problem. However, I generally find myself more at ease using the Spark notebook due to its higher level of flexibility in comparison to Azure Data Factory.
When can we anticipate the availability of this feature in Fabric notebooks?
@chamolag Update on release info: Thank you for your time and patience! Unfortunately, we don't provide product release updates through our GitHub channels. For this information, I'd recommend referring to our Azure Updates or Microsoft 365 Roadmap pages.
Thanks for your response!
@chamolag We are going to close this thread. If there are any further questions regarding the documentation, please tag me in your reply and we will be happy to continue the conversation.
There is a requirement to handle only recently uploaded files (Azure Data Lake Storage), implementing an incremental load approach, and then storing the transformed data into designated Lakehouse tables. This task necessitates considering the files based on their last modified date. Although the Microsoft Spark Utilities library offers built-in functions for reading files from the storage location, it has limited support for file attributes.
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python
I have noticed a new attribute added to the file properties recently, which was not present in the mssparkutils library before. However, it returns only a numeric value, which I couldn't convert into a date format.
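For anyone reading along: the thread does not say what unit that numeric value uses. If it turns out to be a Unix epoch timestamp in milliseconds (an assumption, not something confirmed here), the conversion and an incremental filter could look roughly like the sketch below. The function names, the `(name, modify_ms)` tuples, and the sample values are all hypothetical and stand in for whatever the file listing actually returns.

```python
from datetime import datetime, timezone


def epoch_ms_to_datetime(value_ms: int) -> datetime:
    """Treat value_ms as milliseconds since the Unix epoch (assumption)
    and return a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(value_ms / 1000, tz=timezone.utc)


def modified_since(files, watermark: datetime) -> list:
    """Return the names of files modified strictly after the watermark.

    `files` is an iterable of (name, modify_ms) pairs; this shape is
    hypothetical and only illustrates the filtering step.
    """
    return [
        name
        for name, modify_ms in files
        if epoch_ms_to_datetime(modify_ms) > watermark
    ]


# Hypothetical listing: file name plus last-modified value in epoch milliseconds.
files = [
    ("old.csv", 1_672_531_200_000),  # 2023-01-01 00:00:00 UTC
    ("new.csv", 1_704_067_200_000),  # 2024-01-01 00:00:00 UTC
]

# Watermark: the last-modified time covered by the previous load.
watermark = datetime(2023, 6, 1, tzinfo=timezone.utc)

recent = modified_since(files, watermark)
print(recent)
```

If the value is in seconds rather than milliseconds, the division by 1000 should be dropped. Persisting the highest timestamp seen after each run as the next watermark is one way to make the load incremental.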
Please refer to the link below for the Databricks feature (file metadata) that is missing in MS Fabric. It allows us to fetch the file's last modified date. https://learn.microsoft.com/en-us/azure/databricks/ingestion/file-metadata-column
I would expect similar functionality (file attributes) in Microsoft Fabric when using a notebook.