Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Azure Search FieldMappingFunctions not being called or applied (similar to #33348) #34767

Closed: dheeraj-sachdeva closed this issue 8 months ago

dheeraj-sachdeva commented 8 months ago

Describe the bug

The field mapping function is not being applied. The indexer runs without any error, but it does not apply the mapping to the index field, which remains null.

My container is cl-container, which has two folders: uploads and user-uploads. user-uploads contains one subfolder per user GUID, and those subfolders hold the files each user uploaded. So the document structure is cl-container/user-uploads/userguid1/file1.pdf and cl-container/user-uploads/userguid2/file1.pdf (for example, if both userguid1 and userguid2 uploaded the same file1.pdf individually). My data source definition sets the folder/query to "user-uploads". My indexer definition in the Python code is:

Is source_field = f"/document/user-uploads/{userguid}" the same as metadata_storage_path?

We need to extract {userguid} from metadata_storage_path, where I think it appears at position 3.

```python
from azure.search.documents.indexes.models import (
    FieldMapping,
    FieldMappingFunction,
    SearchIndexer,
)

indexer = SearchIndexer(
    name=AZURE_STORAGE_USERDATA_INDEXER_NAME,
    data_source_name=AZURE_STORAGE_USER_DATASOURCE,
    target_index_name=AZURE_SEARCH_INDEX_NAME,
    skillset_name=CL_SKILLSET,
    schedule=None,
    parameters=None,
    field_mappings=[
        FieldMapping(
            source_field_name="metadata_storage_path",
            target_field_name="owner",
            # Intended to pull the user GUID segment out of the blob path
            mapping_function=FieldMappingFunction(
                name="extractTokenAtPosition",
                parameters={"delimiter": "/", "position": 3},
            ),
        )
    ],
    output_field_mappings=[],
    is_disabled=False,
    e_tag=None,
    encryption_key=None,
)
```

The "owner" index field is of type Edm.String.
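To sanity-check the position parameter, the sketch below mimics the split-and-pick behavior locally in plain Python. It assumes (1) that metadata_storage_path holds the full blob URL, using a made-up account name myaccount, and (2) that position is zero-based, as the Azure AI Search field-mapping documentation describes for extractTokenAtPosition. The helper names are hypothetical. It is not confirmed here whether the service keeps the empty token produced by the "//" in "https://", so the sketch shows both cases.

```python
# Hypothetical blob URL: the account name "myaccount" is made up, and we assume
# metadata_storage_path holds the full URL of the blob.
USER_GUID = "5c7e5dc7-171e-4457-bea4-45d4d9b16ed4"
EXAMPLE_PATH = (
    "https://myaccount.blob.core.windows.net/"
    f"cl-container/user-uploads/{USER_GUID}/file1.pdf"
)


def split_tokens(value: str, delimiter: str, drop_empty: bool = False) -> list[str]:
    """Split value on delimiter; optionally drop empty tokens (e.g. the one in '//')."""
    tokens = value.split(delimiter)
    return [t for t in tokens if t] if drop_empty else tokens


def extract_token_at_position(
    value: str, delimiter: str, position: int, drop_empty: bool = False
) -> str:
    """Local stand-in for extractTokenAtPosition: pick the zero-based token."""
    return split_tokens(value, delimiter, drop_empty)[position]


def find_token_position(
    value: str, delimiter: str, token: str, drop_empty: bool = False
) -> int:
    """Return the zero-based position at which token appears after splitting."""
    return split_tokens(value, delimiter, drop_empty).index(token)


if __name__ == "__main__":
    # List every token with its zero-based position.
    for i, tok in enumerate(split_tokens(EXAMPLE_PATH, "/")):
        print(i, repr(tok))
    print("GUID position (empty tokens kept):",
          find_token_position(EXAMPLE_PATH, "/", USER_GUID))
    print("GUID position (empty tokens dropped):",
          find_token_position(EXAMPLE_PATH, "/", USER_GUID, drop_empty=True))
```

Under these assumptions, position 3 lands on "cl-container" (or on "user-uploads" if empty tokens are dropped), and the GUID sits at position 5 (or 4). The sketch only checks the index arithmetic. It does not explain by itself why the owner field ends up null.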

When the code runs, I print the indexer.serialize() output:

in create_indexer for user guid: 5c7e5dc7-171e-4457-bea4-45d4d9b16ed4

Serialized Search Indexer: {'name': 'corelab-userdata-indexer', 'dataSourceName': 'corelab-userfiles-datasource', 'skillsetName': 'corelab-vector-index-skillset', 'targetIndexName': 'corelab-vector-index', 'fieldMappings': [{'sourceFieldName': 'metadata_storage_path', 'targetFieldName': 'owner'}], 'outputFieldMappings': [], 'disabled': False}

Search Indexer created with in-line field mapping: {'additional_properties': {}, 'name': 'corelab-userdata-indexer', 'description': None, 'data_source_name': 'corelab-userfiles-datasource', 'skillset_name': 'corelab-vector-index-skillset', 'target_index_name': 'corelab-vector-index', 'schedule': None, 'parameters': None, 'field_mappings': [<azure.search.documents.indexes._generated.models._models_py3.FieldMapping object at 0x0000028880172150>], 'output_field_mappings': [], 'is_disabled': False, 'e_tag': None, 'encryption_key': None}

search indexer updated with owner guid

{'additional_properties': {'@odata.context': 'https://searchkb.search.windows.net/$metadata#indexers/$entity'}, 'name': 'corelab-userdata-indexer', 'description': None, 'data_source_name': 'corelab-userfiles-datasource', 'skillset_name': 'corelab-vector-index-skillset', 'target_index_name': 'corelab-vector-index', 'schedule': None, 'parameters': None, 'field_mappings': [<azure.search.documents.indexes._generated.models._models_py3.FieldMapping object at 0x00000288801CE510>], 'output_field_mappings': [], 'is_disabled': False, 'e_tag': '"0x8DC43C63EAD9A25"', 'encryption_key': None}

search indexer kicked off

When I query the index and find the records for the uploaded file, the owner field is null rather than the expected GUID value (5c7e5dc7-171e-4457-bea4-45d4d9b16ed4).
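The Serialized Search Indexer dictionary printed earlier lists the owner field mapping with only sourceFieldName and targetFieldName. In the Azure AI Search REST JSON, a mapping function is carried as a mappingFunction object on the field mapping, so a quick local check of that printed payload can be sketched as follows. The function name is hypothetical, and the check only inspects the printed dictionary; it says nothing about the request the service actually received.

```python
# Payload copied from the "Serialized Search Indexer" log line in this issue.
SERIALIZED_INDEXER = {
    "name": "corelab-userdata-indexer",
    "dataSourceName": "corelab-userfiles-datasource",
    "skillsetName": "corelab-vector-index-skillset",
    "targetIndexName": "corelab-vector-index",
    "fieldMappings": [
        {"sourceFieldName": "metadata_storage_path", "targetFieldName": "owner"}
    ],
    "outputFieldMappings": [],
    "disabled": False,
}


def mappings_without_function(indexer_json: dict) -> list[str]:
    """Return the target field of every field mapping that has no mappingFunction key.

    If targetFieldName is absent, fall back to sourceFieldName (the REST API
    maps to a same-named field in that case).
    """
    return [
        m.get("targetFieldName", m["sourceFieldName"])
        for m in indexer_json.get("fieldMappings", [])
        if "mappingFunction" not in m
    ]


if __name__ == "__main__":
    print(mappings_without_function(SERIALIZED_INDEXER))
```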

Could you please look into this and help resolve it?

To Reproduce

Steps to reproduce the behavior: -- None --

Expected behavior

The index field "owner" is set to the value "5c7e5dc7-171e-4457-bea4-45d4d9b16ed4", extracted from the container folder name: cl-container/user-uploads/5c7e5dc7-171e-4457-bea4-45d4d9b16ed4/*

Screenshots

-- None --

Additional context

I have one index, two data sources, and two indexers writing into the same index. The second indexer uses a different data source container folder: cl-container/uploads/*. The owner field in the index is null. This issue is similar to #33348, which was closed without resolution because the original poster stopped responding.

github-actions[bot] commented 8 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @bleroy @markheff @miwelsh @tjacobhi.

xiangyan99 commented 8 months ago

Thanks for the feedback.

It seems the request was sent correctly. Please open a service ticket so the service team can take a look.

github-actions[bot] commented 8 months ago

Hi @dheeraj-sachdeva. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

dheeraj-sachdeva commented 8 months ago

Hi! How do I go about opening the service ticket to have the service team look into this?

xiangyan99 commented 8 months ago

https://azure.microsoft.com/support/create-ticket

dheeraj-sachdeva commented 8 months ago

OK, thanks! I've already done that: TrackingID#2403140010003290