Azure-Samples / azure-cognitive-search-blob-metadata

This sample demonstrates how to use multiple indexers in Azure Cognitive Search to create a single search index from files in Blob storage with their associated metadata in Table storage.
MIT License

metadata_storage_path contains #1

Open BobbyAxelrods opened 6 months ago

BobbyAxelrods commented 6 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Run the loop below, which creates random metadata for each blob in Table storage. Each iteration returns the error quoted under the log messages. Afterwards, check the metadata that was created in Table storage.

for BLOBNAME in $BLOBNAMES; do
  # Fetch the full blob URL.
  BLOBURL=$(az storage blob url --connection-string $STORAGEACCOUNTCONNECTIONSTRING --container-name $STORAGEACCOUNTCONTAINERNAME --name $BLOBNAME --output tsv | tr -d '\r\n' | sed 's/%0D//g')
  # Set the custom "author" metadata value (generated at random) directly on the blob itself.
  AUTHOR=${AUTHORS[$RANDOM % ${#AUTHORS[@]}]}
  az storage blob metadata update --connection-string $STORAGEACCOUNTCONNECTIONSTRING --container-name $STORAGEACCOUNTCONTAINERNAME --name $BLOBNAME --metadata "author=$AUTHOR"

  # Upload the "document_type" and "business_impact" metadata values to table storage, separate from the blob.
  # The "metadata_storage_path" column in the table is used
  # Any partition key can be used here, we use the storage account name for this sample.
  PARTITIONKEY=$STORAGEACCOUNTCONTAINERNAME
  # Any row key can be used here, as long as it is unique within the partition.
  # For this sample, we base64 encode the blob URL so it can be used as a unique row key.
  # See https://learn.microsoft.com/rest/api/storageservices/understanding-the-table-service-data-model#characters-disallowed-in-key-fields.
  ROWKEY=$(echo $BLOBURL | base64 --wrap=0)
  # The "document_type" is set to the containing folder.
  DOCUMENTTYPE=${BLOBNAME%/*} # Take the folder path of the blob name (i.e. everything before the last '/') as the document type
  # The "business_impact" is generated at random.
  BUSINESSIMPACT=${BUSINESSIMPACTS[$RANDOM % ${#BUSINESSIMPACTS[@]}]}
  # Create a row in the table with the (unencoded) "metadata_storage_path" for the document key as well as all the metadata to be added to the search index.
  az storage entity insert --connection-string $STORAGEACCOUNTCONNECTIONSTRING --table-name $STORAGEACCOUNTTABLENAME --entity PartitionKey=$PARTITIONKEY RowKey=$ROWKEY metadata_storage_path=$BLOBURL document_type=$DOCUMENTTYPE business_impact=$BUSINESSIMPACT
done
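
Because the row key is derived from the blob URL, any stray character in the URL also changes the key. The sketch below illustrates this with a made-up URL (`example.blob.core.windows.net` is a placeholder, not a real account). It uses `base64 | tr -d '\n'` as a portable stand-in for GNU's `--wrap=0`; that substitution is an assumption for illustration, not what the sample itself does.

```shell
#!/usr/bin/env bash
# Hypothetical example URL; not a real storage account.
URL='https://example.blob.core.windows.net/samplefiles/report.pdf'

# Encode as in the loop above (echo appends a newline before encoding).
# "tr -d '\n'" removes line wrapping portably, in place of GNU's --wrap=0.
ROWKEY=$(echo "$URL" | base64 | tr -d '\n')

# The same URL with a trailing carriage return yields a different row key.
ROWKEY_CR=$(echo "$URL"$'\r' | base64 | tr -d '\n')

# Decoding recovers the original URL, so a key can be traced back to its blob.
DECODED=$(echo "$ROWKEY" | base64 --decode)

echo "clean key:   $ROWKEY"
echo "key with CR: $ROWKEY_CR"
echo "decoded:     $DECODED"
```

Decoding an existing row key this way is one option for checking whether a carriage return was already present when the key was generated.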

Check the metadata in table storage:

az storage entity show --connection-string $STORAGEACCOUNTCONNECTIONSTRING --table-name $STORAGEACCOUNTTABLENAME --partition-key $PARTITIONKEY --row-key $ROWKEY

Any log messages given by the failure

While the loop runs, this error appears each time a metadata row is created:

The requested URI does not represent any resource on the server. RequestId:3dcfeea3-401e-005b-7b9b-a6f6c2000000 Time:2024-05-15T07:46:14.8469053Z ErrorCode:InvalidUri
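
An `InvalidUri` error of this kind may be caused by a hidden control character in one of the values passed to the CLI; that is an assumption, not something confirmed in this thread. A minimal sketch for making such characters visible before calling `az` follows. `has_cr` is a hypothetical helper name, not part of the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit code 0) if the value contains a carriage return.
has_cr() {
  case "$1" in
    *$'\r'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example value with a trailing carriage return, as CRLF output would leave behind.
VALUE=$'role_library.pdf\r'

if has_cr "$VALUE"; then
  # printf %q prints the value with control characters shown in escaped form.
  printf 'carriage return found in: %q\n' "$VALUE"
fi
```

The same check can be applied to `$BLOBNAME`, `$BLOBURL`, `$STORAGEACCOUNTTABLENAME`, and the other variables used in the loop.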

Expected/desired behavior

The metadata_storage_path value should not contain unwanted characters. Instead, every row consistently ends with %0D\r, for example: "metadata_storage_path": "https://cognitivepocv100storage.blob.core.windows.net/samplefiles/role_library.pdf%0D\r"
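
One possible explanation, offered as an assumption rather than a confirmed diagnosis: under WSL, a command that emits Windows-style CRLF line endings leaves a trailing carriage return in the captured value, which can then show up both literally (`\r`) and URL-encoded (`%0D`). A minimal sketch of cleaning a value before it is used follows. `clean_url` is a hypothetical helper name, not part of the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip CR/LF characters and a trailing URL-encoded CR (%0D).
clean_url() {
  local value=$1
  # Remove literal carriage returns and newlines anywhere in the value.
  value=${value//$'\r'/}
  value=${value//$'\n'/}
  # Remove a trailing URL-encoded carriage return, in either letter case.
  value=${value%\%0D}
  value=${value%\%0d}
  printf '%s' "$value"
}

# Example using the value reported in this issue.
RAW=$'https://cognitivepocv100storage.blob.core.windows.net/samplefiles/role_library.pdf%0D\r'
BLOBURL=$(clean_url "$RAW")
echo "$BLOBURL"
```

Unlike `sed 's/%0D//g'`, this only removes `%0D` at the end of the value, so an encoded carriage return that legitimately appears inside a blob name would be left alone.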

OS and Version?

Windows 11, using WSL

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

jelledruyts commented 3 months ago

Hi @BobbyAxelrods, apologies for the delay but I've just tried to execute the steps in the deploy.azcli file again and everything still works as expected for me. Have you executed everything in order? Any special files you added to the sample files directory?