SDKits / ExamineX

Issue tracker for ExamineX
https://examinex.online

Unable to use Azurite or local blob storage with Umbraco.StorageProviders.AzureBlob for Media indexing #84

Closed: chrden closed this issue 1 year ago

chrden commented 1 year ago

When using the Umbraco.StorageProviders.AzureBlob package to index media, indexing fails if the blob storage connection string does not use HTTPS. Testing this locally therefore requires setting up a Blob Storage service in Azure.

The exception below occurs when attempting to reindex while using Azurite or UseDevelopmentStorage=true with Umbraco.StorageProviders.AzureBlob.

Scenario A:

Scenario B:

Exception when attempting to reindex

Azure.RequestFailedException: HTTPS is required in the storage connection string.
Status: 400 (Bad Request)

Content:
{"error":{"code":"","message":"HTTPS is required in the storage connection string."}}

Headers:
Cache-Control: no-cache
Pragma: no-cache
client-request-id: 0b085bc5-effa-4443-b243-4a8962bd5bc7
x-ms-client-request-id: 0b085bc5-effa-4443-b243-4a8962bd5bc7
request-id: 0b085bc5-effa-4443-b243-4a8962bd5bc7
elapsed-time: 10
Preference-Applied: REDACTED
Strict-Transport-Security: REDACTED
Date: Mon, 05 Jun 2023 04:15:34 GMT
Content-Type: application/json; charset=utf-8
Content-Language: REDACTED
Expires: -1
Content-Length: 85

   at Azure.Search.Documents.DataSourcesRestClient.CreateOrUpdate(String dataSourceName, SearchIndexerDataSourceConnection dataSource, String ifMatch, String ifNoneMatch, CancellationToken cancellationToken)
   at Azure.Search.Documents.Indexes.SearchIndexerClient.CreateOrUpdateDataSourceConnection(SearchIndexerDataSourceConnection dataSourceConnection, Boolean onlyIfUnchanged, CancellationToken cancellationToken)
   at ExamineX.AzureSearch.Umbraco.BlobMedia.BlobStorageComponent.A(Object, CreatingOrUpdatingIndexerEventArgs)
   at ExamineX.AzureSearch.AzureSearchIndex.A(Boolean, SearchIndex, SearchIndexer, SearchIndex& , SearchIndexer& )
   at ExamineX.AzureSearch.AzureSearchIndex.D()
   at ExamineX.AzureSearch.AzureSearchIndex.a(Boolean)
   at ExamineX.AzureSearch.AzureSearchIndex.CreateIndex()
   at ExamineX.AzureSearch.Umbraco.UmbracoAzureSearchIndex.CreateIndex()
   at Umbraco.Cms.Infrastructure.Examine.ExamineIndexRebuilder.RebuildIndex(String indexName, TimeSpan delay, CancellationToken cancellationToken)
   at Umbraco.Cms.Infrastructure.Examine.ExamineIndexRebuilder.<>c__DisplayClass9_1.<RebuildIndex>b__1()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Umbraco.Cms.Infrastructure.HostedServices.QueuedHostedService.BackgroundProcessing(CancellationToken stoppingToken)

Let me know if you need any more information.


Shazwazza commented 1 year ago

The problem is that this is trying to configure the Data Source for the Azure Search Indexer, which runs in Azure, with a connection string that points to local storage. This cannot work because Azure Search cannot connect to blob storage running on your local machine.
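
To illustrate the point above, here is a minimal sketch of a check that flags a storage connection string a cloud-hosted service could not use. It is a heuristic written for this thread, not part of ExamineX or the Azure SDK; the names `parse_connection_string`, `is_local_or_insecure`, and `LOCAL_HOSTS` are hypothetical. It assumes the usual `Key=Value;Key=Value` connection string layout and treats the `UseDevelopmentStorage=true` shortcut, an `http` protocol, or a loopback `BlobEndpoint` as local.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts treated as "local"; extend it for your own setup.
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def parse_connection_string(conn: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    settings = {}
    for segment in conn.split(";"):
        if not segment.strip():
            continue
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, _, value = segment.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def is_local_or_insecure(conn: str) -> bool:
    """Heuristic: True if the string points at an emulator or does not use HTTPS."""
    settings = parse_connection_string(conn)
    if settings.get("usedevelopmentstorage", "").lower() == "true":
        return True
    if settings.get("defaultendpointsprotocol", "").lower() == "http":
        return True
    endpoint = settings.get("blobendpoint")
    if endpoint:
        url = urlparse(endpoint)
        if url.scheme != "https" or url.hostname in LOCAL_HOSTS:
            return True
    return False
```

A check like this could be run at startup to decide whether media indexing through Azure Search is worth attempting in the current environment.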

The way this package works is by a pull mechanism, not a push mechanism. ExamineX doesn't read or parse any media; this is done by Azure Search directly, using a Data Source and an Indexer. Behind the scenes, whenever media is updated in Umbraco, ExamineX tags that blob item with a NodeId metadata attribute and tells the Azure Search Indexer to start indexing anything in its Data Source that hasn't been processed. The Data Source in this case is the blob storage where the media files are stored. Azure Search's Indexer then reads from that Data Source, does all of the file extraction (it supports many document types, including all Office and PDF formats), and pulls that content into the index based on that blob item's NodeId metadata.
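
The pull flow described above can be pictured with a small in-memory toy model. This is only a sketch for illustration: `Blob`, `BlobContainer`, `Indexer`, and `on_media_saved` are hypothetical names, not ExamineX or Azure Search APIs. Two simplifications are assumptions of the toy: change tracking is reduced to a boolean flag, and blobs without a NodeId tag are skipped because the toy index is keyed by NodeId.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    name: str
    text: str                       # stands in for content extracted from the file
    metadata: dict = field(default_factory=dict)
    indexed: bool = False           # toy stand-in for the indexer's change tracking


class BlobContainer:
    """Plays the role of the Data Source (blob storage holding media files)."""

    def __init__(self):
        self.blobs = {}

    def upload(self, name, text):
        self.blobs[name] = Blob(name, text)


class Indexer:
    """Plays the role of the search indexer: it PULLS from the data source."""

    def __init__(self, source, index):
        self.source = source
        self.index = index          # maps NodeId -> extracted text

    def run(self):
        processed = 0
        for blob in self.source.blobs.values():
            # Toy assumption: skip blobs already processed or not yet tagged.
            if blob.indexed or "NodeId" not in blob.metadata:
                continue
            self.index[blob.metadata["NodeId"]] = blob.text
            blob.indexed = True
            processed += 1
        return processed


def on_media_saved(container, indexer, blob_name, node_id):
    """What the CMS side does: tag the blob, then ask the indexer to run."""
    blob = container.blobs[blob_name]
    blob.metadata["NodeId"] = str(node_id)
    blob.indexed = False            # a changed blob must be picked up again
    return indexer.run()


container = BlobContainer()
index = {}
indexer = Indexer(container, index)
container.upload("media/report.pdf", "quarterly report")
on_media_saved(container, indexer, "media/report.pdf", 1234)
```

The CMS side never hands file content to the index; it only tags the blob and triggers a run. That is why the indexer must be able to reach the storage account itself, which a cloud-hosted indexer cannot do for storage on a developer machine.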

Shazwazza commented 1 year ago

I will close this issue since this cannot be a supported scenario. If you are working locally, it's normally best to disable ExamineX and use the standard Examine/Lucene implementation instead. If you require PDF support in that scenario, you'd need to install this package: https://our.umbraco.com/packages/website-utilities/umbracoexaminepdf/