Data source not created until new content is added

Examine Version: 4.0.0-beta 1
Umbraco Version: 10.0.0.1

Description: It looks like the data sources are not created until a new image is uploaded to the site. At that point I get an error in the logs:

InternalIndex An error occurred processing the index batch.
Azure.RequestFailedException: Existing field 'content' cannot be changed.
Status: 400 (Bad Request)
ErrorCode: OperationNotAllowed

Content:
{"error":{"code":"OperationNotAllowed","message":"Existing field 'content' cannot be changed.","details":[{"code":"CannotChangeExistingField","message":"Existing field 'content' cannot be changed."}]}}

Headers:
Cache-Control: no-cache
Pragma: no-cache
client-request-id: 3d6101b1-e8cd-4377-b146-a873ef5db4b6
x-ms-client-request-id: 3d6101b1-e8cd-4377-b146-a873ef5db4b6
request-id: 3d6101b1-e8cd-4377-b146-a873ef5db4b6
elapsed-time: 73
Preference-Applied: REDACTED
Strict-Transport-Security: REDACTED
Date: Mon, 22 Aug 2022 05:48:31 GMT
Content-Type: application/json; charset=utf-8
Content-Language: REDACTED
Expires: -1
Content-Length: 201

   at Azure.Search.Documents.IndexesRestClient.CreateOrUpdate(String indexName, SearchIndex index, Nullable`1 allowIndexDowntime, String ifMatch, String ifNoneMatch, CancellationToken cancellationToken)
   at Azure.Search.Documents.Indexes.SearchIndexClient.CreateOrUpdateIndex(SearchIndex index, Boolean allowIndexDowntime, Boolean onlyIfUnchanged, CancellationToken cancellationToken)
   at ExamineX.AzureSearch.AzureSearchIndex.a(IEnumerable`1 )
   at ExamineX.AzureSearch.AzureSearchIndex.A(IEnumerable`1 , CancellationToken )
   at ExamineX.AzureSearch.AzureSearchIndex.E.A(Task )

If I upload another image after this, I get a different error, and the previous one no longer appears:

An error occurred adding/updating the azure search indexer test-external
Azure.RequestFailedException: Data source does not contain column '__NodeId', which is required because it maps to the document key field 'x__NodeId' in the index 'test-external'. Ensure that the '__NodeId' column is present in the data source, or add a field mapping that maps one of the existing column names to 'x__NodeId'.
Status: 400 (Bad Request)

Content:
{"error":{"code":"","message":"Data source does not contain column '__NodeId', which is required because it maps to the document key field 'x__NodeId' in the index 'test-external'. Ensure that the '__NodeId' column is present in the data source, or add a field mapping that maps one of the existing column names to 'x__NodeId'."}}

Headers:
Cache-Control: no-cache
Pragma: no-cache
client-request-id: 09479079-eccf-4897-8475-a3807742aeac
x-ms-client-request-id: 09479079-eccf-4897-8475-a3807742aeac
request-id: 09479079-eccf-4897-8475-a3807742aeac
elapsed-time: 259
Preference-Applied: REDACTED
Strict-Transport-Security: REDACTED
Date: Mon, 22 Aug 2022 06:04:52 GMT
Content-Type: application/json; charset=utf-8
Content-Language: REDACTED
Expires: -1
Content-Length: 330

   at Azure.Search.Documents.IndexersRestClient.CreateOrUpdateAsync(String indexerName, SearchIndexer indexer, String ifMatch, String ifNoneMatch, CancellationToken cancellationToken)
   at Azure.Search.Documents.Indexes.SearchIndexerClient.CreateOrUpdateIndexerAsync(SearchIndexer indexer, Boolean onlyIfUnchanged, CancellationToken cancellationToken)
   at ExamineX.AzureSearch.Umbraco.BlobMedia.BlobStorageIndexerRunTask.RunAsync(SearchIndexerClient indexerClient, SearchIndexer indexer, CancellationToken token)

Hi,

The data sources are created either when the index is rebuilt or, if you installed ExamineX first and added the blob media package afterwards, lazily the next time content or media is saved. Unfortunately, the "content" field must be configured correctly for this to work, and it doesn't seem possible to change the field that blob content gets indexed to. I have tried many different approaches but never succeeded. The ExamineX blob media package will try to change any existing "content" field to be compatible with the blob data source, since it must be configured as SearchFieldDataType.String and not as a string collection (which is typically the default). Azure Search rejects that change on an existing field, which is the first error you are seeing.

The easiest thing to do is to avoid having a field named "content" in your Umbraco property types and leave it as a reserved field name for Azure Search's blob media.
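
To check whether an existing index would run into this conflict, something like the following could help. It is only a sketch: it assumes you have the index's field definitions in the JSON shape used by the Azure Search REST API, where a plain string is `Edm.String` and a string collection is `Collection(Edm.String)`. The function name `content_field_conflicts` is hypothetical and not part of ExamineX.

```python
from typing import Iterable, Mapping

RESERVED_FIELD = "content"    # field name the blob data source needs for itself
REQUIRED_TYPE = "Edm.String"  # REST type name for a plain (non-collection) string


def content_field_conflicts(fields: Iterable[Mapping[str, str]]) -> bool:
    """Return True if a 'content' field exists and is not a plain string."""
    for field in fields:
        if field.get("name") == RESERVED_FIELD:
            return field.get("type") != REQUIRED_TYPE
    return False  # no 'content' field yet, so nothing would need changing


if __name__ == "__main__":
    # Illustrative field list, not taken from a real index.
    index_fields = [
        {"name": "x__NodeId", "type": "Edm.String"},
        {"name": "content", "type": "Collection(Edm.String)"},
    ]
    print(content_field_conflicts(index_fields))
```

If this reports a conflict, renaming the Umbraco property alias and rebuilding the index is likely simpler than trying to alter the field in place.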

The second error you are getting, "Data source does not contain column '__NodeId', which is required because it maps to the document key field 'x__NodeId' in the index 'test-external'.", occurs because you have blob items in your data store that don't have the __NodeId metadata field assigned to them. ExamineX adds this automatically when media is saved, but if existing media doesn't have it, you will encounter this issue when the indexer is enabled. Unfortunately, there is no easy way around this apart from either starting from scratch with your blob container or re-saving all of your media.
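
To find out which media items would need re-saving, you could list the blobs together with their metadata and look for the ones without `__NodeId`. The sketch below assumes that listing has already been exported into a plain mapping of blob name to metadata; how you obtain it (storage SDK, Storage Explorer, etc.) is left out, and `blobs_missing_node_id` is a hypothetical helper, not an ExamineX API.

```python
from typing import List, Mapping, Optional

REQUIRED_KEY = "__NodeId"  # metadata key the indexer maps to the document key


def blobs_missing_node_id(
    blob_metadata: Mapping[str, Optional[Mapping[str, str]]]
) -> List[str]:
    """Return the sorted names of blobs whose metadata lacks the required key."""
    return sorted(
        name
        for name, metadata in blob_metadata.items()
        if REQUIRED_KEY not in (metadata or {})
    )


if __name__ == "__main__":
    # Illustrative data only; real blob names and ids will differ.
    listing = {
        "media/abc123/photo.jpg": {"__NodeId": "1234"},
        "media/def456/old-upload.png": {},
        "media/ghi789/legacy.pdf": None,
    }
    for name in blobs_missing_node_id(listing):
        print(name)
```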

I have just noticed, however, that the blob data source configured by ExamineX isn't restricted to the media folder (which it should be); instead it queries all of the files in the full container. You can fix this in the portal by adding the media path to the data source.

I'll update the codebase to ensure this is also done automatically, based on the configured path (the default is media in Umbraco).
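
For reference, restricting the data source to the media folder corresponds to the optional `query` value on the data source's container in the Azure Search REST definition. The sketch below only builds that JSON payload. It is not ExamineX's own code, and the data source name, container name and connection string are placeholders.

```python
import json
from typing import Any, Dict


def blob_data_source(
    name: str,
    container: str,
    connection_string: str,
    media_path: str = "media",  # default media folder in Umbraco
) -> Dict[str, Any]:
    """Build a blob data source definition limited to one virtual folder."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        # 'query' limits the data source to blobs under this folder
        "container": {"name": container, "query": media_path},
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own names and connection string.
    definition = blob_data_source(
        name="my-datasource",
        container="my-container",
        connection_string="<storage connection string>",
    )
    print(json.dumps(definition, indent=2))
```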

SDKits / ExamineX

Data source not created until new content is added #68