bryevdv commented 4 years ago

The Basics

Service team responsible for the client library: Azure SDK
Link to documentation describing the service:

https://azure.microsoft.com/en-us/services/search/
Contact email (if service team, provide PM and Dev Lead):

Alex Ghiondea (PM) Bryan Van de Ven (Python dev lead)

About this client library

Name of the client library: search
Languages for this review: C#, Java, JavaScript, Python
Link to the service REST APIs:

https://docs.microsoft.com/en-us/rest/api/searchservice/

Artifacts required (per language)

We use an API review tool (apiview) to support .NET and Java API reviews. For Python and TypeScript, use the API extractor tool, then submit the output as a Draft PR to the relevant repository (azure-sdk-for-python or azure-sdk-for-js).

.NET

Upload DLL to apiview. Link: Azure.Search
Link to samples for champion scenarios:

Java

Upload JAR to apiview. Link: azure-search
Link to samples for champion scenarios:

Python

Upload the api as a Draft PR. Link to PR:

https://github.com/Azure/azure-sdk-for-python/pull/9983
Link to samples for champion scenarios:

https://gist.github.com/bryevdv/48323738f27bedb0c5d0d31246e17041#scenarios

JavaScript / TypeScript

Upload output of api-extractor as a Draft PR. Link to PR:

https://github.com/Azure/azure-sdk-for-js/pull/7614

Link to samples for champion scenarios:

https://gist.github.com/xirzec/17c8192e41a8cd40ceba82c5c39339f2#scenarios

Champion Scenarios

The context for all of the following scenarios is:

A customer runs a web store. As part of the store, they have a product search capability.

Search for documents in an Azure Search index with simple text, and get the first result

Customers may wish to perform a query with basic search text and obtain the first result returned to display to their users.

Target audience: basic developers with general needs (~90%)
Code samples: .NET | Java | Python | TypeScript

Filter search results from an Azure Search index

Customers may wish to afford the option to filter search results by specific conditions, order the results in specified ways, or limit returned results to a subset of fields.

Target audience: basic developers with general needs (~90%)
Code samples: .NET | Java | Python | TypeScript
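To make the three options in this scenario concrete, here is a minimal sketch of how filter, ordering, and field selection might be flattened into the query parameters of the search REST API ($filter, $orderby, $select, $top). The `SearchOptions` shape and the `toQueryParams` helper are hypothetical names used only for illustration; they are not the API under review.

```typescript
// Hypothetical option shape for illustration only; not the SDK's actual type.
interface SearchOptions {
  filter?: string;    // OData filter expression, e.g. "Price lt 100"
  orderBy?: string[]; // each entry is "<field> asc" or "<field> desc"
  select?: string[];  // subset of fields to return
  top?: number;       // maximum number of results to return
}

// Flatten the options into REST-style query parameters.
// Collections are joined with commas, as the REST API expects.
function toQueryParams(
  searchText: string,
  options: SearchOptions = {}
): Record<string, string> {
  const params: Record<string, string> = { search: searchText };
  if (options.filter) params["$filter"] = options.filter;
  if (options.orderBy?.length) params["$orderby"] = options.orderBy.join(",");
  if (options.select?.length) params["$select"] = options.select.join(",");
  if (options.top !== undefined) params["$top"] = String(options.top);
  return params;
}

// Example: cheap products, best rated first, returning only two fields.
const params = toQueryParams("wifi router", {
  filter: "Price lt 100",
  orderBy: ["Rating desc", "Price asc"],
  select: ["Name", "Price"],
  top: 10,
});
```

Modeling `orderBy` and `select` as string collections keeps callers from hand-building comma-separated strings; the joining happens in one place.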

Get a list of search suggestions

Customers may wish to ask for a list of search suggestions based on a given search text, in order to guide their users' search experience.

Target audience: experienced developers with general needs (~50%)
Code samples: .NET | Java | Python | TypeScript

Upload documents to an Azure Search index

Customers may wish to add new documents to a search index (being informed if a given document already exists), e.g. to represent new products they are making available that their users should be able to find.

Target audience: basic developers with general needs (~90%)
Code samples: .NET | Java | Python | TypeScript

Merge or upload a document in an Azure Search index (with a new field)

Customers may have a set of documents, which may or may not already exist in the search index, that they wish to incorporate into the index, e.g. if changes to existing product descriptions need to be made available to their users.

Target audience: basic developers with general needs (~90%)
Code samples: .NET | Java | Python | TypeScript

Batch CRUD operations on documents

Customers may wish to mass-update many documents in their catalogue at once, in the most efficient way possible.

Target audience: experienced developers with general needs (~50%)
Code samples: .NET | Java | Python | TypeScript
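As a rough sketch of what a mixed batch could look like, the snippet below builds a request body in which each document carries an "@search.action" property (upload, merge, mergeOrUpload, or delete), which is how the REST API distinguishes operations within one batch. The `IndexAction` type, the `toBatchBody` helper, and the `Product` shape are hypothetical and exist only for illustration.

```typescript
// The four per-document actions supported in a single indexing batch.
type ActionType = "upload" | "merge" | "mergeOrUpload" | "delete";

// Hypothetical document type for the web-store scenario.
interface Product {
  id: string;
  name?: string;
  price?: number;
}

// Hypothetical wrapper pairing a document with the action to apply to it.
interface IndexAction<T> {
  actionType: ActionType;
  document: T;
}

// Build a REST-style batch body: { value: [ { "@search.action": ..., ...doc } ] }.
function toBatchBody<T extends object>(actions: IndexAction<T>[]) {
  return {
    value: actions.map((a) => ({
      "@search.action": a.actionType,
      ...a.document,
    })),
  };
}

// One request that adds, updates, and removes catalogue entries.
const body = toBatchBody<Product>([
  { actionType: "upload", document: { id: "1", name: "Router", price: 79 } },
  { actionType: "mergeOrUpload", document: { id: "2", price: 19 } },
  { actionType: "delete", document: { id: "3" } },
]);
```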

Agenda for the review

A board review is generally split into two parts, with additional meetings as required.

Part 1 - Introducing the board to the service:

Review of the service (no more than 10 minutes).
Review of the champion scenarios.
Get feedback on the API patterns used in the champion scenarios.

After part 1, you may schedule additional meetings with architects to refine the API and work on implementation.

Part 2 - the "GA" meeting

Scheduled at least one week after the APIs have been uploaded for review.
Will go over controversial feedback from the line-by-line API review.
Exit meeting with concrete changes necessary to meet quality bar.

Thank you for your submission

adrianhall commented 4 years ago

Scheduled for Wednesday Mar-4

adrianhall commented 4 years ago

Champion Scenarios:

In .NET, the Models namespace contains all the models; @KrzysztofCwalina wants the input models to be moved up to the top-level namespace. @tg-msft disagrees. ACTION: Ted & Krzysztof will take it offline and maybe update the .NET guidelines.
What's the advantage (.NET) of passing strings as indices to the search? A lot of customers don't know the schema at design time.
- Does it scale to nested object types? Yes, it does.

TABLED: Customizable JSON serializer within .NET (to support JSON.NET)

Why is OrderBy a string? We can totally change it to a collection.
- In Java, Select is also a collection. ACTION: Investigate formattable strings for the OData filter in Python and Java. ACTION: Ensure select and orderBy are collections of strings in every language.
When suggesting, do you expect users to iterate through all values, or index into them?
- Expect users to iterate through all values.
Why would we not call SearchIndexClient - why not SearchClient?
- Depends on point of view. There is also a SearchServiceClient, which has more mgmt ops.
- It's named relative to what the customer is operating on - the service or the index. ACTION: Noodle on the naming of SearchIndexClient vs. SearchClient
Uploading (convenience)
- If we have to search by string for the key, it's going to be an expensive operation.
- this is what we get back from the service, so we are going to spend the cost either way
- We can massage on the client side to ensure the data is returned in the same order as submitted ACTION: Cross-language, ensure uploadDocuments array of errors/success is in the same order as the data submitted.
Python has a suggested upsert instead of merge_or_upload - should we adjust?
.NET is thinking of a higher-level batching API for convenience, but it isn't here yet.
Can they upload a stream? Is the data big enough?
- The service has a mechanism for uploading from blob storage. This API is really for live data. ACTION @KrzysztofCwalina will suggest something for this.
On the JS conditional access, can we use the same thing for ETags? ETags are coming later, so the concern is burning the properties before the service introduces ETags.
- Maybe two separate methods for update / merge-or-upload?
- Sounds like we have a situation for semantically meaningful nulls. ACTION: Feature team will investigate semantically meaningful null and naming of updateDocuments here. (See @johanste if confused)
The IndexBatch (Java) is not ideal.
index, search, and document are all too generic. ACTION: Feature team to reconcile naming. RECOMMENDATION: Remove the s after IndexDocument(s)?
Search is paged. However, result set contains additional metadata that is repeated on every page.
- could we expose pageable, then say the information that doesn't change is on the pageable.
- Python makes the results the first class citizen. .NET is one level down.
- What isn't obvious with Python is how to get the metadata for the collection.
- Potentially inherit from AsyncPageable, with the metadata on the sub-type.
- Clients always have to be prepared for paging.
- this is a redesign of the pageable mechanism. ACTION: @KrzysztofCwalina will work with Feature team to avoid paging redesign.
It sounds like client paging is relatively well known in the customer base, so maybe the server driven paging is just hidden behind the scenes?
Service team - accessing facets is easy from the customer point of view.
In track 1, there were operation groups to try and divide the client. In track-2, there are more methods because operation groups are gone. Should we re-introduce operation groups, or multiple clients?
- Are they common operations?
- Maybe make it look like LeaseClient from Storage? (e.g. dot into the operation)

ACTION: Work on the case of partial success/failure. @brjohnstmsft has details on how it worked in track-1 & Java.
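A minimal sketch of the two upload-related points from these notes, namely returning per-document results in submission order and surfacing partial failure. It assumes the service returns one result per document key in arbitrary order; the `IndexingResult` shape and both helper functions are hypothetical. Indexing the results by key in a Map avoids repeated string searches over the result list.

```typescript
// Hypothetical per-document outcome; field names are illustrative.
interface IndexingResult {
  key: string;
  succeeded: boolean;
  errorMessage?: string;
}

// Return results in the same order as the submitted document keys.
// Building the Map is one pass; each lookup afterwards is constant time.
function orderResults(
  submittedKeys: string[],
  serviceResults: IndexingResult[]
): IndexingResult[] {
  const byKey = new Map<string, IndexingResult>();
  for (const result of serviceResults) {
    byKey.set(result.key, result);
  }
  return submittedKeys.map(
    (key) =>
      byKey.get(key) ?? {
        key,
        succeeded: false,
        errorMessage: "No result returned for this key",
      }
  );
}

// Partial failure: some documents succeeded and some did not.
function failedResults(results: IndexingResult[]): IndexingResult[] {
  return results.filter((r) => !r.succeeded);
}

// Example: the service answers out of order, and one document fails.
const ordered = orderResults(
  ["a", "b", "c"],
  [
    { key: "c", succeeded: true },
    { key: "a", succeeded: true },
    { key: "b", succeeded: false, errorMessage: "Document too large" },
  ]
);
const failed = failedResults(ordered);
```

Whether a partial failure should throw or be returned as data is the open design question; this sketch simply returns it as data so the caller can decide.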

Is the service sensitive to Unicode encoding? (e.g. the precomposed, single-code-point form of e-acute vs. the decomposed form made of two code points).
- Should we normalize before passing to the service?
- Normalization might be expensive. ACTION: @brjohnstmsft will check on this with the experts on the team.
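For reference, a small sketch of the normalization question raised above: the same visible text can be encoded with a precomposed or a decomposed e-acute, and the two strings do not compare equal until normalized. The `normalizeSearchText` helper is hypothetical, and whether the service already normalizes input is exactly the open question, so this only illustrates the client-side option.

```typescript
// Same visible text, two different code point sequences.
const precomposed = "caf\u00e9"; // e-acute as the single code point U+00E9
const decomposed = "cafe\u0301"; // "e" followed by U+0301 (combining acute accent)

// Hypothetical helper: normalize to NFC (canonical composition) before sending.
function normalizeSearchText(text: string): string {
  return text.normalize("NFC");
}

// Without normalization the strings differ; after it they are identical.
const equalBefore = precomposed === decomposed;
const equalAfter =
  normalizeSearchText(precomposed) === normalizeSearchText(decomposed);
```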

Recording: https://msit.microsoftstream.com/video/1cb4f2c1-0cf4-4633-aee4-7e90ec9e6501

bterlson commented 4 years ago

@xirzec and I discussed a possible alternative to the PagedAsyncIterable approach.

First, some possible guideline modifications: We are thinking that PagedAsyncIterable isn't appropriate if the following are true:

1. you usually want pages, because PagedAsyncIterable puts per-item iterators front-and-center, and hides pages behind byPage().
2. you usually just want the first page.

#1 is likely the case because users will often want facets, and #2 is likely true because we've heard from the service that most result sets fit in a single response.

Therefore, we could consider an approach that hides the server-driven paging entirely (or exposes it only as an "advanced API") and instead exposes a more typical client-driven paging model. Users pass skip and top, and if a user requests top N where N is larger than what fits in a single response, we will request the next page for them (which, we've heard, will be rare in practice).
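A minimal sketch of that idea, assuming the service signals "more results available" with some continuation value. The `Page` shape, the `FetchPage` callback, and the `continuationToken` field are all hypothetical stand-ins for whatever the real transport layer returns.

```typescript
// Hypothetical shape of one server response.
interface Page<T> {
  items: T[];
  continuationToken?: string; // present only when the server truncated the response
}

// Hypothetical transport callback that performs one request.
type FetchPage<T> = (
  skip: number,
  top: number,
  continuationToken?: string
) => Promise<Page<T>>;

// Client-driven paging: the caller passes skip/top and gets back up to `top` items.
// Server-driven continuation is followed internally and never surfaces to the caller.
async function searchTop<T>(
  fetchPage: FetchPage<T>,
  skip: number,
  top: number
): Promise<T[]> {
  const collected: T[] = [];
  let token: string | undefined;
  do {
    const page = await fetchPage(skip, top, token);
    collected.push(...page.items);
    token = page.continuationToken;
  } while (token !== undefined && collected.length < top);
  return collected.slice(0, top);
}
```

In the common case the loop body runs once, so the caller pays for a single request; the extra round trips only happen when top exceeds what one response can hold.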

tg-msft commented 4 years ago

Search Paging

We had a chat to follow up on the discussion from https://github.com/Azure/azure-sdk/issues/1041 and the comments from TS folks at https://github.com/Azure/azure-sdk/issues/1041#issuecomment-594941663.

The TS folks suggested an inversion of the async paging approach we've taken for other services as the default given that most users want a single page. Krzysztof doesn't want to invent new patterns, but felt the key for his understanding was considering this to be a tuple of facet info and search results. Bruce helped us understand some of the pain points between client-side paging (which we want to make easy) and server-side paging (which ideally we'll hide as much as possible as an implementation detail). We threw out a lot of ideas and settled here. There was some concern about the "double await" but nobody thought customers would get hung up there.

C# and TS will try to switch to this approach for Preview 1. Java and Python can wait for a future preview.

C#

We settled on something pretty close to what's already in the ApiView listing. We'll have APIs that look like:

public Response<SearchResults<T>> Search<T>(...);
public async Task<Response<SearchResults<T>>> SearchAsync<T>(...);

public class SearchResults<T>
{
    public IDictionary<string, ICollection<FacetResult>> Facets { get; }
    public double? Coverage { get; }
    public long? TotalCount { get; }

    public AsyncPageable<SearchResult<T>> GetResultsAsync();
    public Pageable<SearchResult<T>> GetResults();
}

And would use that via:

var results = await client.SearchAsync(...);
// render results.Facets
await foreach (var item in results.GetResultsAsync())
{
    // render search item
}

TS

The same approach would look like this in TS:

// proposal one:
const result = await client
  .search({
    searchText: "WiFi",
    facets: ["Category,count:3,sort:count", "Rooms/BaseRate,interval:100"]
  });

console.log(result.facets);

// result.results() returns a PagedAsyncIterator of SearchResult items (pages are SearchResult[])
for await (const searchResult of result.results()) {
    // do something with searchResult
    console.log(searchResult);
}

xirzec commented 4 years ago

The above is now implemented in Azure/azure-sdk-for-js#7641 with one minor change: result.results is the iterator itself, not a method to return an iterator:

for await (const searchResult of result.results) {
    // do something with searchResult
    console.log(searchResult);
}

Azure / azure-sdk

Board Review: Search #1041
