NuGet / NuGetGallery

NuGet Gallery is a package repository that powers https://www.nuget.org. Use this repo for reporting NuGet.org issues.
https://www.nuget.org/
Apache License 2.0

[Azure Search] Add ID prefix matching for search #7128

Closed shishirx34 closed 4 years ago

shishirx34 commented 5 years ago

The Visual Studio Package Manager UI runs a search whenever you pause typing in the search box, so partially typed queries should also return good results.

| Target queries | Expected results |
|---|---|
| `newton`, `new`, `n` | `Newtonsoft.Json` |
| `s` | `System.*`, `AWSSDK.S3`, `Serilog`, `Swashbuckle` |
| `mi`, `micro` | `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, `Microsoft.AspNet.*`, `Microsoft.Owin`, `Microsoft.*` |
| `en`, `e`, `enti`, `ent` | `EntityFramework`, `Microsoft.EntityFrameworkCore` |
| `m` | `Moq`, `System.Memory`, `Microsoft.*` |
| `log4` | `log4net` |
| `a` | `AutoMapper`, `Antlr`, `AWSSDK.*`, `Autofac` |
| `swash`, `swa` | `Swashbuckle.AspNet.Core.Swagger`, `Swashbuckle`, `Swashbuckle.*` |

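The target queries above can be read as "the query is a prefix of the package ID or of one of its dot-separated parts". The following is a minimal sketch of that reading only. The separator set and the `matches_prefix` helper are assumptions made for illustration; the thread does not describe the analyzer that Azure Search actually uses, and ranking is not modelled here.

```python
import re

def id_tokens(package_id: str) -> list[str]:
    """Split a package ID on '.', '-' and '_' (assumed separators), lower-cased."""
    return [t.lower() for t in re.split(r"[.\-_]", package_id) if t]

def matches_prefix(query: str, package_id: str) -> bool:
    """True if the query is a prefix of the whole ID or of any ID token."""
    q = query.lower()
    if package_id.lower().startswith(q):
        return True
    return any(token.startswith(q) for token in id_tokens(package_id))

if __name__ == "__main__":
    ids = ["Newtonsoft.Json", "AWSSDK.S3", "System.Memory", "Moq", "log4net"]
    for query in ["newton", "s", "m", "log4"]:
        print(query, [i for i in ids if matches_prefix(query, i)])
```

Under this reading, `s` matches `AWSSDK.S3` through its `S3` token and `m` matches `System.Memory` through `Memory`, which is consistent with the expected results in the table.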
loic-sharma commented 5 years ago

This will likely use the same analyzer as https://github.com/nuget/nugetgallery/issues/7129

loic-sharma commented 5 years ago
Analysis of queries with no results, and fuzzy vs prefix search

Data analysis:

* ~7.4% of "legacy" nuget.org queries had no results ([LINK](https://ms.portal.azure.com#@72f988bf-86f1-41af-91ab-2d7cd011db47/blade/Microsoft_Azure_Monitoring_Logs/LogsBlade/resourceId/%2Fsubscriptions%2F685c4662-53d8-40f9-ac51-926097ede041%2FresourceGroups%2Fnuget-prod-0-v2gallery%2Fproviders%2Fmicrosoft.insights%2Fcomponents%2Fnuget-prod-v2gallery/source/LogsBlade.AnalyticsShareLinkToQuery/q/H4sIAAAAAAAAA2XOsQ6CQAwG4N2naJhgMawOOKgxcdAQ9QXq0cAl3B1pi6jx4T0wwcG1%252Ffv1N71ocEdStkYWbxgaYgK1jkTRdbAGrEO6yqtsXnp0BEUByYbDIMQXQjZNiTUlMIfM5O6i48UGL8uDlEx3S8M3PgF7bIWSeEMPJV%252FBKZxJ%252BlYFCtAgsZOv0z%252FpGhTbbei9jkg%252BFpPeOWT7in%252FHeZrB7fnTYoAjTwydJdMgKyw%252BnO6jLfgAAAA%253D))
* ~1.5% of "preview" nuget.org queries had no results ([LINK](https://ms.portal.azure.com#@72f988bf-86f1-41af-91ab-2d7cd011db47/blade/Microsoft_Azure_Monitoring_Logs/LogsBlade/resourceId/%2Fsubscriptions%2F685c4662-53d8-40f9-ac51-926097ede041%2FresourceGroups%2Fnuget-prod-0-v2gallery%2Fproviders%2Fmicrosoft.insights%2Fcomponents%2Fnuget-prod-v2gallery/source/LogsBlade.AnalyticsShareLinkToQuery/q/H4sIAAAAAAAAA2XOsQ6CQAwG4N2naJhgMa4OOKiLg4YoL3AeDVziXUnbEzU%252BvAcmOLi2f7%252F%252BNoqSP6Kys7J4w9AhI6jzKGp8DxswLeXrVVPMy2A8QllCtmUaBPmChm1XmRYzmEN2cvfJCeIoyPIgFePd4fCNT0DNEbN0gg%252FF0MCJzijxpgIlKEmqFNr8D6pJzW1HMehorMZeEr037F7p7TjPC7g%252Bf1oKcOKRoXdoO8MKiw%252BXxYnD9wAAAA%253D%253D))

Top queries with no results ([LINK](https://ms.portal.azure.com#@72f988bf-86f1-41af-91ab-2d7cd011db47/blade/Microsoft_Azure_Monitoring_Logs/LogsBlade/resourceId/%2Fsubscriptions%2F685c4662-53d8-40f9-ac51-926097ede041%2FresourceGroups%2Fnuget-prod-0-v2gallery%2Fproviders%2Fmicrosoft.insights%2Fcomponents%2Fnuget-prod-v2gallery/source/LogsBlade.AnalyticsShareLinkToQuery/q/H4sIAAAAAAAAA3WOMQvCMBCF9%252F6Ko1O7SFeHOqiLg1Awu8T0aAMmJ3eJRfHHm7TQRVzvfe%252B7Z6IEcmcMbI0UH5hGZIRgHUrQ7gE70ANV26av19Brh9C2UO6ZJkG%252BoGYzdnrAElbIzN5j8nix5GVzko7xaXFa8FmgOGL5v6Io6PuBog%252BZbhIo0TnN9p3gfK5quL0gkKT1fqh%252BBMsrhezyeuIeORfm7hV6FFN8Af467rkAAQAA)). These seem to be mainly misspellings:

* "fluentassertion" (likely supposed to be "fluentassertions", this matches with new search)
* "nopi" (likely supposed to be "npoi")
* "npsql" (likely supposed to be "npgsql")
* "newtosof"
* "topself" (likely supposed to be "topshelf")
* "xuint" (likely supposed to be "xunit")
* "doxigen" (likely supposed to be "doxygen")
* "dopper" (likely supposed to be "dapper")
* "weboscket"
* "ecxel"
* "ffmpge" (likely supposed to be "ffmpeg")
* "entitiframework"
* "Newtownseoft"
* "watsapp"
* "bootrab"
* "xamarian"
* "promethues"
* "redish"
* "EPPLush" (likely supposed to be "EPPlus")
* "newtonsfot"
* "tupple"

The following would've matched with prefix matching:

* "octoki"
* "nrule"
* "whatsap"

This data seems to indicate that we can improve 1.5% of queries by supporting fuzzy matching on single term queries. For example, map queries with a single unscoped term like `octoki` to something like `octoki~1`.
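The mapping from `octoki` to `octoki~1` could be sketched as below. This is an illustration only: the `add_fuzzy` helper is hypothetical, and treating "single unscoped term" as "one alphanumeric word with no field scope, whitespace, or operators" is an assumption, since the thread does not define it precisely. The `~1` suffix is the Lucene fuzzy operator with an edit distance of 1.

```python
import re

# Assumption: a "single unscoped term" is one alphanumeric word, with no
# field scope (e.g. "Tags:"), no whitespace, and no query operators.
_SINGLE_TERM = re.compile(r"^[A-Za-z0-9]+$")

def add_fuzzy(query: str, max_edits: int = 1) -> str:
    """Append a Lucene fuzzy operator to single-term queries; leave others alone."""
    term = query.strip()
    if _SINGLE_TERM.match(term):
        return f"{term}~{max_edits}"
    return query

if __name__ == "__main__":
    for q in ["octoki", "xuint", "unit testing", 'Tags:"seq-app"']:
        print(repr(q), "->", repr(add_fuzzy(q)))
```

Multi-term and scoped queries pass through unchanged, so the rewrite only affects the kind of query listed in the misspelling analysis above.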
loic-sharma commented 4 years ago

Ideas

Implementations

Add optional prefix matching to raw terms

The query `foo.bar` results in Azure Search text like `foo.bar (+foo +bar)^3 tokenizedPackageId:foo.bar* packageId:foo.bar^1000`
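As a rough illustration of how the search text in the example above is assembled, here is a small sketch that reproduces it for `foo.bar`. The `build_query_text` helper, the separator set, and the conditions under which the `^3` and `^1000` clauses are emitted are all assumptions inferred from this one example, not the gallery's actual query builder.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split on '.', '-', '_' and whitespace (assumed separators)."""
    return [t for t in re.split(r"[.\-_\s]+", text) if t]

def build_query_text(query: str) -> str:
    raw_terms = query.split()          # whitespace-delimited "raw" terms
    tokens = tokenize(query)           # finer-grained tokens
    clauses = [query]
    if len(tokens) > 1:
        # Boost documents that contain every token (assumed condition).
        clauses.append("(" + " ".join(f"+{t}" for t in tokens) + ")^3")
    # Optional prefix clause for each raw term: no '+', so it only adds score.
    clauses.extend(f"tokenizedPackageId:{term}*" for term in raw_terms)
    if len(raw_terms) == 1:
        # Strong boost for an exact package ID match (assumed condition).
        clauses.append(f"packageId:{query}^1000")
    return " ".join(clauses)

if __name__ == "__main__":
    print(build_query_text("foo.bar"))
```

Because the prefix clause carries no `+`, it can only raise the score of matching packages; it never filters results out.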

Implementation: dev..loshar-prefix-v3

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8527 | 0.8481 | |
| client curated queries | 0.7619 | 0.7985 | |
| feedback queries | 0.7389 | 0.7712 | |

Results

Curated Search Queries

* Control: 0.852724537661141
* Treatment: 0.848133596436223
* Biggest Winners (10): mysql => +0.0012; identityserver => +0.0003; powershell => +0.0003; CefSharp => +0.0002; aws => +0.0002; RTF => +0.0001; Oracle => +0.0001; svg => +0.0001; sql => +0.0001; sftp => 0
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; unit testing => -0.0006; swagger => -0.0005; soap => -0.0005; email => -0.0005; websocket => -0.0003; postgres => -0.0002; Excel => -0.0002; log => -0.0002
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.windows.forms => 0; system.web.http => 0; email => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0

Client Curated Search Queries

* Control: 0.761907563066675
* Treatment: 0.79854464949181
* Biggest Winners (10): mi => +0.0044; mic => +0.0044; enti => +0.0037; Mic => +0.0033; swa => +0.0028; entit => +0.0028; micros => +0.0026; Mi => +0.0026; sele => +0.0022; micr => +0.0020
* Biggest Losers (6): swagger => -0.0010; log => -0.0008; excel => -0.0008; ef => -0.0006; moq => -0.0003; Moq => -0.0001
* Lowest Treatment Scores (10): micro => 0; m => 0; e => 0; n => 0; Micro => 0; ent => 0; System.Web => 0; a => 0; M => 0; N => 0

Feedback

* Control: 0.738874919537865
* Treatment: 0.771155004154323
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; MS, .net core => +0.0039; Azure => +0.0024; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; Newt => +0.0024; systemtesting => +0.0024
* Biggest Losers (10): log => -0.0020; csharp => -0.0014; ef => -0.0014; Ef => -0.0014; extended xml => -0.0014; postgres => -0.0014; Postgres => -0.0014; ilmerge => -0.0009; Azure PCL => -0.0006; Log => -0.0006
* Lowest Treatment Scores (10): excel => 0; Ocr => 0; markdown => 0; localization provider core => 0; Material => 0; log file => 0; log => 0; logging => 0; reading excel => 0; Azure. => 0
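For readers unfamiliar with the metric, NDCG (normalized discounted cumulative gain) compares the ranking a search returns against the best possible ranking of the judged results. The sketch below uses the common `rel / log2(rank + 1)` formulation. The relevance grades, the cutoff `k`, and how per-query scores are combined into the overall numbers above are not stated in this thread, so everything beyond the formula itself is an assumption.

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: rank 1 is undiscounted, later ranks decay."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(returned: list[float], judged: list[float], k: int = 10) -> float:
    """NDCG@k for one query.

    returned: relevance grades of the results, in the order the search returned them.
    judged:   relevance grades of all judged results for the query.
    """
    ideal = sorted(judged, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0
    return dcg(returned[:k]) / ideal_dcg

if __name__ == "__main__":
    judged = [3, 2, 1]
    print(ndcg([3, 2, 1], judged))  # ideal ordering
    print(ndcg([1, 2, 3], judged))  # reversed ordering scores lower
    print(ndcg([0, 0, 0], judged))  # no relevant results returned
```

A query whose expected packages do not appear in the results at all scores 0, which is how the "Lowest Treatment Scores" entries above can be read.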

Add optional prefix matching to the last raw term

The query `foo bar` results in Azure Search text like `foo bar (+foo +bar)^3 tokenizedPackageId:bar*`

Implementation: dev..loshar-prefix-v5

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8482 | 0.8455 | |
| client curated queries | 0.7605 | 0.7934 | |
| feedback queries | 0.7184 | 0.7470 | |

Results

Curated Search Queries

* Control: 0.848219210816872
* Treatment: 0.845457506952199
* Biggest Winners (10): entityframework => +0.0008; Microsoft.AspNet.WebApi.Client => +0.0006; identityserver => +0.0004; powershell => +0.0003; freesql => +0.0003; ClosedXML => +0.0002; commandline => +0.0002; mongodb => +0.0002; itext7 => +0.0002; CefSharp => +0.0002
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; Dapper.Common => -0.0005; swagger => -0.0005; soap => -0.0005; postgres => -0.0004; websocket => -0.0003; email => -0.0003; Excel => -0.0002; log => -0.0002
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.windows.forms => 0; system.web.http => 0; email => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0

Client Curated Search Queries

* Control: 0.760465963055876
* Treatment: 0.79339754812459
* Biggest Winners (10): mic => +0.0044; enti => +0.0037; Mic => +0.0033; swa => +0.0028; entit => +0.0028; micros => +0.0026; sele => +0.0022; mi => +0.0021; micr => +0.0020; newt => +0.0018
* Biggest Losers (9): swagger => -0.0009; log => -0.0008; excel => -0.0008; ef => -0.0007; grpc => -0.0001; identity => -0.0001; itext => 0; Identity => 0; Entity Framework => 0
* Lowest Treatment Scores (10): micro => 0; m => 0; e => 0; n => 0; Micro => 0; ent => 0; System.Web => 0; a => 0; M => 0; N => 0

Feedback

* Control: 0.7183626191149
* Treatment: 0.746955172419361
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; MS, .net core => +0.0039; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; logging => +0.0024; mysql => +0.0024; Newt => +0.0024
* Biggest Losers (10): extended xml => -0.0024; sqlite => -0.0024; ef => -0.0019; Ef => -0.0019; postgres => -0.0018; Postgres => -0.0018; csharp => -0.0014; ilmerge => -0.0009; log => -0.0006; Log => -0.0006
* Lowest Treatment Scores (10): log file => 0; log file => 0; compilation => 0; Log => 0; XmlSerializer => 0; pdf => 0; alexa => 0; markdown => 0; microsoft.bot.con => 0; sqlite => 0

Add optional prefix matching to the last raw term on a new field

The query `foo bar` results in Azure Search text like `foo bar (+foo +bar)^3 packageIdPrefixes:bar*`

Implementation: dev..loshar-prefix-v6

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8517 | 0.8477 | |
| client curated queries | 0.7617 | 0.7761 | |
| feedback queries | 0.7364 | 0.7659 | |

Results

Curated Search Queries

* Control: 0.851685495910323
* Treatment: 0.847733067545404
* Biggest Winners (10): entityframework => +0.0007; powershell => +0.0003; itext7 => +0.0002; CefSharp => +0.0002; RTF => +0.0001; identityserver => +0.0001; svg => +0.0001; sftp => 0; dbf => 0; TeamFoundation => 0
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; swagger => -0.0005; soap => -0.0005; email => -0.0003; postgres => -0.0003; Excel => -0.0002; log => -0.0002; TWAIN => -0.0002; 7zip => -0.0002
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.windows.forms => 0; system.web.http => 0; email => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0

Client Curated Search Queries

* Control: 0.761675281241101
* Treatment: 0.776107126344952
* Biggest Winners (10): enti => +0.0037; swa => +0.0028; entit => +0.0028; sele => +0.0022; newt => +0.0017; swash => +0.0012; entityframework => +0.0012; microso => +0.0009; mstest => +0.0004; fluent => +0.0004
* Biggest Losers (4): swagger => -0.0009; log => -0.0008; excel => -0.0008; ef => -0.0007
* Lowest Treatment Scores (10): mi => 0; micro => 0; m => 0; en => 0; e => 0; mic => 0; n => 0; Micro => 0; ent => 0; Mic => 0

Feedback

* Control: 0.736396135953207
* Treatment: 0.765906241061337
* Biggest Winners (10): Microsoft.Extensions => +0.0142; appsf => +0.0117; mpart => +0.0085; MS, .net core => +0.0039; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; Newt => +0.0024; systemtesting => +0.0024
* Biggest Losers (10): extended xml => -0.0024; sqlite => -0.0024; ef => -0.0019; Ef => -0.0019; csharp => -0.0014; postgres => -0.0014; Postgres => -0.0014; unit test => -0.0012; logging => -0.0009; ilmerge => -0.0009
* Lowest Treatment Scores (10): System.IO.Abstractions.Test => 0; logging => 0; log file => 0; audit => 0; XmlSerializer => 0; microsoft.bot.con => 0; analyzers => 0; System.IO.Abstractions.Test* => 0; aspnetcore => 0; sqlite => 0

Add required prefix matching to tokenized terms

The query `foo.bar` results in Azure Search text like `foo.bar (+foo +bar)^3 +tokenizedPackageId:foo* +tokenizedPackageId:bar* packageId:foo.bar^1000`
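The only difference between this "required" variant and the "optional" variant of the same idea is the leading `+` on each prefix clause. In Lucene query syntax, `+` marks a clause that a document must match, so packages whose tokenized ID lacks any of the prefixes are excluded instead of merely ranked lower. A small sketch of generating both forms follows; the `tokenized_prefix_clauses` helper and the separator set are assumptions for illustration.

```python
import re

def tokenized_prefix_clauses(term: str, required: bool) -> str:
    """Build one prefix clause per token of the term.

    required=True  -> '+field:token*' (document must match every prefix)
    required=False -> 'field:token*'  (match only contributes to the score)
    """
    tokens = [t for t in re.split(r"[.\-_]", term) if t]  # assumed separators
    sign = "+" if required else ""
    return " ".join(f"{sign}tokenizedPackageId:{t}*" for t in tokens)

if __name__ == "__main__":
    print(tokenized_prefix_clauses("foo.bar", required=True))
    print(tokenized_prefix_clauses("foo.bar", required=False))
```

Making the clauses required turns prefix matching into a filter, which may explain why keyword-style queries such as `json.net` or `unit testing` score 0 in the treatment results for this experiment.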

Implementation: dev..loshar-prefix-v1

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8527 | 0.8047 | |
| client curated queries | 0.7619 | 0.7864 | |
| feedback queries | 0.7389 | 0.7479 | |

Results

Curated Search Queries

* Control: 0.852724537661141
* Treatment: 0.8047550861365
* Biggest Winners (10): system.windows.forms => +0.0013; mysql => +0.0012; CefSharp => +0.0008; identityserver => +0.0005; Microsoft.Extensions.Hosting => +0.0005; Microsoft.AspNet.WebPages => +0.0005; Microsoft.EntityFrameworkCore.Sqlite => +0.0004; system.web.http => +0.0004; System.Web.Http => +0.0003; powershell => +0.0003
* Biggest Losers (10): windowsazureofficial => -0.0074; Microsoft.VisualStudio.Web.CodeGeneration.Design => -0.0025; System.Net.Http.Formatting => -0.0025; json.net => -0.0024; Microsoft.EntityFrameworkCore.SqlServer => -0.0023; Owner:"Autofac" Autofac* => -0.0020; accord.net => -0.0019; excel => -0.0017; Microsoft.EntityFrameworkCore.Tools => -0.0014; unit testing => -0.0014
* Lowest Treatment Scores (10): windowsazureofficial => 0; json.net => 0; unit testing => 0; system.web.mvc => 0; System.Web.Mvc => 0; Owner:"Autofac" Autofac* => 0; Appeon => 0; email => 0; System.Web => 0; XLSX => 0

Client Curated Search Queries

* Control: 0.761907563066674
* Treatment: 0.78644417332485
* Biggest Winners (10): mi => +0.0044; mic => +0.0044; enti => +0.0037; Mic => +0.0033; swa => +0.0028; entit => +0.0028; micros => +0.0026; Mi => +0.0026; sele => +0.0022; micr => +0.0020
* Biggest Losers (10): Microsoft.EntityFrameworkCore.SqlServer => -0.0028; Microsoft.CodeAnalysis.FxCopAnalyzers => -0.0026; Microsoft.AspNetCore.Mvc.NewtonsoftJson => -0.0015; Microsoft.EntityFrameworkCore.Tools => -0.0013; System.Net.Http.Formatting => -0.0012; json.net => -0.0012; swagger => -0.0010; ef => -0.0009; log => -0.0008; System.Configuration.ConfigurationManager => -0.0008
* Lowest Treatment Scores (10): micro => 0; m => 0; e => 0; n => 0; Micro => 0; ent => 0; System.Web => 0; a => 0; M => 0; N => 0

Feedback

* Control: 0.738874919537865
* Treatment: 0.747874359303054
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; microsoft.visualstudio.services => +0.0037; brot => +0.0024; Build.Extensions => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; localization provider core => +0.0024; Microsoft.Azure.functions.extension => +0.0024
* Biggest Losers (10): MS, .net core => -0.0108; aad => -0.0024; ADAL => -0.0024; Approximate Nearest Neighbors => -0.0024; Azure => -0.0024; azure active directory => -0.0024; blob storage => -0.0024; cache => -0.0024; ef => -0.0024; Ef => -0.0024
* Lowest Treatment Scores (10): json.net => 0; json.net => 0; KeyVault => 0; system.web => 0; mocking => 0; common helpers => 0; ADAL => 0; Azure. => 0; analyzers => 0; nhib => 0

Add optional prefix matching to tokenized terms

The query `foo.bar` results in Azure Search text like `foo.bar (+foo +bar)^3 tokenizedPackageId:foo* tokenizedPackageId:bar* packageId:foo.bar^1000`

Implementation: dev..loshar-prefix-v2

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8527 | 0.8498 | |
| client curated queries | 0.7619 | 0.8012 | |
| feedback queries | 0.7389 | 0.7719 | |

Results

Curated Search Queries

* Control: 0.852724537661141
* Treatment: 0.849767820447313
* Biggest Winners (10): mysql => +0.0012; Microsoft.Extensions.Configuration.Json => +0.0008; Microsoft.AspNet.WebApi.Client => +0.0007; Microsoft.AspNet.WebApi.Core => +0.0005; Microsoft.AspNetCore.Razor.Design => +0.0004; identityserver => +0.0003; powershell => +0.0003; Microsoft.Office.Interop.Excel => +0.0002; Microsoft.Extensions.Caching.Memory => +0.0002; Microsoft.IdentityModel => +0.0002
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; unit testing => -0.0006; swagger => -0.0005; System.Net.Http.Formatting => -0.0005; soap => -0.0005; email => -0.0005; iTextSharp => -0.0004; websocket => -0.0003; Microsoft.AspNetCore.Mvc.NewtonsoftJson => -0.0003
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.windows.forms => 0; system.web.http => 0; email => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0

Client Curated Search Queries

* Control: 0.761907563066674
* Treatment: 0.801172658997897
* Biggest Winners (10): mi => +0.0044; mic => +0.0044; enti => +0.0037; Mic => +0.0033; swa => +0.0028; entit => +0.0028; micros => +0.0026; Mi => +0.0026; sele => +0.0022; micr => +0.0020
* Biggest Losers (10): swagger => -0.0010; log => -0.0008; excel => -0.0008; ef => -0.0006; Microsoft.AspNetCore.Mvc.NewtonsoftJson => -0.0005; moq => -0.0003; System.Configuration.ConfigurationManager => -0.0003; Microsoft.AspNetCore.Identity.EntityFrameworkCore => -0.0003; System.Net.Http.Formatting => -0.0002; Moq => -0.0001
* Lowest Treatment Scores (10): micro => 0; m => 0; e => 0; n => 0; Micro => 0; ent => 0; System.Web => 0; a => 0; M => 0; N => 0

Feedback

* Control: 0.738874919537864
* Treatment: 0.771863995974323
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; MS, .net core => +0.0039; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; Newt => +0.0024; systemtesting => +0.0024; common mark => +0.0014
* Biggest Losers (10): Microsoft.Extensions => -0.0142; log => -0.0020; Android support => -0.0014; csharp => -0.0014; ef => -0.0014; Ef => -0.0014; extended xml => -0.0014; logging => -0.0014; postgres => -0.0014; Postgres => -0.0014
* Lowest Treatment Scores (10): common helpers => 0; Azure identity => 0; WebView => 0; Microsoft.Azure.functions.extension => 0; Enity => 0; Ocr => 0; reading excel => 0; pdf => 0; nuget.build.task => 0; markdown => 0

Add optional prefix matching to tokenized terms on a new field

The query `foo.bar` results in Azure Search text like `foo.bar (+foo +bar)^3 packageIdPrefixes:foo* packageIdPrefixes:bar* packageId:foo.bar^1000`

Implementation: dev..loshar-prefix-v4

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8334 | 0.8292 | |
| client curated queries | 0.7626 | 0.7861 | |
| feedback queries | 0.6691 | 0.6946 | |

Results

Curated Search Queries

* Control: 0.833439992200064
* Treatment: 0.829230550899836
* Biggest Winners (10): Microsoft.Extensions.Configuration.Json => +0.0007; system.windows.forms => +0.0007; Microsoft.AspNet.WebApi.Core => +0.0005; Microsoft.Extensions.Caching.Memory => +0.0004; Microsoft.Office.Interop.Excel => +0.0004; EntityFramework.SqlServer => +0.0004; powershell => +0.0003; Microsoft.IdentityModel => +0.0002; CefSharp => +0.0002; Microsoft.AspNet.WebPages => +0.0002
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; unit testing => -0.0005; System.Net.Http.Formatting => -0.0005; swagger => -0.0005; Microsoft.AspNetCore.Mvc.NewtonsoftJson => -0.0004; iTextSharp => -0.0004; soap => -0.0004; devexpress => -0.0003; postgres => -0.0003
* Lowest Treatment Scores (10): Microsoft.AspNetCore.App => 0; system.web.mvc => 0; System.Web.Mvc => 0; system.web.http => 0; blazor => 0; hpcsharp => 0; Appeon => 0; devexpress => 0; email => 0; System.Web => 0

Client Curated Search Queries

* Control: 0.762606351976277
* Treatment: 0.786122661120102
* Biggest Winners (10): enti => +0.0037; mic => +0.0037; swa => +0.0028; entit => +0.0028; Mic => +0.0028; sele => +0.0022; newt => +0.0018; swash => +0.0016; micros => +0.0014; Microsoft.Extensions.Configuration.Json => +0.0009
* Biggest Losers (8): swagger => -0.0009; log => -0.0008; excel => -0.0008; Microsoft.AspNetCore.Mvc.NewtonsoftJson => -0.0006; ef => -0.0006; System.Configuration.ConfigurationManager => -0.0003; Microsoft.AspNetCore.Identity.EntityFrameworkCore => -0.0003; System.Net.Http.Formatting => -0.0002
* Lowest Treatment Scores (10): Microsoft.AspNetCore.App => 0; mi => 0; micro => 0; m => 0; en => 0; e => 0; n => 0; Micro => 0; ent => 0; Mi => 0

Feedback

* Control: 0.669120890795799
* Treatment: 0.694610466855647
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; MS, .net core => +0.0039; Azure => +0.0024; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; Newt => +0.0024; common mark => +0.0014
* Biggest Losers (10): Microsoft.Extensions => -0.0127; ef => -0.0019; Ef => -0.0019; Android support => -0.0014; csharp => -0.0014; postgres => -0.0014; Postgres => -0.0014; unit test => -0.0014; ilmerge => -0.0009; Azure PCL => -0.0006
* Lowest Treatment Scores (10): GitExtensions.SVN => 0; aspnetcore => 0; microsoft.bot.con => 0; Azure identity => 0; localization provider core => 0; analyzers => 0; cosmos => 0; Material => 0; bench => 0; microsoft.extensions.hosting.windowsservice => 0
loic-sharma commented 4 years ago

These experiments are based off the "Add optional prefix matching to the last raw term" (dev..loshar-prefix-v5)

Boost prefix matches if the last raw term is shorter than 4 characters

The query `ent` results in Azure Search text like `ent tokenizedPackageId:ent*^20`

Implementation: dev..loshar-prefix-v7

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8526 | 0.8499 | |
| client curated queries | 0.7627 | 0.8217 | |
| feedback queries | 0.7316 | 0.7556 | |

Results

Curated Search Queries

* Control: 0.8526305050062
* Treatment: 0.849937967785616
* Biggest Winners (10): entityframework => +0.0007; aws => +0.0007; identityserver => +0.0004; xml => +0.0004; powershell => +0.0003; itext7 => +0.0002; CefSharp => +0.0002; RTF => +0.0001; svg => +0.0001; REST => +0.0001
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; swagger => -0.0005; soap => -0.0004; postgres => -0.0004; email => -0.0003; roslyn => -0.0003; Excel => -0.0002; TWAIN => -0.0002; 7zip => -0.0002
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.windows.forms => 0; system.web.http => 0; email => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0

Client Curated Search Queries

* Control: 0.762687841979449
* Treatment: 0.82173570363872
* Biggest Winners (10): m => +0.0057; mic => +0.0052; mi => +0.0044; Mic => +0.0039; enti => +0.0037; ent => +0.0033; M => +0.0029; swa => +0.0028; entit => +0.0028; micros => +0.0026
* Biggest Losers (8): swagger => -0.0009; ef => -0.0009; excel => -0.0008; net => -0.0004; nu => -0.0003; identity => -0.0001; Identity => 0; web => 0
* Lowest Treatment Scores (10): micro => 0; Micro => 0; System.Web => 0; boot => 0; c => 0; system.web.http => 0; se => 0; ef => 0; test => +0.0003; logging => +0.0004

Feedback

* Control: 0.731618985150309
* Treatment: 0.755634522485088
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; MS, .net core => +0.0039; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; logging => +0.0024; mysql => +0.0024; Newt => +0.0024
* Biggest Losers (10): ef => -0.0024; Ef => -0.0024; extended xml => -0.0024; rx => -0.0024; sqlite => -0.0024; ORM => -0.0018; postgres => -0.0018; Postgres => -0.0018; log => -0.0011; my sample lib => -0.0009
* Lowest Treatment Scores (10): compilation => 0; pdf => 0; Build.Extensions => 0; commandline => 0; microsoft.aspnetcore => 0; NFC => 0; KeyVault => 0; Azure => 0; alexa => 0; Azure.Storage => 0

Alternatives

| | Curated Search Queries | Client Curated Search Queries | Feedback |
|---|---|---|---|
| Control | 0.852630505006199 | 0.762687841979449 | 0.731618985150309 |
| Treatment @ 10 | 0.849721483613432 | 0.819540473808362 | 0.757448021263524 |
| Treatment @ 12 | 0.849721483613431 | 0.821025560206729 | 0.757448021263524 |
| Treatment @ 14 | 0.849750501738614 | 0.820854003879769 | 0.756935537801969 |
| Treatment @ 20 | 0.849937967785617 | 0.82173570363872 | 0.755634522485087 |
| Treatment @ 30 | 0.849761277984439 | 0.819246893787486 | 0.75316228474671 |
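To compare the alternatives at a glance, the scores can be put into a small script that reports which setting does best on each data set. This is a sketch: it assumes "Treatment @ N" refers to the boost value N (the `^20` in the query text corresponds to the "@ 20" scores), rounds the scores above to six decimals for readability, and does not reflect how the boost was actually chosen, which the thread does not say.

```python
# NDCG scores from the Alternatives list, rounded to 6 decimals.
CONTROL = {"curated": 0.852631, "client": 0.762688, "feedback": 0.731619}
TREATMENT = {
    10: {"curated": 0.849721, "client": 0.819540, "feedback": 0.757448},
    12: {"curated": 0.849721, "client": 0.821026, "feedback": 0.757448},
    14: {"curated": 0.849751, "client": 0.820854, "feedback": 0.756936},
    20: {"curated": 0.849938, "client": 0.821736, "feedback": 0.755635},
    30: {"curated": 0.849761, "client": 0.819247, "feedback": 0.753162},
}

def best_boost(data_set: str) -> int:
    """Boost with the highest treatment score; ties go to the smaller boost."""
    return max(TREATMENT, key=lambda boost: TREATMENT[boost][data_set])

def gain(boost: int, data_set: str) -> float:
    """Treatment score minus control score for one data set."""
    return TREATMENT[boost][data_set] - CONTROL[data_set]

if __name__ == "__main__":
    for name in CONTROL:
        boost = best_boost(name)
        print(f"{name}: best @ {boost}, gain {gain(boost, name):+.6f}")
```

On these numbers, 20 is the best setting for both curated data sets, while the feedback data set slightly favours the lower boosts (10 and 12 tie).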

These experiments are based off the "Add optional prefix matching to the last raw term" (dev..loshar-prefix-v5)

🎉Boost prefix matches on short terms. Prefix match on full package id if term contains id separators

The query `ent` results in Azure Search text like `ent tokenizedPackageId:ent*^20`. The query `foo.bar` results in Azure Search text like `foo.bar (+foo +bar)^3 packageId:foo.bar* packageId:foo.bar^1000`
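The two rules in this prototype can be sketched as a single decision on the last raw term of the query. The `prefix_clause` helper, the separator set, and the unboosted fallback for longer terms (carried over from the "last raw term" experiment this builds on) are assumptions; only the two example outputs above come from the thread.

```python
import re

ID_SEPARATORS = re.compile(r"[.\-_]")  # assumed package ID separators
SHORT_TERM_LENGTH = 4                  # "shorter than 4 characters"
SHORT_TERM_BOOST = 20

def prefix_clause(query: str) -> str:
    """Return the prefix clause for the last raw term of the query."""
    terms = query.split()
    if not terms:
        return ""
    last = terms[-1]
    if ID_SEPARATORS.search(last):
        # Looks like part of a full package ID: prefix match the untokenized ID.
        return f"packageId:{last}*"
    if len(last) < SHORT_TERM_LENGTH:
        # Short terms match many tokens, so boost the prefix match.
        return f"tokenizedPackageId:{last}*^{SHORT_TERM_BOOST}"
    # Assumed fallback: unboosted prefix match on the tokenized ID.
    return f"tokenizedPackageId:{last}*"

if __name__ == "__main__":
    for q in ["ent", "foo.bar", "json newton"]:
        print(repr(q), "->", prefix_clause(q))
```

Routing terms that contain separators to the untokenized `packageId` field keeps a query like `foo.bar` from matching every package that merely has a token starting with `foo` or `bar`.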

Implementation: dev..loshar-prefix-v8

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8526 | 0.8523 | |
| client curated queries | 0.7627 | 0.8215 | |
| feedback queries | 0.7316 | 0.7708 | |

Note: the treatment scores in this table match the "treatment @ 20" values under Alternatives, while the detailed results below match the "treatment @ 1" values.

Results

Curated Search Queries

* Control: 0.8526305050062
* Treatment: 0.851433027547545
* Biggest Winners (10): entityframework => +0.0007; system.windows.forms => +0.0007; aws => +0.0007; System.ServiceModel => +0.0006; identityserver => +0.0004; xml => +0.0004; Microsoft.AspNet.WebPages => +0.0003; powershell => +0.0003; Microsoft.IdentityModel => +0.0002; itext7 => +0.0002
* Biggest Losers (10): excel => -0.0017; kafka => -0.0007; Microsoft.AspNet.WebApi.Client => -0.0006; swagger => -0.0005; soap => -0.0004; postgres => -0.0004; email => -0.0003; roslyn => -0.0003; AspNetCore.HealthChecks => -0.0003; Excel => -0.0002
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.web.http => 0; email => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0; qrcode => 0

Client Curated Search Queries

* Control: 0.762687841979449
* Treatment: 0.821543990654414
* Biggest Winners (10): m => +0.0057; mic => +0.0052; mi => +0.0044; Mic => +0.0039; enti => +0.0037; ent => +0.0033; M => +0.0029; swa => +0.0028; entit => +0.0028; micros => +0.0026
* Biggest Losers (10): swagger => -0.0009; ef => -0.0009; excel => -0.0008; Microsoft.AspNet.WebApi.Client => -0.0006; net => -0.0004; nu => -0.0003; MySql.Data => -0.0002; identity => -0.0001; Identity => 0; web => 0
* Lowest Treatment Scores (10): micro => 0; Micro => 0; System.Web => 0; boot => 0; c => 0; system.web.http => 0; se => 0; ef => 0; test => +0.0003; logging => +0.0004

Feedback

* Control: 0.731618985150309
* Treatment: 0.76471068648356
* Biggest Winners (10): appsf => +0.0117; mpart => +0.0085; microsoft.visualstudio.services => +0.0057; MS, .net core => +0.0039; Azure => +0.0024; brot => +0.0024; clrheap => +0.0024; eventgri => +0.0024; fluentassertion => +0.0024; logging => +0.0024
* Biggest Losers (10): Microsoft.Extensions => -0.0142; ef => -0.0024; Ef => -0.0024; extended xml => -0.0024; rx => -0.0024; microsoft.azure => -0.0020; ORM => -0.0018; postgres => -0.0018; Postgres => -0.0018; Android support => -0.0014
* Lowest Treatment Scores (10): log file => 0; codeanalysis.csharp => 0; log => 0; ef => 0; Vault => 0; extended xml => 0; alexa => 0; Microsoft.Extensions.Hosting.WindowsService => 0; mariadb => 0; nsq => 0

Alternatives

| | nuget.org curated | client curated | feedback |
|---|---|---|---|
| control | 0.8526305050062 | 0.762687841979449 | 0.731618985150309 |
| treatment @ 1 | 0.851433027547545 | 0.821543990654414 | 0.76471068648356 |
| treatment @ 10 | 0.852301307516017 | 0.82191663050274 | 0.770140204780968 |
| treatment @ 20 | 0.852301307516016 | 0.821468599623034 | 0.770798417984278 |
loic-sharma commented 4 years ago

Prefix match last term on full package id if term contains id separators

The query `foo.bar` results in Azure Search text like `foo.bar (+foo +bar)^3 packageId:foo.bar*^20 packageId:foo.bar^1000`

Implementation: dev..loshar-prefix-v9

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8526 | 0.8550 | |
| client curated queries | 0.7627 | 0.7624 | |
| feedback queries | 0.7316 | 0.7468 | |

Results

Curated Search Queries

* Control: 0.8526305050062
* Treatment: 0.854993844736601
* Biggest Winners (10): system.windows.forms => +0.0007; System.Web.Http => +0.0006; System.ServiceModel => +0.0006; Microsoft.AspNet.WebPages => +0.0003; Microsoft.IdentityModel => +0.0002; system.web.http => +0.0002; R.NET.Community => +0.0002; Microsoft.AspNet.WebHooks.Receivers => +0.0002; xamarin.forms => +0.0002; socket.io => +0.0001
* Biggest Losers (5): Microsoft.AspNet.WebApi.Client => -0.0006; AspNetCore.HealthChecks => -0.0003; MySql.Data => -0.0002; Microsoft.Practices.Unity => -0.0001; Microsoft.EntityFrameworkCore.Relational => 0
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0; qrcode => 0; QRCode => 0; Tags:"pdfviewer" => 0

Client Curated Search Queries

* Control: 0.762687841979449
* Treatment: 0.762420737963763
* Biggest Winners (8): System.Web.Http => +0.0005; system.data => +0.0002; system.web.http => +0.0001; Microsoft.AspNetCore.Identity => +0.0001; System.Data => +0.0001; xamarin.forms => +0.0001; Microsoft.Extensions.Http => +0.0001; Microsoft.AspNetCore.Http => 0
* Biggest Losers (5): Microsoft.AspNet.WebApi.Client => -0.0006; Microsoft.Asp => -0.0003; microsoft.asp => -0.0002; MySql.Data => -0.0002; Microsoft.AspNet => -0.0001
* Lowest Treatment Scores (10): mi => 0; micro => 0; m => 0; en => 0; e => 0; mic => 0; enti => 0; n => 0; Micro => 0; ent => 0

Feedback

* Control: 0.731618985150309
* Treatment: 0.7467828806495
* Biggest Winners (10): microsoft.visualstudio.services => +0.0057; microsoft.bot.con => +0.0024; Microsoft.Bot.Con => +0.0024; mysql => +0.0024; serilog => +0.0014; Microsoft.Toolkit.Wpf => +0.0010; Azure.Storage => +0.0009; markdown => +0.0009; Microsoft.Azure.functions.extension => +0.0009; Microsoft.Extensions.Depen => +0.0009
* Biggest Losers (2): sqlite => -0.0024; microsoft.azure => -0.0020
* Lowest Treatment Scores (10): Azure identity => 0; Build.Extensions => 0; uwp pdf => 0; bench => 0; common helpers => 0; KeyVault => 0; aspnetcore => 0; appsf => 0; eventgri => 0; WebView => 0
loic-sharma commented 4 years ago

Prefix match last term on tokenized package id if term less than X characters

The query `foo` results in Azure Search text like `foo tokenizedPackageId:foo*^20`

Implementation: dev..loshar-prefix-v10

| Data set | Control NDCG score | Treatment NDCG score | Result |
|---|---|---|---|
| nuget.org curated queries | 0.8526 | 0.8538 | |
| client curated queries | 0.7627 | 0.8038 | |
| feedback queries | 0.7316 | 0.7159 | |

Results

Curated Search Queries

* Control: 0.8526305050062
* Treatment: 0.8537901780685
* Biggest Winners (5): aws => +0.0007; xml => +0.0004; RTF => +0.0001; svg => +0.0001; dbf => 0
* Biggest Losers (3): ZPL => -0.0001; git => -0.0001; wpf => 0
* Lowest Treatment Scores (10): system.web.mvc => 0; System.Web.Mvc => 0; system.windows.forms => 0; system.web.http => 0; System.Web => 0; Tags:"seq-app" => 0; Tags:"excel-to-pdf" => 0; socket => 0; httpcontext => 0; qrcode => 0

Client Curated Search Queries

* Control: 0.762687841979449
* Treatment: 0.8038345932575
* Biggest Winners (10): m => +0.0057; mic => +0.0052; mi => +0.0044; Mic => +0.0039; ent => +0.0033; M => +0.0029; swa => +0.0028; Mi => +0.0026; e => +0.0022; n => +0.0016
* Biggest Losers (4): ef => -0.0009; net => -0.0004; nu => -0.0003; web => 0
* Lowest Treatment Scores (10): micro => 0; enti => 0; Micro => 0; entit => 0; System.Web => 0; micr => 0; micros => 0; boot => 0; c => 0; microso => 0

Feedback

* Control: 0.731618985150309
* Treatment: 0.715891729366499
* Biggest Winners (7): logging => +0.0024; sqlite => +0.0024; Android support => +0.0014; Azure PCL => +0.0004; svg => +0.0004; uwp => +0.0002; cef => +0.0001
* Biggest Losers (10): ef => -0.0024; Ef => -0.0024; extended xml => -0.0024; mysql => -0.0024; rx => -0.0024; ORM => -0.0018; unit test => -0.0010; my sample lib => -0.0009; log => -0.0006; Log => -0.0006
* Lowest Treatment Scores (10): markdown => 0; Azure Keyvault => 0; Microsoft.Extensions.Hosting.WindowsService => 0; Microsoft.Azure.functions.extension => 0; audit => 0; my sample lib => 0; microsoft.bot.con => 0; ServerHost => 0; NFC => 0; azure f# => 0
loic-sharma commented 4 years ago

Currently V8 (dev..loshar-prefix-v8) is the most promising prototype. I will polish the prototype, fix unit tests, and A/B test the change when I come back from vacation.

loic-sharma commented 4 years ago

* A/B test at 1% @ 1/16/2020 1:30PM PST
* A/B test at 10% @ 1/16/2020 2:00PM PST
* A/B test at 50% @ 1/16/2020 3:00PM PST
* A/B test at 0% @ 1/28/2020 11:00AM PST

loic-sharma commented 4 years ago

* Build: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=3422395
* Release: https://devdiv.visualstudio.com/DevDiv/_releaseProgress?_a=release-pipeline-progress&releaseId=564095
* Build 2: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=3426012
* Release 2: https://devdiv.visualstudio.com/DevDiv/_releaseProgress?_a=release-pipeline-progress&releaseId=565597