Open bjarnef opened 4 years ago
That's a pretty strange issue! Sounds like something to do with Thread culture or something along those lines.
Are you able to provide me with the full code that creates this query ... or better yet a slimmed down version of a way that i can try to replicate ?
Very strange indeed! 🦒🐰
Sure, here is the entire controller:
public class ProductSurfaceController : SurfaceController
{
    private readonly IExamineManager _examineManager;

    public ProductSurfaceController(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    [ChildActionOnly]
    public ActionResult FeaturedProducts()
    {
        var featuredProducts = CurrentPage.GetHomePage()
            .FeaturedProducts.OfType<ProductPage>();
        return PartialView("ProductList", featuredProducts);
    }

    [ChildActionOnly]
    public ActionResult ProductListByCollection(int collectionId, int p = 1, int ps = 12)
    {
        return PartialView("PagedProductList", GetPagedProducts(collectionId, null, p, ps));
    }

    [ChildActionOnly]
    public ActionResult ProductListByCategory(string category, int p = 1, int ps = 12)
    {
        return PartialView("PagedProductList", GetPagedProducts(null, category, p, ps));
    }

    private PagedResult<ProductPage> GetPagedProducts(int? collectionId, string category, int page, int pageSize)
    {
        if (_examineManager.TryGetIndex("ExternalIndex", out var index))
        {
            var searcher = index.GetSearcher();
            var query = searcher.CreateQuery()
                .Field("__NodeTypeAlias", ProductPage.ModelTypeAlias);

            if (collectionId.HasValue)
            {
                query = query.And().Field("parentID", collectionId.Value);
            }

            if (!category.IsNullOrWhiteSpace())
            {
                query = query.And().Field("categoryAliases", category);
            }

            var results = query.OrderBy(new SortableField("name", SortType.String)).Execute(pageSize * page);
            var totalResults = results.TotalItemCount;
            var pagedResults = results.Skip(pageSize * (page - 1));

            return new PagedResult<ProductPage>(totalResults, page, pageSize)
            {
                Items = pagedResults.Select(x => UmbracoContext.Content.GetById(int.Parse(x.Id))).OfType<ProductPage>()
            };
        }

        return new PagedResult<ProductPage>(0, page, pageSize);
    }
}
This is more or less a copy from Vendr demo store with a few tweaks. https://github.com/vendrhub/vendr-demo-store/blob/main/src/Vendr.DemoStore/Web/Controllers/ProductSurfaceController.cs
I think you can reproduce the issue in the demo store, which you can clone from here: https://github.com/vendrhub/vendr-demo-store
@Shazwazza I could reproduce the issue in Vendr demostore.
Steps to reproduce
æ, ø or å (might also be relevant for other specific characters in other cultures). I chose "Øl" = beer 🍺
if (e.ValueSet.ItemType.InvariantEquals(ProductPage.ModelTypeAlias))
{
    // Make sure some categories are defined
    if (e.ValueSet.Values.ContainsKey("categories"))
    {
        // Prepare new collections for category aliases and names
        var categoryAliases = new List<string>();
        var categoryNames = new List<string>();

        // Parse the comma separated list of category UDIs
        var categoryIds = e.ValueSet.GetValue("categories").ToString().Split(',').Select(GuidUdi.Parse).ToList();

        // Fetch the category nodes and extract the category alias and name, adding them to the collections
        using (var ctx = _umbracoContextFactory.EnsureUmbracoContext())
        {
            foreach (var categoryId in categoryIds)
            {
                var category = ctx.UmbracoContext.Content.GetById(categoryId);
                if (category != null)
                {
                    categoryAliases.Add(category.UrlSegment);
                    categoryNames.Add(category.Name);
                }
            }
        }

        // If we have some aliases, add these to the lucene index in a searchable way
        if (categoryAliases.Count > 0)
        {
            e.ValueSet.Add("categoryAliases", string.Join(" ", categoryAliases));
        }

        // Same for the category names
        if (categoryNames.Count > 0)
        {
            e.ValueSet.Add("categoryNames", string.Join(" ", categoryNames));
        }
    }
}
private PagedResult<ProductPage> GetPagedProducts(int? collectionId, string category, int page, int pageSize)
{
    category = Request.QueryString["category"];

    if (_examineManager.TryGetIndex("ExternalIndex", out var index))
    {
        var searcher = index.GetSearcher();
        var query = searcher.CreateQuery()
            .Field("__NodeTypeAlias", ProductPage.ModelTypeAlias);

        if (collectionId.HasValue)
        {
            query = query.And().Field("parentID", collectionId.Value);
        }

        if (!category.IsNullOrWhiteSpace())
        {
            //query = query.And().Field("categoryAliases", category);
            query = query.And().Field("categoryNames", category);
        }

        var results = query.OrderBy(new SortableField("name", SortType.String)).Execute(pageSize * page);
        var totalResults = results.TotalItemCount;
        var pagedResults = results.Skip(pageSize * (page - 1));

        return new PagedResult<ProductPage>(totalResults, page, pageSize)
        {
            Items = pagedResults.Select(x => UmbracoContext.Content.GetById(int.Parse(x.Id))).OfType<ProductPage>()
        };
    }

    return new PagedResult<ProductPage>(0, page, pageSize);
}
Compile and rebuild external index via Examine dashboard.
Set a breakpoint in GetPagedProducts(), start debugging and navigate to this url: /products/good-and-proper/?category=øl
You should be able to see this result using the following raw lucene query:
+__NodeTypeAlias:productpage +(parentID:[1147 TO 1147]) +categoryNames:ol
Make a small change in web.config or recycle the app pool from IIS.
Access the same url as before. The raw lucene query should now use ø instead and return no results in the frontend.
+__NodeTypeAlias:productpage +(parentID:[1147 TO 1147]) +categoryNames:øl
So is this an issue with the data that is stored in the index or the query that is being produced? There are of course 2 variations of this query:
+__NodeTypeAlias:productpage +(parentID:[1147 TO 1147]) +categoryNames:ol
and
+__NodeTypeAlias:productpage +(parentID:[1147 TO 1147]) +categoryNames:øl
That category string is coming from the string category parameter of the GetPagedProducts method.
What is the expectation here? Which query is correct? And where is the data coming from to populate the string category parameter?
Is the data going into the index in your DemoStoreComponent.cs (var categoryNames = new List<string>();) consistently?
I am not sure if it is an issue with the stored data in the index. But I would expect the raw lucene query to be identical before and after recycle.
Does a raw lucene query normally work with culture-specific characters like Danish æ, ø and å? From what I have seen, these would be replaced as follows:
æ => ae
ø => o
å => a
After rebuilding the index both versions seem to work, but after a recycle only the first version with o works (while Examine then generates the query with ø).
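While the root cause is unclear, one possible stopgap might be to fold these characters manually before the value reaches the query, so the generated term no longer depends on what Examine does after a recycle. This is only a sketch and not part of Examine: DanishFolding and Fold are hypothetical names, and it assumes the indexed terms match the folded forms listed above (as the working query with ol suggests).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not an Examine API): folds Danish letters to the
// ASCII forms observed in the working raw lucene query.
public static class DanishFolding
{
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['æ'] = "ae", ['ø'] = "o", ['å'] = "a",
        ['Æ'] = "AE", ['Ø'] = "O", ['Å'] = "A"
    };

    public static string Fold(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var sb = new StringBuilder(input.Length);
        foreach (var c in input)
        {
            // Replace mapped characters, copy everything else unchanged
            if (Map.TryGetValue(c, out var replacement))
            {
                sb.Append(replacement);
            }
            else
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}
```

It could then be applied to the category value before it is handed to .Field("categoryNames", ...), e.g. DanishFolding.Fold(category). This only hides the inconsistency; it does not explain why the generated query changes after a recycle.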
Yes, but the value of category is overwritten here for testing: category = Request.QueryString["category"]; (but you could change it where the method is called, if you want).
The data is coming from the querystring category in the url: /products/good-and-proper/?category=øl
This is just to test that when ø is passed in as the value, Examine generates a raw lucene query with o, but with ø after recycling the app pool 😊
The data is coming from the querystring category in the url:
Yes exactly, it's not Examine changing this value from ø => o, this is the value that is just being passed to it via the query string. So I'm pretty sure the problem starts with how that is happening. It might not be an examine issue at all?
I don't understand how the querystring value should change after recycling the app pool. It also doesn't explain why it works again after rebuilding the index. So I am pretty sure it is either an issue in the Examine (lucene) query or in how the data is indexed. Maybe there's a difference between rebuilding the index via the Examine dashboard and how Examine rebuilds the index in the background on app pool recycle?
Off topic: On another Umbraco Cloud project we are using the querystring to do a search using Clerk.io, but I haven't noticed it changing a term containing ø to o.
I can try with a static variable instead, but I don't think that would make a difference from my previous observations.
Before recycling app pool, when it works:
After recycling app pool and it doesn't work:
The value of the querystring category is øl in both cases, but the lucene query generated by Examine is different. 🤔🤷♂️
I have specifically tried with string category = "øl"; in code as well, without using the querystring, but with the same result as mentioned here.
@Shazwazza did you have a chance to reproduce this in e.g. the Vendr demo store following these steps?
no not yet
@bjarnef I had a similar issue lately. I found out that it is caused by how Lucene loads vectors into memory; apparently vectors loaded from file are different from those indexed in memory. I resolved the issue by replacing the Standard Analyzer with the Simple Analyzer, which is not perfect but at least does the job :)
@bjarnef I actually debugged that further, and I was wrong! I didn't resolve it by changing the analyzer; my code contains other fixes for relevance. The issue is caused by how norms are omitted internally in Lucene. The code which resolves the issue for me is:
public class NormalizedTextFactory : IFieldValueTypeFactory
{
    public IIndexFieldValueType Create(string fieldName)
    {
        return new NormalizedFullTextType(fieldName, new StandardAnalyzer(Version.LUCENE_30), false);
    }
}

public class NormalizedFullTextType : FullTextType
{
    private readonly bool _sortable;

    public NormalizedFullTextType(string fieldName, Analyzer analyzer = null, bool sortable = false) : base(fieldName, analyzer, sortable)
    {
        _sortable = sortable;
    }

    protected override void AddSingleValue(Document doc, object value)
    {
        if (TryConvert<string>(value, out var str))
        {
            var field = new Field(FieldName, str.Replace("\"",""), Field.Store.YES, Field.Index.ANALYZED);
            field.OmitNorms = true;
            doc.Add(field);
        }
    }
}
On a project on Umbraco Cloud we have a filter dropdown to filter a product list based on sizes. Some of these values contain the Danish characters æ, ø and å. However, we have noticed that this sometimes doesn't work after a deploy or after rebuilding ModelsBuilder. It seems to come down to recycling the app pool, as we can reproduce the issue after making a change in web.config.
The project is using Umbraco v8.6.4 and Examine v1.0.5
Here is the result before recycle. Note the raw lucene query is:
After application is recycled and refreshing page where no results are returned:
When rebuilding the external index from the Examine dashboard it works again and generates the first lucene query.
For now we have implemented this temporary hack:
Strangely enough, when there are no results, searching for "Nyfødt/50 cm" in the Examine dashboard does find a result, and the product list then returns a result in the frontend without me clicking the "Rebuild index" button. Not sure whether the search triggers a reindex of the found results or a full reindex?
We have also seen a similar issue when searching using the .NativeQuery() method with æ, ø and å in the search term, but not sure if it is the same underlying issue.