OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License
7.36k stars 2.37k forks

Taxonomies & Localization #161

Open urbanit opened 8 years ago

urbanit commented 8 years ago

The problems with localization we face with Orchard (and tried to solve with this module https://github.com/urbanit/OrchardCMS.Localization-Extensions ) are:

Taxonomies with Localization Part

  1. Create a taxonomy --> Category_En
  2. Add translation --> Category_Gr
  3. Add a term in Category_En --> Term_En
  4. Add a translation --> Term_Gr. Oops... without the module, Term_Gr is created, but not under Category_Gr, so it misses its parent... Lost
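
The lost parent in step 4 comes down to a lookup: the translated term has to be attached to the translation of its taxonomy instead of being left without a parent. A minimal sketch of that lookup, assuming a hypothetical in-memory model (`Taxonomy`, `Term`, `localization_set` and `translate_term` are illustrative names, not Orchard APIs):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Taxonomy:
    name: str
    culture: str
    localization_set: str  # hypothetical id shared by all translations of one taxonomy
    terms: List["Term"] = field(default_factory=list)


@dataclass(eq=False)
class Term:
    name: str
    culture: str
    parent: Optional[Taxonomy] = None


def translate_term(term: Term, new_name: str, culture: str,
                   taxonomies: List[Taxonomy]) -> Term:
    """Create a translation of `term` under the taxonomy translation for `culture`."""
    source = term.parent
    target = None
    if source is not None:
        # Find the sibling taxonomy in the same localization set and target culture.
        target = next(
            (t for t in taxonomies
             if t.localization_set == source.localization_set
             and t.culture == culture),
            None,
        )
    translated = Term(new_name, culture, parent=target)
    if target is not None:
        target.terms.append(translated)
    return translated


category_en = Taxonomy("Category_En", "En", "category")
category_gr = Taxonomy("Category_Gr", "Gr", "category")
term_en = Term("Term_En", "En", parent=category_en)
category_en.terms.append(term_en)

term_gr = translate_term(term_en, "Term_Gr", "Gr", [category_en, category_gr])
print(term_gr.parent.name)  # Category_Gr
```

If no translation of the taxonomy exists yet, the sketch leaves `parent` as `None`, which is the "Lost" state described in step 4.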

Content Items with Localization Part and Taxonomies

  1. Add Localization Part and Category Taxonomy (both Category_En and Category_Gr? - this is how it works now with module)
  2. Create a new Content Item - no language is selected yet.... As for now, no taxonomy is shown...
  3. Select Language and save... The appropriate taxonomy appears.... And now you can choose your terms...

I will update this post with how Drupal is working -- not ideally though...

jersiovic commented 8 years ago

It would be interesting if localization in Orchard 2 avoided the path of duplicating content items, which leads to other problems, or at least if that were not the only option. It is a problem when, for localization reasons, you have duplicated taxonomies, one for each culture. If you want to offer a projection query that can filter by taxonomy field, then you need as many taxonomy fields as the cultures you use in the content type targeted by the query, each one pointing to the equivalent term in its related culture. However, in a scenario without localization you would usually use only one taxonomy field. This extra complexity, only because you want localized terms, is a bit crazy.

A better path IMO would be offering localization at field level and at part level. Text fields should be configurable as localizable, with a setting for which cultures have to be available when their data is edited. The same goes for parts. Localizable parts should implement a Localizable interface that guarantees a set of methods for accessing their data in a localized way and for persisting it in the storage system. Finally, Orchard 2 indexing and the filters used for queries should take that localizable data into account, allowing localized searches using the index for one language or combining different indexes.
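
As an illustration of the field-level idea, here is a minimal sketch of a text field that stores one value per configured culture inside a single content item. The class name, its methods and the fallback to the default culture are assumptions made for this example, not part of any Orchard API:

```python
from typing import Dict, Iterable, Optional


class LocalizableTextField:
    """Hypothetical text field holding one value per configured culture."""

    def __init__(self, cultures: Iterable[str], default_culture: str) -> None:
        self.cultures = tuple(cultures)
        if default_culture not in self.cultures:
            raise ValueError("default culture must be one of the configured cultures")
        self.default_culture = default_culture
        self._values: Dict[str, str] = {}

    def set(self, culture: str, value: str) -> None:
        # Only cultures enabled in the field settings can be edited.
        if culture not in self.cultures:
            raise ValueError(f"culture {culture!r} is not enabled for this field")
        self._values[culture] = value

    def get(self, culture: str) -> Optional[str]:
        # Fall back to the default culture when no translation is stored.
        if culture in self._values:
            return self._values[culture]
        return self._values.get(self.default_culture)


title = LocalizableTextField(cultures=["en", "es"], default_culture="en")
title.set("en", "Sailing boat")
title.set("es", "Velero")
print(title.get("es"))  # Velero
```

Because all translations live in one object, saving the owning content item is a single write, which is the property the proposal relies on.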

I know, easy to say, but hard work ahead.

sebastienros commented 8 years ago

Localization by duplicating the content is the way to go IMO, and synchronizing fields too. I have had meetings with different teams in MS and also with you guys, and it solves most issues. I still need to look into Taxonomies more, it's a tricky one. I wonder if the recommendation we give for menus should not be the same in terms of localization.

sebastienros commented 8 years ago

Ok, I think I understand how taxonomies should work with regard to localization. I have tried it in Craft CMS and I like it.

jersiovic commented 8 years ago

I see. A question: will we have a weight for each localized term, or will it be only in the master term? If there is a weight per term, when you change it through the UI, through recipes or programmatically using the taxonomy service, will it be synchronized in localized versions too, or will it have to be changed explicitly for each localized version? For flexibility reasons I think it is better that they are not synchronized, but to make the most common scenario easy I would prefer that they are synchronized. So, a midpoint could be to make it configurable at taxonomy level. If it is synchronized, the desirable behavior is that no matter whether you change the master term or a synchronized version, all of them are updated.

Related to query projections: it would be interesting if the taxonomy field filter for query projections also looked up localized versions of the term set by the user in the filter, or offered a check 'Filter also by localized versions of these terms' to leave this decision to the user.

In the taxonomy services, to keep localization logic in just one place, an extra argument for the same purpose would be useful in methods like GetContentItems, GetContentItemsCount or GetContentItemsQuery. Furthermore, for other methods like GetTermsForContentItem, GetParents or GetChildren, an extra parameter setting the culture in which you want the resulting terms would help consumers of those methods stay ignorant of the culture used by the user when a term in a taxonomy was selected as input to a page.

jersiovic commented 8 years ago

Well, thinking twice about it: if we keep the weight of localized terms in sync, why not also image fields, which are a very common scenario with terms, or other non-textual data... so this is like opening Pandora's box. Was this what you meant when you said "Localization by duplicating .. and synchronizing fields .. is the way to go"? Won't this hit performance? I mean, for storing one content item with 4 localized versions, you will now need to load 4 content items and store 4 content items, because you will always need to sync non-textual fields. However, with a solution at field level and at part level you store everything in one content item, so you only store one content item. Orchard 1 was not the fastest guy at importing data. I know Orchard 2 improvements will help to make it better, but the proposed localization solution will make storing data slower. I don't have a strong opinion for either of the two possibilities because both have pros and cons, but it is something to point out.

Skrypt commented 8 years ago

I think @sebastienros was referring to cloning content items like we do now in O1. We clone the master content item by opening a "create content item" form with the data from the master content item. That way we can pre-fill fields in the form. So the same logic is applied for cloning and translating, with the exception that the latter has a different culture set for the form.

jersiovic commented 8 years ago

Syncing non-localizable fields in localized content items is a recurrent topic pending a decision. https://github.com/OrchardCMS/Orchard/issues/6220 That's why I thought @sebastienros was referring to how it will be handled in Orchard 2.

Skrypt commented 8 years ago

It makes sense to "sync" (I would say inherit) a numeric field value if this value never changes across iterations of its base Content Type. But here, terms are dictionaries of words, so their text value is normally not used for logical operations. They are only used for categorizing all kinds of items. So, if we clone and propose a text value, I think that would be fair enough. I can't argue about the current structure's performance though... Is it an actual concern?

jersiovic commented 8 years ago

I'm not sure I understand you. Does what you propose solve this common scenario? I have a localized taxonomy and I add an Image field to its taxonomy term type, but I want the different localized versions of a term to use the same images, such that when I change the images in one culture those changes are also made in the other cultures.

jersiovic commented 8 years ago

After noticing what I said in my last comment on this other thread discussing how weight should work on taxonomies https://github.com/OrchardCMS/Orchard/issues/7055#issuecomment-236008604, I don't think we need to sync weight when we change it through recipes or programmatically using the taxonomy service. It is only a feature to implement when the taxonomy is changed through the admin UI, if we consider it interesting.

So the subset of my doubts related to syncing non-localizable fields is not specific to taxonomies; it is more about the localization strategy in general.

jersiovic commented 8 years ago

Regarding the performance concern: currently Orchard 1.9.2 with a multitenant configuration takes 42 minutes to update a taxonomy with 356 terms and its images. Updating 6177 products with their images takes 48 minutes. I commit the session every 50 content items to speed things up, and I also preload the taxonomy fields of each batch of 50 products to avoid the n+1 problem of lazily loaded data. This is without using localization; if we were using localization based on different content items, things would be worse.

I don't know how it will be with Orchard 2, but it would be good to measure how the selected localization solution impacts performance when importing a large number of content items.
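
The batching strategy described above (preload once per batch, commit every 50 items) can be sketched as follows. `RecordingSession` is a stand-in invented for this example that only counts calls; it is not NHibernate or any Orchard API:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class RecordingSession:
    """Hypothetical unit of work that only counts what it is asked to do."""

    def __init__(self) -> None:
        self.preloads = 0
        self.saved = 0
        self.commits = 0

    def preload_taxonomy_fields(self, batch: List) -> None:
        self.preloads += 1  # one query per batch instead of one per item (n+1)

    def save(self, item) -> None:
        self.saved += 1

    def commit(self) -> None:
        self.commits += 1


def import_items(items: Iterable, session: RecordingSession, batch_size: int = 50) -> None:
    for batch in batched(items, batch_size):
        session.preload_taxonomy_fields(batch)
        for item in batch:
            session.save(item)
        session.commit()  # flush every `batch_size` items


session = RecordingSession()
import_items(range(6177), session)
print(session.commits)  # 124
```

With 6177 products and a batch size of 50, the sketch performs 124 preloads and 124 commits instead of one per product.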

Skrypt commented 8 years ago

The solution suggested is cloning the parent content item. So it will also clone that image picker field and its associated image. The values from the parent content item will be proposed, so you could change that image if needed, but the parent image will be proposed. You could also want to have different images for different cultures. The idea is not to clone the "default culture" item, but to translate from the content item you select. Remember that in the content item list you can translate from any content item regardless of its culture.

For Taxonomies, the only difference here is that we have 2 steps in the creation process. Certainly, if we clone the term, it will stay in the same Taxonomy, which could be wrong. So we need to be able to change this while creating / editing that Taxonomy term.

Skrypt commented 8 years ago

I'm guessing that most performance issues you are facing with O1 are related to NHibernate. I think that YesSql should be much better in that regard. You can see some figures on this page: http://ppanyukov.github.io/2015/05/20/entity-framework-7-performance.html YesSql is using Dapper. It's also using https://github.com/CoreyKaylor/Lightning.NET which is really performant, though I still need to figure out the details about that one.

jersiovic commented 8 years ago

Thank you for your explanation Skrypt. What you describe is how localization in Orchard works now and how it has to fit with Taxonomies. I agree with that implementation. The point is that I have concerns about that strategy combined with the "Don't vary by culture" checkbox described here https://github.com/OrchardCMS/Orchard/issues/6220 which is what I thought @sebastienros could have in mind in his answer. But this discussion is maybe out of the scope of this thread, focused on localization of taxonomies, where the need for "Don't vary by culture" is rarer... maybe if I want to have a discount percentage at term level or something like this.

So, to avoid noise in this thread I will move the discussion to the other thread to find out what the plans are in relation to the "Don't vary by culture" checkbox.

jersiovic commented 8 years ago

In relation to performance, I have hopes that YesSql improves everything a lot. However, it is good that we don't let our guard down on performance by relying on super fast storage.

sebastienros commented 8 years ago

I confirm the plan so far is to have a "don't vary by culture" for fields, so that values are copied upon publication to all localized versions of the same master item. It's also better for performance than having to load another document to get the common value.
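
A rough sketch of what "copied upon publication" could look like, assuming content items are plain dictionaries and the list of culture-invariant field names is known. The function name and data shapes are hypothetical, chosen only for this illustration:

```python
from typing import Dict, Iterable, List


def publish(item: Dict, localized_versions: List[Dict],
            invariant_fields: Iterable[str]) -> None:
    """Copy every "don't vary by culture" field from `item` to its sibling versions."""
    names = list(invariant_fields)
    for other in localized_versions:
        if other is item:
            continue  # the published item already holds the new values
        for name in names:
            other["fields"][name] = item["fields"][name]


boat_en = {"culture": "en", "fields": {"Title": "Sailing boat", "Image": "old.jpg"}}
boat_es = {"culture": "es", "fields": {"Title": "Velero", "Image": "old.jpg"}}

boat_en["fields"]["Image"] = "new.jpg"
publish(boat_en, [boat_en, boat_es], invariant_fields=["Image"])
print(boat_es["fields"])  # {'Title': 'Velero', 'Image': 'new.jpg'}
```

The cost is paid at write time (one write per localized version), while a read of any version needs only that version's document.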

And in terms of performance, YesSql solves it not because of Dapper, but because it can denormalize the content to query, storing the data in the optimal way for how it will be queried. No more joins, and use of indexed fields. And over time, when we need new queries for new features, we can just define a new indexer. If you haven't looked at how YesSql works, then there is no way you can understand what "index" means in this context; it is not the same as a SQL index.
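
To give a feel for "index" in this sense, here is a deliberately simplified sketch of a document store that maintains a denormalized lookup table at write time. It does not use or reproduce YesSql's API; `DocumentStore`, `terms_index` and the document shape are all assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class DocumentStore:
    """Toy store: documents are saved whole, indexes are maintained on write."""

    def __init__(self) -> None:
        self.documents: Dict[int, dict] = {}
        # Denormalized index: (culture, term) -> ids of matching documents.
        self.terms_index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    @staticmethod
    def _index_keys(doc: dict) -> List[Tuple[str, str]]:
        return [(doc["culture"], term) for term in doc["terms"]]

    def save(self, doc_id: int, doc: dict) -> None:
        previous = self.documents.get(doc_id)
        if previous is not None:
            # Remove stale index entries before re-indexing the new version.
            for key in self._index_keys(previous):
                self.terms_index[key].discard(doc_id)
        self.documents[doc_id] = doc
        for key in self._index_keys(doc):
            self.terms_index[key].add(doc_id)

    def query(self, culture: str, term: str) -> List[int]:
        # A single lookup in the index, no join over the documents.
        return sorted(self.terms_index.get((culture, term), set()))


store = DocumentStore()
store.save(1, {"culture": "en", "terms": ["sailboat"]})
store.save(2, {"culture": "es", "terms": ["velero"]})
store.save(3, {"culture": "en", "terms": ["sailboat", "yacht"]})
print(store.query("en", "sailboat"))  # [1, 3]
```

Adding a new kind of query then means adding another table like `terms_index` and the code that fills it, rather than changing how documents are stored.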

Skrypt commented 8 years ago

Almost forgot about the Map/Reduce.

sebastienros commented 8 years ago

I think weights will still be there, not in the UI though, just as an order mechanism (as a double like we talked about for O1), and they will have different values in each locale.

jersiovic commented 8 years ago

@sebastienros commented "... It's also better for performance than having to load another document to get the common value". However, this doesn't answer my concerns. What I proposed also has that advantage, because in one content item you have all the localized and non-localized fields. My concern is that when you store a localized content item, you will need to load all its localized versions and update all of them to be sure their values are in sync. In a site supporting 3 cultures: 3 reads and 3 writes to store one content item vs 1 read and 1 write with what I proposed. The point is that we usually import all the localized versions of a content item in one recipe. So, to update 3 languages of a piece of content in Orchard 2 we will import 3 content items. That means 9 reads and 9 writes vs 1 read and 1 write with what I propose, because with my proposal the different localizations come in the same content item. Sorry if I'm disturbing, but I haven't read anywhere what the problems of my proposal are. Is it a huge effort to develop it that way, or are there other problems?

Thank you
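
The read/write counts in the comment above follow from simple multiplication. A small sketch of that arithmetic, under the comment's own assumption that every save of a localized item loads and rewrites all of its localized versions (the function names are made up for this example):

```python
from typing import Tuple


def duplicated_items_io(cultures: int, items_saved: int) -> Tuple[int, int]:
    """Reads and writes when each save must load and rewrite every localized version."""
    reads = cultures * items_saved
    writes = cultures * items_saved
    return reads, writes


def single_document_io(documents_saved: int = 1) -> Tuple[int, int]:
    """Reads and writes when all localizations live in one document."""
    return documents_saved, documents_saved


print(duplicated_items_io(cultures=3, items_saved=1))  # (3, 3)
print(duplicated_items_io(cultures=3, items_saved=3))  # (9, 9)
print(single_document_io())                            # (1, 1)
```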

sebastienros commented 8 years ago

Websites are usually more read intensive than write intensive. So in our case we need to optimize for reads. Having the data in multiple documents will make more reads to get a localized content item. Here you are optimizing the scenario for recipes, not for the most common usage of the site.

The way it could be optimized for recipes is to optimize the recipes themselves, for instance by having an Import event (like we have for O 1.11) which won't synchronize the localizations, or do it intelligently.

jersiovic commented 8 years ago

I still have the impression that I didn't explain my proposal properly, because you say "... having the data in multiple documents will make more reads" when in fact what I propose is that we have the data of multiple languages in one document; that's why I say that with this proposal only one read and one write is needed. In order to do so we will need to provide methods at part and field level to access the localized values of a content item (all those values will be within the xml document of a content item).

Many thanks for the answer. In fact, what you say in the last paragraph about not synchronizing the localizations when importing recipes looks like a very good solution.

sebastienros commented 8 years ago

I see, you want to provide all localizations of a single content item in the same document.

Skrypt commented 8 years ago

Seems to me that map/reduce is directly the solution to that problem, no?

Skrypt commented 8 years ago

Hey guys, sorry by the way, I caught the flu and my brain ain't following today :disappointed:

Skrypt commented 8 years ago

@jersiovic what I understand you are suggesting is to use a little more "eager loading" in some places where we use "lazy loading" with NHibernate. That makes sense with NHibernate and O1 since it uses the repository pattern. I think O2 has a completely different approach, which I still need to study to understand it as well as @Jetski5822 and @sebastienros do. So don't listen to me today :wink:

jersiovic commented 8 years ago

@Skrypt well, this was not my intention here. My intention was to solve two problems. On the one hand, to reduce the extra writes in an import scenario ("eager loading" is usually used for improving read operations). On the other hand, to reduce the complexity of having multiple content items for the same data.

In fact, maybe your "flu thought" ;) of map/reduce doesn't solve the first problem, because maintaining map/reduce indexes will require the same extra read and write operations. But maybe it is still an answer to the second problem.

I have noticed that I skipped an important thing when I explained the complexity problem caused by your proposal of having different localized content items for solving localization.

I'm going to explain it better to be sure you are aware of it.

Let's say we have a site supporting two languages English and Spanish and the following two scenarios:

Scenario 1: Editing a localized content item of type Boat with a taxonomy field attached to a localized taxonomy called Boat Type. (This is the scenario I think you have in mind in general). This is the scenario you solve when you say your taxonomy field will show only terms in the culture of the content item you are showing. It's not a bad solution, because you will have boats in Spanish with their taxonomy field pointing to a term in Spanish, and boats in English pointing to a term in English.

Scenario 2: Editing a NON LOCALIZED content item of type Boat with a taxonomy field attached to a localized taxonomy called Boat Type. This is the scenario I have in mind all the time because it is the one I face in www.sayandsail.com. I didn't notice it is different from Scenario 1, so that's why I think I didn't sound clear. In this scenario, when you edit a Boat it doesn't have a related culture, and the only culture available for deciding the culture of the terms to show in the taxonomy field is the current culture. As a result, depending on the culture you use for editing a boat, you will store a term in one culture or in another. This makes operations on boats that involve the taxonomy field more complex, because its value is uncertain. Please think of any filter or group operation based on that field.

Once you have a localized taxonomy, your proposed solution forces you to propagate the localization part to all the content types that use it if you want to avoid problems of Scenario 1, or to move taxonomy term translations to .po files. That's why I think a localization solution based on just one content item with all the localized text within it is more flexible and will simplify things, allowing you to combine localized taxonomies with non-localized content types without friction.

@Skrypt As an exploratory exercise to look for ways of solving scenario 2, speculating with the map/reduce idea is not bad: maybe an implementation of it would need us to provide, for localizable parts and fields, a localized model containing a data structure that allows us to load/store localized data in a content item, and an equivalent non-localizable model that uses non-localizable equivalents stored in map-reduce indexes. In that way we can use the same content id to get simple parts and fields from the map-reduce index related to a culture.

jersiovic commented 8 years ago

To help you picture scenario 2 more easily, this is a screenshot of two sections of the dynamic form we use to give users the ability to add or edit boats. [image]

urbanit commented 8 years ago

Hi, I am watching this thread with interest as I am one of those guys who need it. I will not talk about performance issues; not that I do not care, but the right approach comes first.

  1. I think localized taxonomies and terms should be different content items - that is for me out of the question. Someone can hack with field localization (as we have done in some cases in order to show the right field value) but that is not the way to go.
  2. Adding a localized taxonomy to a content type, whether it is localized or not (but it should be, I will explain later), should add all localized taxonomies, not to be added on demand.
  3. Drupal has an option of "Language Neutral" for localized content items, maybe it's a way to add localization for content items that do not need it.
  4. Terms on creating a new content item should follow the localization of the content item. That means if it is English it should show En terms, if it is Spanish it should show Sp terms. In case of "Language Neutral" or no localization selected yet, it should show terms in the default site language. (Maybe this means an async load of terms when localization is selected or changed...)
  5. In case of "language neutral", sync should be made across localized terms (if they exist in other languages)

Hope my proposal is clear, serves all the cases @jersiovic described, and is not far from what @sebastienros & @Skrypt described.

jersiovic commented 8 years ago

My doubts:

  1. I think localized taxonomies and terms should be different content items - that is for me out of the question. Someone can hack with field localization (as we have done in some cases in order to show the right field value) but that is not the way to go.

It is not the way to go when Orchard at its base is not designed for working that way. The result is, as you say, hacks. But now with Orchard 2 it is time to rethink anything that can be improved. Whichever approach is decided on, the result won't be a hack, because everything will be aligned to fit smoothly. So, I would prefer that we point out concrete problems derived from that way of working, or solutions to the current way of working like the ones you provide.

  2. Adding a localized taxonomy to a content type, whether it is localized or not (but it should be, I will explain later), should add all localized taxonomies, not to be added on demand.

Do you mean the following? We define a taxonomy field in a content type. When we set a term on a content item of that type, all the different localized versions of the selected term should be assigned to that field. If this is what you mean, what do we have to do when a new localized term is created for a term that is assigned to the field of a content item? Do we need to look for all the content items that use the master term in a taxonomy field in order to add the new one?

  5. In case of "language neutral", sync should be made across localized terms (if they exist in other languages)

Content items that are language neutral won't have localized versions, so there is no need for sync. I don't get this point.

Your proposal provides a solution to my problem. But I expect you agree it is based on adding more patches to the spiral of syncing caused by the duplicated content. That's why I cannot help thinking that fixing the root of the problem will result in a system with fewer patches where things fit smoothly. I could well be wrong, but I would like you to show me the problems of the proposal.

Skrypt commented 8 years ago

Maybe it's time to set a list of use cases and see which scenarios both solutions would cover. Having permissions on each content item might be one scenario where eager loading might be a problem.

urbanit commented 8 years ago

@jersiovic it is not about right or wrong! :-) We just share ideas and thoughts.

So, let's make a practice:

Default Language: En
Additional Languages: Es

Taxonomy: MyTaxonomy
Localization Part: Yes ==> MyTaxonomy[En] & MyTaxonomy[Es]

MyTaxonomy[En] has terms: TermA_En, TermB_En, TermC
MyTaxonomy[Es] has terms: TermA_Es, TermB_Es, TermD

Content Type: MyContentType

MyContentType has attached MyTaxonomy

What I propose is that attaching MyTaxonomy[En] should attach MyTaxonomy[Es]

So when creating a MyContentItem of MyContentType the following may happen:

  1. MyContentType has localized part
     a. If no culture is selected or is "Language Neutral", [En] terms should be loaded because of default language.
     b. If a culture is selected, async load of corresponding terms... [I assume permissions will be checked]
  2. MyContentType has not localized part
     a. [En] terms should be loaded because of default language
     b. If TermA_En is selected, after saving TermA_Es should be added as well. TermD cannot be selected.

Your scenario is:

Default Language: En
Additional Languages: Es

Taxonomy: MyTaxonomy
Localization Part: Yes (only on Title for example) ==> MyTaxonomy with 2 titles: MyTaxonomy[En] & MyTaxonomy[Es]

MyTaxonomy has terms: TermA, TermB, TermC
Localization Part: Yes (only on Title for example): TermA ==> has 2 titles TermA_En and TermA_Es

Content Type: MyContentType
MyContentType has attached MyTaxonomy

  1. MyContentType has localized part, then all terms are shown but
     a. If no culture is selected or is "Language Neutral", all terms with [En] title
     b. If a culture is selected, all terms with corresponding language title
  2. MyContentType has not localized part
     a. all terms with [En] title

Right?
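
The rules of the first proposal (separate localized taxonomies) can be written down as two small functions. This is only a sketch of the rules as stated above; the function names and dictionary shapes are assumptions, and "Language Neutral" is represented here as `None`:

```python
from typing import Dict, List, Optional


def terms_to_show(terms_by_culture: Dict[str, List[str]], default_culture: str,
                  item_culture: Optional[str] = None) -> List[str]:
    """Rules 1a, 1b and 2a: use the item's culture, else the default language."""
    culture = item_culture or default_culture
    return terms_by_culture.get(culture, [])


def with_translations(selected: List[str],
                      translations: Dict[str, List[str]]) -> List[str]:
    """Rule 2b: after saving, also attach the translations of each selected term."""
    result = list(selected)
    for term in selected:
        for translated in translations.get(term, []):
            if translated not in result:
                result.append(translated)
    return result


terms_by_culture = {
    "En": ["TermA_En", "TermB_En", "TermC"],
    "Es": ["TermA_Es", "TermB_Es", "TermD"],
}
translations = {"TermA_En": ["TermA_Es"], "TermB_En": ["TermB_Es"]}

print(terms_to_show(terms_by_culture, "En"))           # ['TermA_En', 'TermB_En', 'TermC']
print(terms_to_show(terms_by_culture, "En", "Es"))     # ['TermA_Es', 'TermB_Es', 'TermD']
print(with_translations(["TermA_En"], translations))   # ['TermA_En', 'TermA_Es']
```

TermC has no entry in `translations`, so selecting it adds nothing in Es, and TermD never appears for a non-localized item because only the default-language terms are offered.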

jersiovic commented 8 years ago

Yes @Skrypt, you are right: permissions are at content item level, so putting all the languages in one content item doesn't allow fine-grained permissions at language level, at least without some kind of hack. For me this is not a problem, but maybe for others it is. For those, a possible solution would be to have two content items, each with different permissions and without any kind of sync between them. Related to that, the question is whether it really makes sense for someone to want to limit permissions to access content depending on the language used by the user.

jersiovic commented 8 years ago

@jersiovic it is not about right or wrong! :-) We just share ideas and thoughts.

Yes @urbanit and I enjoy it :)

So, let's make a practice:

Default Language: En

Additional Languages: Es

Taxonomy: MyTaxonomy
Localization Part: Yes ==> MyTaxonomy[En] & MyTaxonomy[Es]

MyTaxonomy[En] has terms: TermA_En, TermB_En, TermC
MyTaxonomy[Es] has terms: TermA_Es, TermB_Es, TermD
• TermC and TermD are not supposed to have a translation, so they exist only under their own parent

Content Type: MyContentType

MyContentType has attached MyTaxonomy

What I propose is that attaching MyTaxonomy[En] should attach MyTaxonomy[Es]

So when creating a MyContentItem of MyContentType the following may happen:

  1. MyContentType has localized part
     a. If no culture is selected or is "Language Neutral", [En] terms should be loaded because of default language.

I don't agree on this point. Users expect to see terms in their own language. Because the page is not related to any language, you should show [En] or [Es] terms depending on the language the user selected for interacting with Orchard. If the user selected Spanish to see Orchard content, he won't understand why everything is shown in Spanish but the taxonomy terms are in English when editing a content item with no language associated.

     b. If a culture is selected, async load of corresponding terms... [I assume permissions will be checked]
  2. MyContentType has not localized part
     a. [En] terms should be loaded because of default language
     b. If TermA_En is selected, after saving TermA_Es should be added as well. TermD cannot be selected.

As I said before, IMO you should show terms in the language selected by the user. But leaving that detail aside, yes, I understand you want to add the terms in the other available cultures. The problem comes, as I mentioned in a previous comment, when you add a new culture, for example French. Then you add TermA_Fr. At this point Orchard should update all existing content items with TermA_En to add TermA_Fr. If you have bad luck and your term is used in 2000 content items, the save action for TermA_Fr is going to take a lot of time.

Your scenario is:

Default Language: En
Additional Languages: Es

Taxonomy: MyTaxonomy
Localization Part: Yes (only on Title for example) ==> MyTaxonomy with 2 titles: MyTaxonomy[En] & MyTaxonomy[Es]

MyTaxonomy has terms: TermA, TermB, TermC
Localization Part: Yes (only on Title for example): TermA ==> has 2 titles TermA_En and TermA_Es

Content Type: MyContentType
MyContentType has attached MyTaxonomy

  1. MyContentType has localized part, then all terms are shown but
     a. If no culture is selected or is "Language Neutral", all terms with [En] title

As I said previously, I think it should show terms in the culture of the user.

     b. If a culture is selected, all terms with corresponding language title

  2. MyContentType has not localized part
     a. all terms with [En] title

Same as previous comment.

Skrypt commented 8 years ago

Related to that, the question is if really has sense someone want to limit the permissions to access to a content depending on the language used by the user.

[Screenshot: Orchard admin, content item edit page]

Not necessarily related to the user language, but related to the role group he's in, since we could set permissions to "edit own" only. That user would create an "En-us" content item and no one other than him could create a "Fr-fr" version of that content item.

jersiovic commented 8 years ago

I see, you are right. We would need a kind of hack if we want to control that. This other proposal also has its cons.

But one question: how does it currently work? If my role group has only the "edit own" permission and I'm not the author of the master content item, can I create a dependent localized item? Or, as I'm not the owner, can I not create a localized version? If the current behavior of Orchard is that no user other than the owner can create a localized version, then the same problem exists now and no one has missed that feature, because only the owner of the master content item could add or edit its localized content items.

I will check it

jersiovic commented 8 years ago

I've checked it in Orchard 1. If you have only the "edit own" permission, you cannot add a localized version of a content item you don't own. And if you are the owner, you can add a localized version, which you will own. So the behavior will be the same.

Well, not exactly the same: in Orchard 1 an administrator can change the owner of a content item, so they could set one owner for the localized version in one language and a different owner for another localized version.
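
The observed Orchard 1 behavior can be summarized in a few lines. This sketch only mirrors the rule described above; the permission names `edit_own` and `edit_any`, the `ContentItem` class and both functions are invented for the example and are not Orchard's permission API:

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ContentItem:
    owner: str
    culture: str


def can_add_localization(user: str, permissions: Set[str], master: ContentItem) -> bool:
    """With only "edit own", a user may localize only the items they own."""
    if "edit_any" in permissions:
        return True
    return "edit_own" in permissions and master.owner == user


def add_localization(user: str, permissions: Set[str],
                     master: ContentItem, culture: str) -> ContentItem:
    if not can_add_localization(user, permissions, master):
        raise PermissionError(f"{user} cannot localize an item owned by {master.owner}")
    # The new localized version is owned by the user who created it.
    return ContentItem(owner=user, culture=culture)


master = ContentItem(owner="alice", culture="en-US")
print(can_add_localization("bob", {"edit_own"}, master))    # False
print(add_localization("alice", {"edit_own"}, master, "fr-FR").owner)  # alice
```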

jersiovic commented 8 years ago

After yesterday's demo from Sebastien showing us Craft CMS, I wasn't patient enough to wait for the second part of his demo at the next meeting, where he will explain how localization works in this CMS. I took a look at the help of this CMS and I really liked what I saw.

;) Sorry for the spoiler @sebastienros

jptissot commented 5 years ago

Can you guys test taxonomies with the ContentLocalization part that was just merged in dev and give feedback?

I quickly tested it and found an issue with the Taxonomy field.

Steps:

  1. Create new tenant with Blog recipe and Enable Taxonomy and ContentLocalization modules
  2. Attach LocalizationPart to Taxonomy type
  3. Create a Taxonomy called Article Type and create two terms
  4. Create French version of the Taxonomy and translate terms
  5. Modify the Article content type to add a Taxonomy field.
  6. I was only able to select a single Taxonomy in the field settings but my English and French taxonomies show up as two different taxonomies.

Now, I am not sure I did this right as I never used taxonomies before in an application. Maybe I should have attached the LocalizationPart to the Taxonomy Term type instead. However, if I do this, I am not able to change the order / hierarchy per locale.

One way to fix this would be to modify the Taxonomy field to group the Taxonomies that have the localization part attached to them. And when localizing the ContentItem that references the Taxonomy field, lookup the correct locale for the taxonomy and display it.
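
The fix suggested above has two parts: show each localized taxonomy once in the field settings, and pick the version matching the content item's culture when the field is displayed. A minimal sketch of both, assuming taxonomies are plain dictionaries with hypothetical `localization_set` and `culture` keys, and assuming a fallback to the default culture (none of this is the actual Taxonomy field code):

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_localization_set(taxonomies: List[dict]) -> Dict[str, Dict[str, dict]]:
    """Group translations so the field settings can list each taxonomy once."""
    groups: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for taxonomy in taxonomies:
        groups[taxonomy["localization_set"]][taxonomy["culture"]] = taxonomy
    return dict(groups)


def resolve_taxonomy(groups: Dict[str, Dict[str, dict]], localization_set: str,
                     culture: str, default_culture: str) -> Optional[dict]:
    """Pick the taxonomy version for `culture`, else the default culture's version."""
    versions = groups.get(localization_set, {})
    return versions.get(culture) or versions.get(default_culture)


taxonomies = [
    {"name": "Article Type", "culture": "en", "localization_set": "article-type"},
    {"name": "Type d'article", "culture": "fr", "localization_set": "article-type"},
]
groups = group_by_localization_set(taxonomies)

print(len(groups))  # 1
print(resolve_taxonomy(groups, "article-type", "fr", "en")["name"])  # Type d'article
```

The field would then store the localization set identifier rather than a single taxonomy id, so the English and French taxonomies no longer show up as two unrelated choices.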

urbanit commented 5 years ago

@jptissot does this help? (from O1) https://github.com/OrchardCMS/Orchard/issues/7352#issuecomment-259441866

nkev commented 5 years ago

@jptissot Did you find out how to localize taxonomies the official way? Still no official documentation on this as far as I can see. I am also getting confused with taxonomies in general, even looking at other (old) issue posts. A simple step-by-step example would really help everyone.