OrchardCoreContrib / OrchardCoreContrib.PoExtractor

Extracts localizable strings from .cs, .vb, .cshtml and .liquid files into POT files
MIT License
48 stars 35 forks source link

Remove some Limitations / Pitfalls #90

Open BrunoJuchli opened 4 months ago

BrunoJuchli commented 4 months ago

Hi There

I have a proposal to improve the PoExtractor and would like to get some feedback from you whether you the proposed solution is agreeable with the project and whether you would accept a PR.

Motivation

My team discussed using this tool and we anticipate the limitations regarding name convention and msgctxt to lead to some friction:

Solution Idea

What's missing in the current implementation is, instead of just reading the AST, also resolving types. From some quick research it seems to me that for this we need a roslyn SemanticModel besides the SyntaxTree. And the SemanticModel requires a Compilation.

I've done a very dirty proof of concept: See here and here

The result of this is, that the field/property the IStringLocalizer<T> is assigned to can have any name. The part of generating a different msgctxt depending on the type parameter T is not addressed, but from what I see can be done with the information already made available in this change.

I'm not really experienced with Roslyn, so there's a few questions-marks / risks:

Having said that, I think it's not unlikely that it would only work on compiling (error-free) code (warnings would probably be fine). If that limitation is fine, maybe the PoExtractor could be simplified by removing explicit support for razor (and even liquid?) templates by opening + building the whole project. My expectation here is that this would deal with generating the classes from .razor files automatically and the tool could then work on any class and not care for whether it's razor/liquid or whatnot. I would, however, expect this to slow down processing time.

For us, both would be acceptable.

Alternative Solution

Instead of using Roslyn, and analyze source code, one could also analyze the IL with Mono.Cecil. From past experience, I would say, for the problem at hand, Mono.Cecil is simpler to work with than SyntaxTree + SemanticModel but:

hishamco commented 4 months ago

I appreciate your input @BrunoJuchli, all the feedback is welcome:

Regarding localization field names, I remember that we suggested in the past to make optional, the user can provide his/her names

deriving the msgctxt from the class the IStringLocalizer is used in creates a few scenarios where the PoExtractor uses a different msgctxt than OrchardCore.Localization will use.

Could you please elaborate?

Instead of using Roslyn, and analyze source code, one could also analyze the IL with Mono.Cecil.

I don't think it's something that can we go for unless there are some good reasons - regardless of your experience in Cecil :) -

I'm glad to collaborate with you guys to make the PO Extractor works as expected with your project(s)

hishamco commented 4 months ago

I suggest that you write a failing test so that we can start to fix the issues that you are facing

BrunoJuchli commented 4 months ago

I appreciate your input @BrunoJuchli, all the feedback is welcome:

I appreciate your quick feedback and willingness to hear what I have to say :-) Thanks!

Regarding localization field names, I remember that we suggested in the past to make optional, the user can provide his/her names

This can address the problem of the convention being in violation of the "users" coding style guide.

However, it does not address the problem where these are accidentally misnamed. These cases are likely to fall through safety nets and make it to production. In our case, our release cycle works with late-localization. That is, we first deploy a release where not all texts are localized. Localization then happens some time after (few days). That is, it's not unexpected to see "developer texts". On the translation side, we have tooling support to tell translators that there's translations missing. Of course, it doesn't pick up what isn't extracted to the PO, so probably no-one will notice these untranslated texts for quite a while. The result is:

Therefore my goal is to get rid of the conventions in the PO Extraction tool (you can still use them in your SW, it doesn't matter). Instead, it should follow the rule: If it compiles, extraction works.

deriving the msgctxt from the class the IStringLocalizer is used in creates a few scenarios where the PoExtractor uses a different msgctxt than OrchardCore.Localization will use.

Could you please elaborate?

namespace ExampleApp;

public class Foo(
    IStringLocalizer<Bar> S)
{
      public void DoSomething()
      {
           Write(
                S["this is some translatable text"];
      }
}

The po extractor will create:

msgctxt "ExampleApp.Foo"
msgid "this is some translatable text"
msgstr ""

And OrchardCore.Localization will search for:

msgctxt "ExampleApp.Bar" msgid "this is some translatable text"

Note: I originally wrote "a few scenarios". For code-examples, it's just this one. I was thinking in scenarios how you could accidentally get to this state of violating the convention and expressed myself poorly. But here's how you can accidentally achieve this scenario:

Instead of using Roslyn, and analyze source code, one could also analyze the IL with Mono.Cecil.

I don't think it's something that can we go for unless there are some good reasons - regardless of your experience in Cecil :) -

I understand and concur with your assessment.

I suggest that you write a failing test so that we can start to fix the issues that you are facing

I'll do that.

If you've got some idea/knowledge about the compilation / SemanticModel aspect I'd appreciate it very much if you could give me some feedback whether you think it would be acceptable to use it.

hishamco commented 4 months ago

In your example above, there's an issue that we are facing frequently, the generic type SHOULD be the same as the class name, we suggested creating an analyzer instead

hishamco commented 4 months ago

@sebastienros @Skrypt your thoughts

BrunoJuchli commented 4 months ago

In your example above, there's an issue that we are facing frequently, the generic type SHOULD be the same as the class name, we suggested creating an analyzer instead

Writing an analyzer is more work than removing these conventions from PO extractor.

The reason for this is, that the analyzer would use the same tools (roslyn) that PO extractor is already using. It requires the same code to determine whether the correct type is used as the PO extractor would use to get rid of the convention.

From a usability perspective, I would also argue that the analyzer is worse.

Do you agree?

hishamco commented 4 months ago

I'm writing an analyzer in the client app that uses this library.

I think we could fix the generated localization resource here but the issue no one will tell the app the types should match, right?

BrunoJuchli commented 4 months ago

I'm writing an analyzer in the client app that uses this library.

I think we could fix the generated localization resource here but the issue no one will tell the app the types should match, right?

When the PoExtractor is changed, why should the types match? It'll be up to consumer to decide whether they should match. Each developer/team can decide for their own conventions (or lack thereof). If they decide for a convention, then an analyzer would be useful --> so I see why offering an Analyzer makes sense.

Is there a reason why the convention would need to be followed by everyone using Orchard? And the PoExtractor?

(In my case, I would certainly prefer it to be opt-in, but I'm not using orchard itself).

hishamco commented 4 months ago

Even if you are not using OC if you check ASP.NET Core samples, you will notice generic types for the logger, string localizer .. etc should match the class, so this is not OC specific. I agree for naming the localizer fields they should be framework agnostic

BrunoJuchli commented 4 months ago

Even if you are not using OC if you check ASP.NET Core samples, you will notice generic types for the logger, string localizer .. etc should match the class, so this is not OC specific. I agree for naming the localizer fields they should be framework agnostic

Ah yes. I blame microsoft for establishing this practice 🙄 - a pity. I do understand that as component authors it makes sense to follow "established practice". Even though arguably there exists better practice, and so I would say it would be nice to support both ;-)


Instead of using a type parameter, pass the info into the ctor, say like:

public class Logger : ILogger
{
     public Logger(
         Type contextType)
     {
     ...
     }
}

And when injecting the ILogger into FooService, the DI container would pass typeof(FooService) as the type parameter to the Logger implementation. There's plenty of DI containers which are capable of doing so. For some examples, see here. That's less typing for developers, no chance for mistakes, and it doesn't require writing an analyzer... ;-)


What do you think about supporting both patterns? The PoExtractor could also emit an error if the type parameter T of IStringLocalizer<T> doesn't match the type of the class it's used in.

hishamco commented 4 months ago

Ah yes. I blame microsoft for establishing this practice 🙄 - a pity.

lol, could you please elaborate on why you are using a different type in the generic type?

The PoExtractor could also emit an error if the type parameter T of IStringLocalizer doesn't match the type of the class it's used in.

Could be, but I think this is not a tool responsibility, right?

BrunoJuchli commented 4 months ago

Ah yes. I blame microsoft for establishing this practice 🙄 - a pity.

lol, could you please elaborate on why you are using a different type in the generic type?

Because I make mistakes, and for our localization tooling it doesn't matter much ;-)

But, after our discussion, I understand see that there's no value in deliberately choosing a different T. And therefore, the analyzer makes sense, because it's better to fail fast in the IDE than later when executing extractpo. So thanks for being that patient with me ;-)

(Like I said, it would be even nicer if the type would be left out instead, and would be deduced reliably per convention (so, basically, what PoExtractor is doing, but not what OrchardCore.Localization is doing)).

The PoExtractor could also emit an error if the type parameter T of IStringLocalizer doesn't match the type of the class it's used in.

Could be, but I think this is not a tool responsibility, right?

I think this depends on intended usage of the tool. If you pass it invalid input, the tool should give back an error. That is:


Vision

So, to cover all cases optimally, I think it should work as follows:

Analyzer

Searches for usages of IStringLocalizer<T>. When the T doesn't match the type it's used in (for example, using IStringLocalizer<Bar> inside of Foo) the analyzer generates an error.

Optionally, it could also offer a fix-up, which would be to change the T from Bar to Foo.

ExtractPo

Generates POT files from usages of IStringLocalizer[...] and IStringLocalizer<T>[...]. The name of fields and properties the IStringLocalizer/IStringLocalizer<T> is assigned to does not matter.

The following rules would be used to define msgctxt:

Notes

The Analyzer would not necessarily be part of this Repo because the PoExtractor does not care if you follow this convention or not. Also, OrchardCore.Localization doesn't care either, so it could be something completely separate?

(sidenote: here's an anlyzer doing the same for ILogger<T>: https://github.com/PavelStefanov/MicrosoftLogger.Analyzer)

hishamco commented 4 months ago

Hope the above suggestions are fine for you