CodinGame / monaco-vscode-api

VSCode public API plugged on the monaco editor
MIT License
205 stars 29 forks source link

segmenter fallback #404

Closed BusinessDuck closed 6 days ago

BusinessDuck commented 2 months ago

Working on fix https://github.com/CodinGame/monaco-vscode-api/issues/397

CGNonofr commented 2 months ago

Can you please implement it as a commit/patch instead? that way it would be simpler to synchronize with the microsoft team

refer to https://github.com/CodinGame/monaco-vscode-api/blob/main/docs/vscode_monaco_upgrade.md on how to do it

BusinessDuck commented 2 months ago

Sure, let me a few minutes

BusinessDuck commented 2 months ago

Wow, its not really clear to me, how to do that... a lot of steps :) I can attach patch here, but it's created on the knee

diff --git a/./src/override/vs/editor/common/core/wordCharacterClassifier1.ts b/./src/override/vs/editor/common/core/wordCharacterClassifier2.ts
index a91950a..32f5aa8 100644
--- a/./src/override/vs/editor/common/core/wordCharacterClassifier1.ts
+++ b/./src/override/vs/editor/common/core/wordCharacterClassifier2.ts
@@ -16,18 +16,14 @@ export const enum WordCharacterClass {
 export class WordCharacterClassifier extends CharacterClassifier<WordCharacterClass> {

    public readonly intlSegmenterLocales: Intl.UnicodeBCP47LocaleIdentifier[];
-   private readonly _segmenter: Intl.Segmenter | null = null;
+   private readonly _segmenter: undefined | null = null;
    private _cachedLine: string | null = null;
    private _cachedSegments: IntlWordSegmentData[] = [];

    constructor(wordSeparators: string, intlSegmenterLocales: Intl.UnicodeBCP47LocaleIdentifier[]) {
        super(WordCharacterClass.Regular);
        this.intlSegmenterLocales = intlSegmenterLocales;
-       if (this.intlSegmenterLocales.length > 0) {
-           this._segmenter = new Intl.Segmenter(this.intlSegmenterLocales, { granularity: 'word' });
-       } else {
-           this._segmenter = null;
-       }
+    this._segmenter = null;

        for (let i = 0, len = wordSeparators.length; i < len; i++) {
            this.set(wordSeparators.charCodeAt(i), WordCharacterClass.WordSeparator);
@@ -93,8 +89,18 @@ export class WordCharacterClassifier extends CharacterClassifier<WordCharacterCl
    }
 }

-export interface IntlWordSegmentData extends Intl.SegmentData {
-   isWordLike: true;
+export interface IntlWordSegmentData {
+  /** A string containing the segment extracted from the original input string. */
+  segment: string
+  /** The code unit index in the original input string at which the segment begins. */
+  index: number
+  /** The complete input string that was segmented. */
+  input: string
+  /**
+   * A boolean value only if granularity is "word"; otherwise, undefined.
+   * If granularity is "word", then isWordLike is true when the segment is word-like (i.e., consists of letters/numbers/ideographs/etc.); otherwise, false.
+   */
+  isWordLike: true
 }

 const wordClassifierCache = new LRUCache<string, WordCharacterClassifier>(10);
BusinessDuck commented 2 months ago

The main idea is drop Intl.SegmentData safety for all external usages. I can try later with your guide

BusinessDuck commented 2 months ago

I am done with patch style

CGNonofr commented 2 months ago

Did you open a PR on the VSCode side?

btw, can you please cleanup the PR? (remove rollbacked commits) ?

BusinessDuck commented 2 months ago

Did you open a PR on the VSCode side?

Not yet

btw, can you please cleanup the PR? (remove rollbacked commits) ?

Done

CGNonofr commented 2 months ago

Did you open a PR on the VSCode side?

Not yet

Please do, I'd like the microsoft team opinion on it before backporting it

BusinessDuck commented 2 months ago

https://github.com/microsoft/vscode/issues/210577

CGNonofr commented 2 months ago

I've made a change to remove the footer from the patches so that all of them don't change depending on the git version, can you please rebase and use the updated command?

BusinessDuck commented 2 months ago

PR has been Updated

BusinessDuck commented 2 months ago

Workflow fixed

BusinessDuck commented 2 months ago

can you run workflow?

BusinessDuck commented 2 months ago

can we temporary merge that MR until vscode team is in progress with it? Lets create an issue to drop that patch after vscode team decision

CGNonofr commented 2 months ago

@kaisalmen @CompuIves What are your opinions on that? I'm really not sure to understand the problem enough to make a decision

kaisalmen commented 2 months ago

@CGNonofr I was busy with other things last week and likely this week as well. Without digging into it I can't provide qualified input here.

CGNonofr commented 1 month ago

If you rebase on main, I can release a pre-release version just for you if you want

ok I'll do it

CGNonofr commented 1 month ago

deployed as 5.1.2-segmenter.1

BusinessDuck commented 1 month ago

I did some test on my side of the project everything looks good. TypeScript with es2020.intl library not cause a problems anymore.

Do you cough something on demo stand? I also check it and don't see any problems, we need to change to chinese locale to check it

CGNonofr commented 1 month ago

I don't think it'll be merged into main until some feedback from the microsoft team

github-actions[bot] commented 3 weeks ago

This PR is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 6 days ago

This PR was closed because it has been inactive for 14 days since being marked as stale.