StepfenShawn / Cantonese

The Cantonese programming language (粤语編程語言).
https://cantonese-community.github.io/
MIT License

Allow both simplified and traditional keyword characters simultaneously #56

Open dosentmatter opened 2 years ago

dosentmatter commented 2 years ago

Instead of using the -use_tr flag to switch modes, is it possible to allow both simplified and traditional keywords at the same time?

nobodxbodon commented 2 years ago

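Is this to support using simplified keywords in some code files and traditional keywords in others? Technically there shouldn't be any obstacle.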

dosentmatter commented 2 years ago

@nobodxbodon It is to support the use of both traditional and simplified characters in the same piece of code.

I see that @StepfenShawn changed self.match to support matching a list of strings.

This would allow either traditional or simplified spellings. For example, "点样先" or "點樣先" would work, but not a mix of the two, e.g. "点樣先". Should we support mixing them? To do that, we can interleave the traditional and simplified characters to build a regular expression that matches tokens:

>>> import re

>>> re.fullmatch(r'[点點][样樣]先', '點样先')
<re.Match object; span=(0, 3), match='點样先'>
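
The interleaving idea can be generalized into a small helper. This is a hypothetical sketch (the name `variant_pattern` and its signature are not from the Cantonese codebase), assuming that a keyword's simplified and traditional spellings have the same length and that their characters correspond one-to-one:

```python
import re


def variant_pattern(simplified: str, traditional: str) -> re.Pattern:
    """Hypothetical helper: accept each character in either script, position by position."""
    if len(simplified) != len(traditional):
        raise ValueError("spellings must have the same length")
    parts = []
    for s, t in zip(simplified, traditional):
        if s == t:
            # Character is identical in both scripts (e.g. 先): match it literally.
            parts.append(re.escape(s))
        else:
            # Characters differ: allow either one via a character class.
            parts.append("[" + re.escape(s) + re.escape(t) + "]")
    return re.compile("".join(parts))


pattern = variant_pattern("点样先", "點樣先")
for text in ["点样先", "點樣先", "點样先", "点樣先", "点样后"]:
    print(text, bool(pattern.fullmatch(text)))
```

Characters that are the same in both scripts stay literal, so the pattern only widens where the scripts actually differ.
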
nobodxbodon commented 2 years ago

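If we support keywords that freely mix simplified and traditional characters within the same code, without requiring consistency, might users mistakenly assume that identifiers can be mixed too?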

@StepfenShawn Happy Year of the Tiger, and may everything you do go well!

dosentmatter commented 2 years ago

Hmm, I guess allowing the mixing of traditional and simplified characters could lead to confusion. But I'm not sure it would cause users to mistake them for identifiers, since variables, expressions, and function calls are surrounded by vertical bars (||), and function declarations are prefixed with a dollar sign ($).

It's probably not an important feature, since most people won't mix traditional and simplified characters in the same file. It might happen in a project with multiple teammates, but then you could just enforce one character set or use a converter to keep the code consistent.

Allowing both character sets would also make searching through code a pain.
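
A converter like the one suggested above could start as a simple character translation table. This is a minimal sketch, assuming a hypothetical, deliberately tiny table (`TRAD_TO_SIMP`) that covers only a handful of keyword characters; a real converter would need a complete traditional-to-simplified mapping. Normalizing files this way also keeps plain-text search working, since every occurrence ends up in one script:

```python
# Hypothetical, deliberately tiny traditional -> simplified table.
# A real converter would need a complete character mapping.
TRAD_TO_SIMP = str.maketrans({"還": "还", "數": "数", "來": "来", "點": "点", "樣": "样"})


def to_simplified(source: str) -> str:
    """Rewrite every mapped traditional character in `source` as simplified."""
    return source.translate(TRAD_TO_SIMP)


# Illustrative sample line, written with traditional keyword characters.
code = "畀我睇下 |x| 點樣先"
normalized = to_simplified(code)
print(normalized)
print("点样先" in normalized)  # a search for the simplified spelling now finds it
```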

Happy New Year!

StepfenShawn commented 2 years ago

Hi @nobodxbodon @dosentmatter, I found that if simplified and traditional keyword characters are allowed simultaneously, only three keywords need to be modified: 还数, 来睇下 and 点样先. So only small changes are needed. Also, I think most users will write code with an editor plug-in, such as one for VS Code, which highlights keywords, so we don't need to worry about users mistaking keywords for identifiers.
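
Since only the three keywords named above differ between the scripts, one way to accept both without `-use_tr` is to give each keyword a list of accepted spellings, in the spirit of the list-of-strings `self.match` change mentioned earlier. The table and the `match_keyword` function below are hypothetical, not the project's actual lexer, and they deliberately accept only all-simplified or all-traditional spellings:

```python
# Hypothetical keyword table: canonical (simplified) keyword -> every accepted spelling.
KEYWORD_SPELLINGS = {
    "还数": ["还数", "還數"],
    "来睇下": ["来睇下", "來睇下"],
    "点样先": ["点样先", "點樣先"],
}


def match_keyword(source: str, pos: int):
    """Return (canonical keyword, characters consumed) if a keyword starts at `pos`, else None."""
    for canonical, spellings in KEYWORD_SPELLINGS.items():
        for spelling in spellings:
            if source.startswith(spelling, pos):
                return canonical, len(spelling)
    return None


print(match_keyword("點樣先", 0))
print(match_keyword("点樣先", 0))  # mixed spelling: not accepted
```

A mixed spelling such as 点樣先 returns `None` here; accepting it too would mean switching to character-class patterns like the `[点點][样樣]先` regex discussed earlier in the thread.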

Happy Year of the Tiger!

nobodxbodon commented 2 years ago

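There seems to be a misunderstanding. What I meant was: if we allow mixing simplified and traditional characters in keywords, should we also allow mixing them in identifiers? For example: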

> 讲嘢 压岁钱 系 10
> 畀我睇下 压岁錢 點样先    // only "錢" is switched to traditional

dosentmatter commented 2 years ago

@nobodxbodon Oh, I did misunderstand.

> Might users mistakenly assume that identifiers can be mixed too?

I'm not sure that allowing leniency in traditional/simplified keywords would lead users to mistakenly believe identifiers can also mix traditional and simplified characters.

> Should we also allow mixing simplified and traditional characters in identifiers?

The keywords are defined by the Cantonese programming language, while identifiers are chosen by the user, so it is probably better to require the exact characters the identifier was declared with.

If we did allow mixing traditional and simplified characters in identifiers, it might complicate code generation for other languages, since we would have to find every spelling variant of each identifier and unify them before the generated code could run.
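
To make the extra work concrete: if identifiers could mix scripts, the compiler would first have to group spellings that differ only by script and pick one canonical name per group before emitting code. The sketch below is hypothetical (`group_variant_identifiers` and its three-entry table are illustrative only) and uses the 压岁钱 / 压岁錢 example from earlier in the thread:

```python
from collections import defaultdict

# Hypothetical traditional -> simplified table, just large enough for this example.
TRAD_TO_SIMP = str.maketrans({"壓": "压", "歲": "岁", "錢": "钱"})


def group_variant_identifiers(identifiers):
    """Group identifiers whose simplified forms coincide.

    Returns {canonical simplified name: set of distinct spellings},
    keeping only names that were written in more than one way.
    """
    groups = defaultdict(set)
    for name in identifiers:
        groups[name.translate(TRAD_TO_SIMP)].add(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


print(group_variant_identifiers(["压岁钱", "压岁錢", "利是"]))
```

A stricter compiler could use the same grouping to report an error instead, which fits the suggestion above to require the exact spelling used at declaration.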

I didn't request leniency in the keywords because I expected users to mix and match traditional and simplified characters all the time. It was just to make the language more forgiving and to remove one option, -use_tr. I expect most users will stick to either all-traditional or all-simplified code.