Support other comment styles

ceefour commented 13 years ago

Hi Daniel,

Iam not familiar with Antlr. Itd be awesome if you could add support for other comment styles:

XML
Clojure. Single line beginning with ;
Ruby/shell. Single line beginning with #

Although it may be generified (support for exotic languages or custom uses), perhaps separate lexer for each of the most common comment styles is best considering performance.

Thank you.

Regards, Hendy

danieldietrich commented 13 years ago

Looking at your proposed languages, the common nature of comments is:

protected regions are denoted in comments (start and end of protected region)
comments have a start and an end character sequence

(This behavior implies that subsequent occurrences of comment start characters are somehow 'ignored' within comments.)

Providing a ProtectedRegionRecognizer in this way should be sufficient.

A ProtectedRegionConsumer just consumes characters (with a look ahead functionality somehow) and depends on a ProtectedRegionRecognizer which provides the needed information about the ongoing character stream. In the ProtectedRegionConsumer plays the music it has the functionality of the current DefaultDocumentWriter + the functionality of the specific Antlr parser.

interface IProtectedRegionConsumer { setProtectedRegionRecognizer(IProtectedRegionRecognizer r); IDocument parse(InputStream) throws MalformedInputException; }

interface IProtectedRegionRecognizer { java.util.Pattern getCommentStart(); // Pattern allows to differentiate by different comment styles in one language java.util.Pattern getCommentEnd(); boolean isProtectedRegionStart(String comment); boolean isProtectedRegionEnd(String comment); }

Example:

class ClojureProtectedRegionRecognizer implements IProtectedRegionRecognizer {

@Override public java.util.Pattern getCommentStart() { return Pattern.compile(";"); } @Override public java.util.Pattern getCommentEnd() { return Pattern.compile("[\n\r]"); }

private static final String ID = "([a-zA-Z$][a-zA-Z\d$].)[a-zA-Z$][a-zA-Z\d$]_"; private static final Pattern PR_START = Pattern.compile("\s_PROTECTED\s_REGION\sID\s(\s" + ID + "\s)\s_START\s*"); @Override public boolean isProtectedRegionStart(String s);

private static final Pattern PR_END = Pattern.compile("\s_PROTECTED\s_REGION\sEND\s"); @Override public boolean isProtectedRegionEnd(String s) { PR_END.matcher().matches(s); } }

Starting implementation later today!

Regards, Daniel

danieldietrich commented 13 years ago

Update: We don't have to care about Malformed input. We don't have to know the languages in particular. That's the job of the ide / editor used when writing the documents...

danieldietrich commented 13 years ago

Update: Regarding performance there are perhaps better approaches than using java.util.Pattern to recognize comments. I'll check if it is possible to modify the generated antlr parser and make it customizable :-)

ceefour commented 13 years ago

Re: performance, on second thought I think it should be lowest priority. Most important is it should be easy and straightforward to implement. I think regex is fast enough... code generation is not something that's measured by tx/sec I guess :)

danieldietrich commented 13 years ago

you're right. having some good ideas. think i can provide a first version later this day

danieldietrich commented 13 years ago

The changes are found in the branch 'multilang'. Other languages are supported now, including single line and multi line comments. The unit test has a few more cases. I renamed the whole project (+.protectedregions) because I will soon add other projects which have similar packages

danieldietrich commented 13 years ago

It seems that the current language support is not sufficient for all languages out there. Currently the parser does not recognize nested comments, like Scala allows. Example: /* /* */ */ is not a valid Java comment, but Scala allows it. On the other hand /* /* */ is a valid Java comment but Scala doesn't like it.

There are two options:

Generic approach: Extending the current parser, using the strategy pattern for example to provide customizable behavior.
Simple approach: Adding the feature of nested comments which can be turned on or off.

ceefour commented 13 years ago

Wait...why would one want nested comment support? Isnt the only string expected inside protected region "comments" are "PROTECTED something..." ? On Sep 26, 2011 11:12 AM, "Daniel Dietrich" < reply@reply.github.com> wrote:

It seems that the current language support is not sufficient for all languages out there. Currently the parser does not recognize nested comments, like Scala allows. Example: /* /* / */ is not a valid Java comment, but Scala allows it. On the other hand / /* */ is a valid Java comment but Scala doesn't like it.

I will extend the current parser, using the strategy pattern to provide customizable behavior. Implementations of the known Strategies will be provided. Developers will be able to implement their own Strategies.

Reply to this email directly or view it on GitHub:

https://github.com/danieldietrich/net.danieldietrich.xtext.protectedregions/issues/3#issuecomment-2194745

danieldietrich commented 13 years ago

Yes an no :-) First a regular comment has to be read (completely). Then s.th. called 'ProtectedRegionOracle' is asked if a given comment is a protected region.

ceefour commented 13 years ago

+1 for the project rename.

Anyway a tip regarding Git(Hub) repos. A repo consist of one or more projects, my "best practice" is that related projects should be in one repo (e.g. in text project, the generator, tests, and ui should be in one repo). Grouping projects in a repo makes it easier to follow a project "cluster" without cloning too many repos.

However there are times when it's reasonable to split a large project to two repos or more. One reason is git branching applies to whole repo, not directories. However in practice, this isn't "strict", i.e. you can create a branch focusing on just one Eclipse project in a Git repo with multiple projects, ignoring the state of all other projects.

A good rule of thumb is that projects "released together in a bundle" should be in the same repo. Don't worry too much about this though... refactoring happens anyway. :)

ceefour commented 13 years ago

A note.. I don't usually qualify git repos, for a git repo usually the naming is just "protectedregions" or "xtext-protectedregions". Saves also some typing. :) Inside it there would be Eclipse projects which are usually qualified names anyway.

danieldietrich commented 13 years ago

thank yoy hendy, i appreciate that! be prepared for another repo renaming :-D btw. i redesigned the parser. now its a good one - the code is readable :-) i push it to the upstream later

danieldietrich commented 13 years ago

Pushed a new version to the upstream. Having now convenient interfaces and a clear code base. Unit tests updated as well. Perhaps we could add more languages in the factory.

danieldietrich / xtext-protected-regions

Support other comment styles #3