Closed ceefour closed 13 years ago
Looking at your proposed languages, the common nature of comments is:
comments have a start and an end character sequence
(This behavior implies that subsequent occurrences of comment start characters are somehow 'ignored' within comments.)
Providing a ProtectedRegionRecognizer in this way should be sufficient.
A ProtectedRegionConsumer just consumes characters (with a look ahead functionality somehow) and depends on a ProtectedRegionRecognizer which provides the needed information about the ongoing character stream. In the ProtectedRegionConsumer plays the music it has the functionality of the current DefaultDocumentWriter + the functionality of the specific Antlr parser.
interface IProtectedRegionConsumer { setProtectedRegionRecognizer(IProtectedRegionRecognizer r); IDocument parse(InputStream) throws MalformedInputException; }
interface IProtectedRegionRecognizer { java.util.Pattern getCommentStart(); // Pattern allows to differentiate by different comment styles in one language java.util.Pattern getCommentEnd(); boolean isProtectedRegionStart(String comment); boolean isProtectedRegionEnd(String comment); }
Example:
class ClojureProtectedRegionRecognizer implements IProtectedRegionRecognizer {
@Override public java.util.Pattern getCommentStart() { return Pattern.compile(";"); } @Override public java.util.Pattern getCommentEnd() { return Pattern.compile("[\n\r]"); }
private static final String ID = "([a-zA-Z$][a-zA-Z\d$].)[a-zA-Z$][a-zA-Z\d$]_"; private static final Pattern PR_START = Pattern.compile("\s_PROTECTED\s_REGION\sID\s(\s" + ID + "\s)\s_START\s*"); @Override public boolean isProtectedRegionStart(String s);
private static final Pattern PR_END = Pattern.compile("\s_PROTECTED\s_REGION\sEND\s"); @Override public boolean isProtectedRegionEnd(String s) { PR_END.matcher().matches(s); } }
Starting implementation later today!
Regards, Daniel
Update: We don't have to care about Malformed input. We don't have to know the languages in particular. That's the job of the ide / editor used when writing the documents...
Update: Regarding performance there are perhaps better approaches than using java.util.Pattern to recognize comments. I'll check if it is possible to modify the generated antlr parser and make it customizable :-)
Re: performance, on second thought I think it should be lowest priority. Most important is it should be easy and straightforward to implement. I think regex is fast enough... code generation is not something that's measured by tx/sec I guess :)
you're right. having some good ideas. think i can provide a first version later this day
The changes are found in the branch 'multilang'. Other languages are supported now, including single line and multi line comments. The unit test has a few more cases. I renamed the whole project (+.protectedregions) because I will soon add other projects which have similar packages
It seems that the current language support is not sufficient for all languages out there.
Currently the parser does not recognize nested comments, like Scala allows.
Example: /* /* */ */
is not a valid Java comment, but Scala allows it.
On the other hand /* /* */
is a valid Java comment but Scala doesn't like it.
There are two options:
Wait...why would one want nested comment support? Isnt the only string expected inside protected region "comments" are "PROTECTED something..." ? On Sep 26, 2011 11:12 AM, "Daniel Dietrich" < reply@reply.github.com> wrote:
It seems that the current language support is not sufficient for all languages out there. Currently the parser does not recognize nested comments, like Scala allows. Example: /* /* / */ is not a valid Java comment, but Scala allows it. On the other hand / /* */ is a valid Java comment but Scala doesn't like it.
I will extend the current parser, using the strategy pattern to provide customizable behavior. Implementations of the known Strategies will be provided. Developers will be able to implement their own Strategies.
Reply to this email directly or view it on GitHub:
Yes an no :-) First a regular comment has to be read (completely). Then s.th. called 'ProtectedRegionOracle' is asked if a given comment is a protected region.
+1 for the project rename.
Anyway a tip regarding Git(Hub) repos. A repo consist of one or more projects, my "best practice" is that related projects should be in one repo (e.g. in text project, the generator, tests, and ui should be in one repo). Grouping projects in a repo makes it easier to follow a project "cluster" without cloning too many repos.
However there are times when it's reasonable to split a large project to two repos or more. One reason is git branching applies to whole repo, not directories. However in practice, this isn't "strict", i.e. you can create a branch focusing on just one Eclipse project in a Git repo with multiple projects, ignoring the state of all other projects.
A good rule of thumb is that projects "released together in a bundle" should be in the same repo. Don't worry too much about this though... refactoring happens anyway. :)
A note.. I don't usually qualify git repos, for a git repo usually the naming is just "protectedregions" or "xtext-protectedregions". Saves also some typing. :) Inside it there would be Eclipse projects which are usually qualified names anyway.
thank yoy hendy, i appreciate that! be prepared for another repo renaming :-D btw. i redesigned the parser. now its a good one - the code is readable :-) i push it to the upstream later
Pushed a new version to the upstream. Having now convenient interfaces and a clear code base. Unit tests updated as well. Perhaps we could add more languages in the factory.
Hi Daniel,
Iam not familiar with Antlr. Itd be awesome if you could add support for other comment styles:
Although it may be generified (support for exotic languages or custom uses), perhaps separate lexer for each of the most common comment styles is best considering performance.
Thank you.
Regards, Hendy