coala-antlr Enhancement

Dhiraj240 commented 5 years ago

Specifically, I will start by adding the asm8080 and asm8086 grammars, which target Intel processors, along with the corresponding language definitions in coala. Bears for them will follow, since at present no bears work with microprocessors or microcontrollers. Similarly, a grammar for PIC microcontrollers will be updated upstream, with a matching language definition in coala. With this project, bears for microcontrollers and microprocessors will therefore be supported. The bears will basically be used to style the opcodes, for example capitalization of special function registers etc., which are used for ADC and sensor interfacing. Apart from this, I am planning to implement tree rewriting based on the trees generated from the above grammars. It could be done using the generated template, if that is possible "without having that functionality on the upstream antlr".
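
For illustration only, the kind of capitalization check described above might look like the following minimal sketch. It is plain Python and does not use coala's Bear API or coantlib; the keyword sets and the function names `capitalize_keywords` and `check` are hypothetical, and treating everything after `;` as a comment is a simplification.

```python
import re

# Hypothetical sample of 8086 mnemonics and registers; a real bear would
# more likely derive these from the grammar than from a hand-written set.
MNEMONICS = {"mov", "add", "sub", "jmp", "int", "push", "pop"}
REGISTERS = {"ax", "bx", "cx", "dx", "si", "di", "sp", "bp"}
KEYWORDS = MNEMONICS | REGISTERS

# Whole identifiers only, so the "ax" inside a token like "0ax" is not matched.
WORD = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def capitalize_keywords(line):
    """Upper-case known mnemonics and registers, leaving the comment as-is."""
    # Simplification: everything after the first ';' is treated as a comment.
    code, sep, comment = line.partition(";")

    def fix(match):
        word = match.group(0)
        return word.upper() if word.lower() in KEYWORDS else word

    return WORD.sub(fix, code) + sep + comment


def check(lines):
    """Yield (line_number, original, suggestion) for lines needing a change."""
    for number, line in enumerate(lines, start=1):
        suggestion = capitalize_keywords(line)
        if suggestion != line:
            yield number, line, suggestion
```

A bear built on this idea would report each yielded suggestion as a result with a proposed patch, rather than rewriting the file directly.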

Dhiraj240 commented 5 years ago

@virresh @corona10 sorry, but I tagged you because you could be potential mentors for this.

Dhiraj240 commented 5 years ago

@jayvdb is this suitable for GSoC, or should I leave it for non-GSoC work and focus on other areas instead?

virresh commented 5 years ago

@Dhiraj240, I think there are already existing grammars in ANTLR for 8080 and 8086, so it'd just be a kind of extension to coantlib if you go down that line.

Also, just to mention, I've not come across open source assembly projects except for a few like GRUB (and even that uses the Intel assembly format, and only at the places where it needs to interface during boot time). I think it would be great if you could find some application for this as well, since people mostly just write their source in C and let the compiler generate or cross-compile applications for whichever platform they need (microcontrollers, PCs, etc.).

Moving on to the tree-rewriting part, that'd be an interesting thing to do, but why would you want to do that "without having that in the upstream ANTLR"? Wouldn't it be better to implement it in a place where it will be more maintainable?
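
For readers unfamiliar with the term, tree rewriting here means transforming a parse tree and regenerating source text from the result. The toy sketch below shows the idea only: the `Node` class is hypothetical and is not an ANTLR or coantlib type, and `rewrite`, `to_source`, and `upper_opcodes` are illustrative names, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy parse-tree node; not an ANTLR class."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def rewrite(node, rule):
    """Return a new tree with `rule` applied bottom-up to every node."""
    new_children = [rewrite(child, rule) for child in node.children]
    return rule(Node(node.kind, node.text, new_children))


def to_source(node):
    """Regenerate text by joining leaf texts in order."""
    if not node.children:
        return node.text
    return " ".join(to_source(child) for child in node.children)


def upper_opcodes(node):
    """Example rule: upper-case the text of 'opcode' leaves."""
    if node.kind == "opcode":
        return Node(node.kind, node.text.upper(), node.children)
    return node
```

Because `rewrite` builds a new tree instead of mutating the input, the original parse tree stays available for comparison, which is convenient when producing a diff for the user.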

Dhiraj240 commented 5 years ago

I was actually asking whether it is necessary to implement "tree rewriting at upstream" first. I guess this work is needed only in coala-antlr, as it will enhance this project. Aren't upstream ANTLR and coala-antlr independent now? Am I right? I had a serious health issue at the start of the new year but have recovered. I will look into GRUB and update with more info on what I can extend in it. I remember that a basic unittest is still remaining; I will be back to work this week. :smile:

virresh commented 5 years ago

Sure, you can do that only in coala-antlr, but do keep in mind coala-antlr still uses the core developed by antlr's python port. Unless you plan on changing all of that and duplicating everything already given by antlr4-python3-runtime, I don't think there would be a lot of trouble.

I see that you did open an issue for tree-rewriting. You should probably look at the existing discussion at https://github.com/antlr/antlr4/issues/369 and follow up with what you have thought about it. (Do go through the thread; many people have good strategies for this already. Remember that whatever comes to the Python port must first satisfy the requirements of the Java version of ANTLR and not disrupt the existing architecture, not to mention that this has little to do with coala-antlr if you go down this line.)

Lastly, I didn't say you should work on extending GRUB; that was just an example of the very rare projects that still use assembly in some part of their codebase. The main point was that people usually don't code in assembly. They simply use a cross-compiler to get what they want from much more readable code written in a high-level language such as C, and hardly anyone cares about the readability of the compiled code...

Regards Viresh Gupta

Dhiraj240 commented 5 years ago

Ahh, that word "extension" was about enhancing coala-antlr, not GRUB. Anyway, tell me: is it really necessary to add the tree rewriting process upstream? Have you tried it directly on coala-antlr before? If not, we can start working on this before GSoC. Also:

> I think it would be great if you could find some application for this as well, since mostly people just write their source in c, and let the compiler generate / cross-compile applications for whichever platform they need (micro-controllers / pcs etc etc)

What sort of applications? Can you give me an example? I am unclear about it. One thing I would add: when people write embedded C for different microcontrollers instead of assembly, they still have to deal with register names, USART communication, etc., which vary between microcontrollers. But I agree that the asm8080 and asm8086 grammars are for assembly language only. Also, the PIC and ARM series controllers use embedded C only; a grammar for it is not available and would have to be added, which is the bigger picture. You are right that some people use assembly for the 8051, and embedded C too. I'm curious, so please keep this discussion going; I need an example of an "application".

virresh commented 5 years ago

From my understanding of embedded C, only a very tiny portion of the whole code is written in assembly, so in reality the bigger picture you're talking about is "nested languages". That is already an existing project for coala; feel free to look it up.

Again, "application" means: what kind of developers would be interested in using your bear? As an example, will the GRUB developers use your bear? Certainly not, as they would rather use a C linter and write that small part of assembly in a maintainable way themselves. It wouldn't make sense for them to lint their mostly-C files with a linter for 8080 or 8086.

Also, implementing tree rewriting for only one kind of language would mean you're only supporting source code for one particular language (and thus no work in either coantlib or coala, and possibly none in ANTLR either), in which case it's better to create a separate project for linting such source code and simply add a bear to coala for your project. (See any existing linter bear.)

Dhiraj240 commented 5 years ago

Alright, I think I should focus on contributing and learning. Let the ideas page come out; I'll drop this idea for a while.

Dhiraj240 commented 5 years ago

@virresh alongside extending this project with asm8085 and asm8086, what if I add Verilog and VHDL too? Bears for Verilog and VHDL can also be developed, and there the objection that people can use cross-compilers does not apply. I think "tree-rewriting" can all be done once upstream provides that functionality. Now things can be subdivided:

To extend coala-antlr, the asm8085, asm8086, VHDL, Verilog, and MATLAB grammars, which are already available, will be added.
Subsequently, bears for VHDL and Verilog developers can be developed.

Let's keep it basic; the main question is rather whether there is any way to optimize the generate_all_in_one.py file.

Can we use NLTK for this purpose? In our generate_all_in_one.py file, the user would provide the filename.py, which would be imported as data into the NLTK code, where cleaning is done wherever words like del etc. are present (sorry, I wasn't able to recall all the other code lines which are not needed). In every combined coantparser we know the main code lines which have to be omitted, so NLTK can be configured for that purpose. Instead of manually changing the script, it would be done automatically, and the user would just need to provide the combined script name, like my G4.py. I hope this sounds great! I spent 2-3 days on what I should do, and I will also do a small implementation of this. If this succeeds, multiple grammars can be added with ease. Hence week 3 of my GSoC will be focused on developing the bears for the above-mentioned grammars.

Dhiraj240 commented 5 years ago

Please do reply.

virresh commented 5 years ago

Yes, we need this system, and that is what https://gitlab.com/coala/bears/coala-antlr/issues/11 and other related issues are about.

I would highly advise you to go through the existing issues so that you don't need to spend time on coming up with ideas for things that have been already thought of...

I'm not sure what you wish to do with NLTK here. We aren't trying to make sense of words; we're just stitching together some pieces.

Also, do read up on which code lines weren't needed. You have commits that can help you determine that, in case you wish to work on this.
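
To make the "stitching" point concrete, here is a minimal sketch of combining several generated source files into one module using only plain string handling. The contents of generate_all_in_one.py are not shown in this thread, so this is not that script: the function name `stitch_sources`, the `drop_prefixes` parameter, and the assumption that the lines to omit are relative imports between the pieces are all hypothetical.

```python
def stitch_sources(sources, drop_prefixes=("from .",)):
    """Concatenate (name, text) pairs into the source text of one module.

    Lines whose stripped form starts with one of `drop_prefixes` are omitted,
    on the assumption that they only import sibling pieces which now live in
    the same combined file.
    """
    parts = []
    for name, text in sources:
        kept = [
            line for line in text.splitlines()
            if not line.strip().startswith(drop_prefixes)
        ]
        # Mark where each piece came from so the combined file stays readable.
        parts.append("# ---- from {} ----".format(name))
        parts.extend(kept)
        parts.append("")
    return "\n".join(parts)
```

Since the lines to drop are known in advance, a prefix match (or at most a regular expression) is enough; no natural-language tooling is involved.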

Dhiraj240 commented 5 years ago

Cool, found a way: I will work on a proposal based on existing issues and will try NLTK on my code. Thanks, hope to see fruitful results of one year of bonding. :smile:

coala / projects

coala-antlr Enhancement #698