Closed: yenif closed this 8 months ago.
Nice one, thanks a lot!
I was testing it with the following Ruby code snippet:
# A class for detecting the programming language based on file extension.
class LanguageDetection
  # Enumeration representing various programming languages.
  module Language
    PYTHON = :python
    JAVASCRIPT = :javascript
    TYPESCRIPT = :typescript
    JAVA = :java
    KOTLIN = :kotlin
    LUA = :lua
    UNKNOWN = :unknown
  end

  # Gets the corresponding programming language based on the given file extension.
  def self.get_programming_language(file_extension)
    language_mapping = {
      '.py' => Language::PYTHON,
      '.js' => Language::JAVASCRIPT,
      '.ts' => Language::TYPESCRIPT,
      '.java' => Language::JAVA,
      '.kt' => Language::KOTLIN,
      '.lua' => Language::LUA
    }
    language_mapping[file_extension] || Language::UNKNOWN
  end

  def self.get_file_extension(file_name)
    File.extname(file_name)
  end
end
But somehow the tree-sitter output is this 🤔
(program (class_declaration (ERROR (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier)
(identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier)) name: (identifier) (ERROR) body:
(class_body (ERROR (character_literal)) (field_declaration type: (type_identifier) (ERROR) declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator:
(variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator:
(variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator:
(variable_declarator name: (identifier)) (MISSING ";")))) (expression_statement (binary_expression left: (array_access array: (identifier) index: (identifier)) right: (method_reference (identifier)
(identifier))) (MISSING ";")) (local_variable_declaration type: (type_identifier) declarator: (variable_declarator name: (identifier)) (MISSING ";")) (expression_statement (method_invocation
object: (method_invocation object: (identifier) name: (identifier) arguments: (argument_list (identifier))) (ERROR (identifier)) name: (identifier) arguments: (argument_list (identifier)))
(MISSING ";")) (local_variable_declaration type: (type_identifier) declarator: (variable_declarator name: (identifier)) (MISSING ";")))
Not related to our changes; this is related to the Ruby tree-sitter parser. Have you been able to run it with some Ruby code?
You can log the tree-sitter syntax tree with print(self.tree.root_node.sexp()), btw.
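The sexp() output comes back as one long line, which is why the dump above is hard to read. A small re-indenting helper can make it easier to scan. This is a sketch, not part of codeqai; the name pretty_sexp is hypothetical, and it assumes the S-expression contains only parentheses, node names, field labels such as name:, and quoted strings like the ";" in (MISSING ";").

```python
import re

# Tokens: quoted strings, single parentheses, and bare words
# (node types such as "identifier" or field labels such as "name:").
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[()]|[^\s()"]+')


def pretty_sexp(sexp: str, indent: str = "  ") -> str:
    """Re-indent a one-line tree-sitter S-expression: one node per line, nested by depth."""
    lines: list[str] = []
    depth = 0
    label = ""  # pending field label, e.g. "name: "
    for tok in _TOKEN.findall(sexp):
        if tok == "(":
            # Every node starts on a fresh line, prefixed by its field label if it has one.
            lines.append(indent * depth + label + "(")
            label = ""
            depth += 1
        elif tok == ")":
            lines[-1] += ")"
            depth -= 1
        elif tok.endswith(":"):
            label = tok + " "
        elif lines[-1].endswith("("):
            lines[-1] += tok  # node type directly after its opening paren
        else:
            lines[-1] += " " + tok  # e.g. the ";" in (MISSING ";")
    return "\n".join(lines)
```

Usage would be something like print(pretty_sexp(self.tree.root_node.sexp())); only whitespace changes, so the node structure is the same as in the raw output.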
Yeah, I was getting similar results testing against this file:
#!/usr/bin/env ruby
require 'ostruct'

# method line comment
def global_method
  puts "global_method"
end

=begin
class block comment
=end
class TopClass < OpenStruct
  # multiline
  # hash comment
  # on InnerModule
  module InnerModule
    # comment on module method
    def module_method
      puts "module_method"
    end
  end

  # comment on inner class
  class InnerClass
    def inner_class_instance_method
      puts "inner_class_instance_method"
    end

    # comment on inner class method
    def self.inner_class_class_method
      puts "inner_class_class_method"
    end

    def self.inner_class_class_method_with_args(arg1, arg2)
      puts "inner_class_class_method_with_args: #{arg1}, #{arg2}"
    end

    class << self
      # comment on inner eigen class method
      def inner_eigen_class_class_method_with_args2(arg1, arg2)
        puts "inner_eigen_class_class_method_with_args2: #{arg1}, #{arg2}"
      end
    end
  end
end
It runs, but it's definitely not optimal. Hopefully I'll have some more time to poke at it this week.
Hey @yenif, any luck? Let me know if you need some help 🙌
I'll finally have some time today :-) but honestly, the issue list over on the Ruby tree-sitter repo is not giving me confidence that this will be doable.
It looks like the error nodes are just syntax that tree-sitter doesn't support. The rest of the file still seems to parse correctly, and I think it picks up all of the methods.
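One way to check which parts of a file tree-sitter could not handle is to list the ERROR and MISSING nodes with their positions. The sketch below is a hypothetical helper (syntax_problems is not an existing codeqai function); it only assumes that nodes expose type, is_missing, start_point (0-based row and column) and children, as py-tree-sitter's Node does.

```python
from typing import Any, Iterator


def syntax_problems(root: Any) -> Iterator[tuple[str, int, int]]:
    """Yield (kind, row, column) for every ERROR or MISSING node under root.

    Duck-typed against py-tree-sitter's Node: reads .type, .is_missing,
    .start_point (0-based (row, column)) and .children.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        row, column = node.start_point
        if node.type == "ERROR":
            yield ("ERROR", row, column)
        elif node.is_missing:
            # MISSING nodes are zero-width tokens the parser inserted to recover.
            yield (f"MISSING {node.type}", row, column)
        # Push children in reverse so they are popped in source order.
        stack.extend(reversed(node.children))
```

For example, iterating over syntax_problems(self.tree.root_node) and printing row + 1 gives 1-based line numbers to compare against the source file.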
I think multiline comments are currently not being handled; we need to iterate over node.prev_named_sibling.
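A minimal sketch of that iteration, assuming py-tree-sitter's Node attributes (type, text as bytes, start_point/end_point as 0-based (row, column), prev_named_sibling) and the node type "comment" used by tree-sitter-ruby. The function name leading_comments is hypothetical, and the rule that a blank line ends the comment run is an assumption of this sketch, not existing codeqai behavior.

```python
from typing import Any


def leading_comments(node: Any) -> list[str]:
    """Collect the run of comment nodes directly above node, top to bottom."""
    comments: list[str] = []
    current = node
    sibling = node.prev_named_sibling
    # Assumption: a comment belongs to the run only if it ends on the line
    # right above the current node, so a blank line stops the walk.
    while (
        sibling is not None
        and sibling.type == "comment"
        and sibling.end_point[0] + 1 >= current.start_point[0]
    ):
        comments.append(sibling.text.decode("utf-8"))
        current = sibling
        sibling = sibling.prev_named_sibling
    comments.reverse()  # we walked upward, so restore source order
    return comments
```

Walking upward and reversing at the end keeps a multiline # block in the order it was written.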
Sufficiently verbose method comments would probably work well, as you mention in the docs. This seems like an opportunity to expand any existing comments with further hints, like the full method path or maybe a list of references to the method's dependencies/dependents.
I'm at my limit for tonight; possibly tomorrow. You're definitely welcome to make any edits or drive it home if it's on your list :-)
@yenif Hey, sorry for the late response, let's try to ship this! I think we can iterate over multiline comments similar to the Rust implementation here: https://github.com/fynnfluegge/codeqai/blob/main/codeqai/treesitter/treesitter_rs.py
I will test this soon 🙂
No worries, I've been trying to get back to it, but a two-year-old and a day job aren't leaving much extra time :-)
Ran into a further issue with tree-sitter: it treats a comment on the first line of a method body as a sibling of the method body. That would match the Python (and other languages') convention of putting method comments on the first line of the method or module, but Ruby convention doesn't do this; a comment is almost always associated with the code that follows it. So a comment on a method at the beginning of a module
module InnerModule
  # multiline
  # comment on module method
  def module_method
    # first line comment
    puts "module_method"
  end
end
results in something like:
module
  module
  constant InnerModule
  comment # multiline
  comment # comment on module method
  body_statement
    method
      def
      identifier module_method
      comment # first line comment
      body_statement
        ...
      end
  end
Of course, we can traverse up, check node types, and then look at siblings, but that creates a bunch more edge cases.
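A sketch of that traverse-up fallback, handling only the case described above: when the node is the first statement of a body_statement (so it has no previous named sibling), read the comments in front of the body_statement instead. doc_comments is a hypothetical name; the node attributes (type, text, parent, prev_named_sibling) follow py-tree-sitter's Node, and the body_statement type name is assumed from the tree-sitter-ruby grammar.

```python
from typing import Any


def doc_comments(node: Any) -> list[str]:
    """Comments documenting node, following Ruby's "comment above the code" convention."""
    anchor = node
    # Per the parse described in this thread, comments at the start of a body are
    # siblings of that body_statement, so the first statement in the body has no
    # previous named sibling. Step up one level and look in front of the body.
    if (
        anchor.prev_named_sibling is None
        and anchor.parent is not None
        and anchor.parent.type == "body_statement"
    ):
        anchor = anchor.parent

    comments: list[str] = []
    sibling = anchor.prev_named_sibling
    while sibling is not None and sibling.type == "comment":
        comments.append(sibling.text.decode("utf-8"))
        sibling = sibling.prev_named_sibling
    comments.reverse()  # collected bottom-up; return in source order
    return comments
```

It deliberately climbs at most one level; anything deeper falls into the extra edge cases mentioned above and would need its own checks.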
@yenif Is this ready to merge? What would you say? :)
It seems to match the existing functionality, so yeah, good to merge if you're happy with it!
Alright, thanks for your effort! ❤️
Hey! I took a swing at adding Ruby support.
The first pass just grabs method names; I'm thinking it might work better with fully qualified method names.
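A sketch of building fully qualified names by walking up the parent chain. It is hypothetical (qualified_method_name is not an existing codeqai function) and assumes the tree-sitter-ruby node types class, module, method, singleton_method and singleton_class (for class << self), a name field on classes, modules and methods, and py-tree-sitter's parent, type, text and child_by_field_name. It follows the common Ruby documentation convention of # for instance methods and . for class methods.

```python
from typing import Any

# Assumed tree-sitter-ruby node types that open a namespace.
_NAMESPACES = {"class", "module"}


def _name(node: Any) -> str:
    name = node.child_by_field_name("name")
    return name.text.decode("utf-8") if name is not None else "?"


def qualified_method_name(method: Any) -> str:
    """Build e.g. "TopClass::InnerClass#inner_class_instance_method" for a method node."""
    # "def self.x" parses as singleton_method; defs inside "class << self" are class methods too.
    singleton = method.type == "singleton_method"
    namespaces: list[str] = []
    parent = method.parent
    while parent is not None:
        if parent.type in _NAMESPACES:
            namespaces.append(_name(parent))
        elif parent.type == "singleton_class":
            singleton = True
        parent = parent.parent
    namespaces.reverse()  # outermost namespace first
    prefix = "::".join(namespaces)
    separator = "." if singleton else "#"
    # Top-level methods have no namespace, so return the bare name.
    return f"{prefix}{separator}{_name(method)}" if prefix else _name(method)
```

On the TopClass test file earlier in this thread, this would be expected to yield names such as TopClass::InnerClass#inner_class_instance_method and TopClass::InnerClass.inner_eigen_class_class_method_with_args2, which could also serve as the "full method path" hint.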