Comments in assembler files for some architectures are indexed as identifiers

bootlin / elixir

The Elixir Cross Referencer

GNU Affero General Public License v3.0


Comments in assembler files for some architectures are indexed as identifiers #291

Open fstachura opened 3 months ago

fstachura commented 3 months ago

See for example:
https://elixir.bootlin.com/linux/latest/source/arch/arc/kernel/head.S#L25
https://elixir.bootlin.com/linux/latest/source/arch/sh/kernel/entry-common.S#L8
https://elixir.bootlin.com/linux/latest/source/arch/arm/kernel/entry-common.S#L43 (end of the line, after @)

This is likely because different architectures use different comment syntax in the GNU Assembler. https://en.wikipedia.org/wiki/GNU_Assembler

tleb commented 3 months ago

Do you have any idea how to address this? The indexer must be able to determine how a file should be parsed from its content and its filepath alone. That sounds hard, or project specific.

fstachura commented 3 months ago

I don't see any good way to detect the architecture from the assembler file alone, so I think this would have to be fixed differently for every project. Another problem is that most assembler files in the Linux codebase are in arch/, but some are not. See https://elixir.bootlin.com/linux/latest/source/drivers/memory/ti-emif-sram-pm.S
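To illustrate why the filepath alone is not enough, here is a minimal sketch of a path-based guess. The function name `guess_arch_from_path` is hypothetical and not part of Elixir; it assumes paths relative to the root of a Linux-style tree, where the architecture is the directory directly under arch/.

```python
from pathlib import PurePosixPath
from typing import Optional


def guess_arch_from_path(path: str) -> Optional[str]:
    """Return the architecture directory name for a Linux-style path, or None.

    Assumption: `path` is relative to the repository root and follows the
    Linux layout `arch/<name>/...`. This is project specific by design.
    """
    parts = PurePosixPath(path).parts
    # Require arch/<name>/<something>, so "arch/Kconfig" is not mistaken
    # for an architecture called "Kconfig".
    if len(parts) >= 3 and parts[0] == "arch":
        return parts[1]
    return None
```

With this sketch, `arch/arm/kernel/entry-common.S` maps to `arm`, but `drivers/memory/ti-emif-sram-pm.S` yields `None`, which is exactly the case described above.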

tleb commented 3 months ago

OK, a last option would be to make our general parsing more generic, so that it also covers other assembly dialects. For example, treating (^|\s)!\s as the start of a line comment seems pretty safe from a quick grep. Same for ;.
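A rough sketch of what such a generic rule could look like, assuming the pattern above is applied line by line. The names `LINE_COMMENT` and `strip_line_comment` are hypothetical and do not come from the Elixir codebase; this is not how the indexer is currently implemented.

```python
import re

# Start of a line comment: '!' or ';' at the start of the line or after
# whitespace, and followed by whitespace. Everything up to the end of the
# line is treated as comment text.
LINE_COMMENT = re.compile(r"(?:^|\s)[!;]\s.*$")


def strip_line_comment(line: str) -> str:
    """Remove a trailing '!' or ';' line comment from one line of assembly."""
    return LINE_COMMENT.sub("", line)
```

For example, `strip_line_comment("mov.l r0, @r1 ! save r0")` returns `"mov.l r0, @r1"`, while a line such as `"cmp r0 != r1"` is left untouched because `!` is not followed by whitespace. The heuristic does not cover `@` comments, ignores a comment marker that is the last character on the line, and would misfire on a string literal containing ` ; `, so it would need checking against real sources first.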

We wouldn't want each project definition to have to define how assembly files are parsed. That's too complex for little benefit.

This sounds low priority to me though.