Closed: andymeneely closed this issue 11 years ago.
Couple of issues:
Hm - that's a bit unexpected. I knew they had lots of modules, but I didn't realize that they were all literally one source file. Are you sure? When I browse the directory tree, for example, I see subdirectories under modules like "http" and "ldap" that have multiple source files. Maybe we can consider that to be one component?
Is there any way to tell if a module is in the core or not? Is it in the filepath? Do they record it on their website? Git doesn't keep design information, just file information.
Yes, component churn is also aggregated over the last 30 days (configurable, as before).
Yes, I'm sure! For example, a module like 'mod_alias' has one source file, 'mod_alias.c' (http://httpd.apache.org/docs/2.2/mod/mod_alias.html). Admittedly, what they call 'modules' is not entirely consistent with the file structure. If you search 'gitlogfiles' for 'mod_alias', for example, you'll find it alongside the following files: 'modules/mappers/mod_actions.c', 'modules/mappers/mod_alias.c', 'modules/mappers/mod_dir.c', 'modules/mappers/mod_imagemap.c', 'modules/mappers/mod_imap.c', 'modules/mappers/mod_negotiation.c', 'modules/mappers/mod_rewrite.c', 'modules/mappers/mod_rewrite.h', 'modules/mappers/mod_so.c', 'modules/mappers/mod_so.h', 'modules/mappers/mod_speling.c', 'modules/mappers/mod_userdir.c', 'modules/mappers/mod_vhost_alias.c', 'modules/mappers/mod_watchdog.c', 'modules/mappers/mod_watchdog.h'.
By our assessment, these all belong to the 'mappers' module, but 'mod_alias' is considered a module of its own.
So maybe we can use our own classification and call these groupings Components in order to differentiate them, or we can do more research to reach a common understanding.
About component churn: in that case, files in the same folder will all have the same churn value. If so, it is better to keep the metric in a separate table for components and join that table with gitlogfiles.
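A minimal sketch of the separate-table idea, using Python's built-in sqlite3 with an in-memory database. The table names `gitlogfiles` and `components`, their columns, and the churn values are hypothetical simplifications, not the project's actual schema or data. The sketch also assumes a single churn value per component; a windowed metric like the 30-day one would need to be keyed by component and date instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables; column names are illustrative only.
    CREATE TABLE gitlogfiles (commit_id TEXT, filepath TEXT, component TEXT);
    CREATE TABLE components (component TEXT PRIMARY KEY, recent_churn INTEGER);

    INSERT INTO gitlogfiles VALUES
        ('c1', 'modules/mappers/mod_alias.c',   'modules/mappers'),
        ('c1', 'modules/mappers/mod_rewrite.c', 'modules/mappers'),
        ('c2', 'server/core.c',                 'server');
    INSERT INTO components VALUES ('modules/mappers', 65), ('server', 0);
""")

# Each component's churn is stored once and joined in per file,
# instead of being duplicated onto every gitlogfiles row.
rows = conn.execute("""
    SELECT g.filepath, c.recent_churn
    FROM gitlogfiles AS g
    JOIN components AS c ON c.component = g.component
    ORDER BY g.filepath
""").fetchall()

for filepath, churn in rows:
    print(filepath, churn)
```

Both mappers files report the same churn, which is exactly the redundancy the separate table avoids storing twice.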
(In reply to Andy Meneely's comment of Fri, Jan 18, 2013 at 10:36 PM.)
Ok, then let's step back and examine what we're trying to do.
We're trying to group similar files together according to the system architecture so that we can identify recent, related code churn for a given file. For example, maybe there have been a lot of recent commits to HTTP packet parsing modules, but not to this file, and yet this file is still at risk of introducing a vulnerability. If each group contains just one file, the grouping makes no sense, because RecentComponentChurn would then be very close to plain RecentChurn.
So is there some kind of grouping (and maybe there isn't one) where we can do this? Maybe just a binary grouping of Core or Module? Or maybe there's a logical grouping of modules that we can discern? Is, say, mappers a logical grouping, or should that just be ignored?
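To make the point concrete, here is a small Python sketch comparing three candidate groupings for one file that had no recent changes itself. The churn numbers are made up, and the grouping functions `by_file`, `by_directory` and `core_or_module` are hypothetical, including the assumption that everything under `modules/` counts as a "module" and everything else as "core".

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical recent churn per file (lines changed within the window).
recent_churn = {
    "modules/mappers/mod_alias.c": 0,
    "modules/mappers/mod_rewrite.c": 120,
    "modules/http/http_protocol.c": 30,
    "server/core.c": 10,
}


def by_file(path):
    # Degenerate grouping: every file is its own component.
    return path


def by_directory(path):
    # Group by the directory that directly contains the file.
    return str(PurePosixPath(path).parent)


def core_or_module(path):
    # Assumption: anything under modules/ is a "module", everything else is "core".
    return "module" if path.startswith("modules/") else "core"


def component_churn(churn, group):
    """Give each file the total churn of the group it falls into."""
    totals = defaultdict(int)
    for path, lines in churn.items():
        totals[group(path)] += lines
    return {path: totals[group(path)] for path in churn}


f = "modules/mappers/mod_alias.c"
for grouping in (by_file, by_directory, core_or_module):
    print(grouping.__name__, component_churn(recent_churn, grouping)[f])
```

With per-file grouping, the file's component churn is just its own churn (0), while the directory and core/module groupings pick up activity in related files (120 and 150 here).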
In that case, I think we should go with the package grouping as laid out in the httpd directory tree.
(In reply to Andy Meneely's comment of Sat, Jan 19, 2013 at 3:50 PM.)
This is partially complete (95%). dbverify reports 618 files without a component. The query needs to be fine-tuned so that it assigns each file the highest-level directory of its component, e.g.:

Filepath                                  Component
server/mpm/mpmt_pthread/scoreboard.c      server
server/mpm/mpmt_beos/scoreboard.c         server
server/mpm/dexter/scoreboard.h            server
server/mpm/dexter/scoreboard.c            server
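One way to express the intended mapping, sketched in Python. `component_for` and `TOP_LEVEL_COMPONENTS` are hypothetical names. Treating `server`, `os` and `support` as whole-directory components, and `modules/<subdir>` (e.g. `modules/mappers`) as the module-level component, are assumptions drawn from the discussion above. Anything else, such as `srclib` or `include` (still open questions) or top-level files, is left unmapped, the way dbverify flags files without a component.

```python
from pathlib import PurePosixPath

# Hypothetical: top-level directories treated as a single component each.
TOP_LEVEL_COMPONENTS = {"server", "os", "support"}


def component_for(filepath):
    """Return the component for a repository-relative path, or None if unmapped."""
    parts = PurePosixPath(filepath).parts
    if len(parts) > 2 and parts[0] == "modules":
        # Assumption: modules/<subdir> (e.g. modules/mappers) is one component.
        return f"{parts[0]}/{parts[1]}"
    if len(parts) > 1 and parts[0] in TOP_LEVEL_COMPONENTS:
        # Everything below server/ (including server/mpm/...) maps to "server".
        return parts[0]
    return None  # e.g. srclib/, include/ or top-level files: still undecided


paths = [
    "server/mpm/mpmt_pthread/scoreboard.c",
    "server/mpm/mpmt_beos/scoreboard.c",
    "server/mpm/dexter/scoreboard.h",
    "server/mpm/dexter/scoreboard.c",
    "modules/mappers/mod_alias.c",
    "CHANGES",
]

# Files a dbverify-style check would report as having no component.
unmapped = [p for p in paths if component_for(p) is None]
print(unmapped)
```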
Done and awaiting testing.
Update the GitLogFiles table so that each file has the aggregated churn metrics of that file's component. A "component" is roughly the directory a file is in, though exactly which directory level applies depends on what that directory represents.
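A sketch of the in-place update, using Python's sqlite3 and a hypothetical, simplified `GitLogFiles` schema (the columns `component`, `authored`, `lines_changed` and `recent_component_churn` are illustrative, as are the rows). It assumes churn means lines changed and, for each row, sums the churn of commits to the same component dated within the 30 days before that row's commit. Commits on the same day are excluded because of the strict `<` comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified schema; the real table has more columns.
    CREATE TABLE GitLogFiles (
        commit_id TEXT,
        filepath TEXT,
        component TEXT,
        authored TEXT,              -- ISO-8601 date of the commit
        lines_changed INTEGER,
        recent_component_churn INTEGER
    );
    INSERT INTO GitLogFiles (commit_id, filepath, component, authored, lines_changed) VALUES
        ('c1', 'modules/mappers/mod_rewrite.c', 'modules/mappers', '2013-01-05', 120),
        ('c2', 'modules/mappers/mod_alias.c',   'modules/mappers', '2013-01-18', 4),
        ('c3', 'server/core.c',                 'server',          '2012-10-01', 50);
""")

WINDOW_DAYS = 30  # configurable look-back window

# Correlated subquery: for each outer row, sum lines changed by rows in the
# same component with strictly earlier dates inside the look-back window.
conn.execute("""
    UPDATE GitLogFiles
    SET recent_component_churn = (
        SELECT COALESCE(SUM(o.lines_changed), 0)
        FROM GitLogFiles AS o
        WHERE o.component = GitLogFiles.component
          AND o.authored <  GitLogFiles.authored
          AND o.authored >= date(GitLogFiles.authored, ?)
    )
""", (f"-{WINDOW_DAYS} days",))

rows = conn.execute(
    "SELECT commit_id, recent_component_churn FROM GitLogFiles ORDER BY commit_id"
).fetchall()
print(rows)
```

Here `mod_alias.c` (c2) picks up the 120 lines changed in `mod_rewrite.c` thirteen days earlier, even though `mod_alias.c` itself had no prior churn.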
You'll need to infer which module a given file is in. This might take a bit of research into Apache's architecture, but from a cursory glance it looks like these are the main modules:
Not sure how to handle srclib.
Not sure how to handle experimental. Is that one module? Things migrate out of there into their own modules eventually anyway.
Not sure how to handle includes for this one, though, so figure that out.
Related to issues #5 and #21.