blackducksoftware / ohcount

The Ohloh source code line counter
https://github.com/blackducksoftware/ohcount
GNU General Public License v2.0
261 stars, 69 forks

Add support for the Rust language. #26

Closed: sebcrozet closed this 10 years ago

sebcrozet commented 11 years ago

Rust is a programming language developed by Mozilla: www.rust-lang.org.

amujumdar commented 11 years ago

Thanks for this patch. We tried to compile and test it and found a couple of issues:

rust_number_entity =
  'float' | 'f32' | 'f64' | 'uint' | 'int' | 'u8' | 'u16' | 'u32' | 'u64' | 'i8' | 'i16' | 'i32' | 'i64';

Does it compile for you without this change?

chris-morgan commented 11 years ago

@amujumdar What .rs things are there that aren't Rust? I'm not aware of anything else in widespread usage using it. (I mean, unless it's a significant problem, changing the detection to use file contents would probably be a bad idea.)

Then we get on to syntax changes:

Some of the other things in it may be out of date by now; hardcoding keywords and such doesn't seem a particularly productive thing to me; can we not just do without it? Does ohcount actually use anything beyond code, comment and blank?

I'm willing to update this if it needs updating.

metacritical commented 11 years ago

The last time we checked, we had 410 repositories on Ohloh with .rs files, and the file count will be in the thousands. Misclassifying them would easily lead to wrong data on Ohloh, and hence a wrong picture of the language's popularity. It is therefore very important to detect files by syntax and not by extension.

chris-morgan commented 11 years ago

@pankajdoharey But how many of those are actually Rust? I would expect it to at the least be a considerable fraction of them. Anyway, I really need a list of things that I can look at before filetype detection can be meaningfully assessed. At present I have nothing to work on. Could you get me that data?

metacritical commented 11 years ago

I am not sure how many of them are Rust, but I can certainly give you the list of the projects that make use of .rs files. Have a look at the following gist :+1:

https://gist.github.com/pankajdoharey/5870f41afc40cae80511

chris-morgan commented 10 years ago

@pankajdoharey Sorry, I got distracted with other things before I finished this.

I went through all the GitHub ones and a few of the other ones in detail and classified some that were easy; these are the results I've got:

Minor scripting languages

Genuine Windows resource files

These should be using .res, which is the standard extension for such things.

Other

Uncertain

Compiled Java thing of some form that should never have been in version control

I'm not sure how these are created, but they get put as siblings to .class files and contain one class name per line. Being the compiled output, they should certainly never have been in version control. (Thus I am quite willing to ignore them.)

Arbitrary useless extension

People often give something an extension which has no inherent meaning; these appear to be such cases.

Now we get to the more important ones:

RenderScript

Rust

Unable to reproduce

(Probably not using .rs files any more.)

Unclassified

As indicated, I haven't looked at everything.

Summary

I believe that RenderScript is the only one which needs disambiguation. There are two convenient things that it can be disambiguated on:

I suggest disambiguating on the string #pragma version, treating files with it as unknown (until someone implements RenderScript in Ohcount, which I'm not going to do as I don't care about it) and files without it as Rust.
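As a rough illustration of the rule suggested above (not the code that went into #30), a minimal sketch in Python might look like the following. The function name `classify_rs_file` and its return values are hypothetical, chosen only for this example; they do not reflect ohcount's actual detector.

```python
from typing import Optional

# Marker string proposed in the comment above. RenderScript sources are
# assumed to contain it; Rust sources are assumed not to.
RENDERSCRIPT_MARKER = b"#pragma version"


def classify_rs_file(contents: bytes) -> Optional[str]:
    """Classify the contents of a file whose name ends in .rs.

    Returns None ("unknown") when the RenderScript marker is present,
    and "rust" otherwise. Hypothetical helper, for illustration only.
    """
    if RENDERSCRIPT_MARKER in contents:
        return None  # looks like RenderScript: leave it unclassified
    return "rust"
```

A plain substring check like this is deliberately crude: a Rust file that happened to contain the marker in a comment or string literal would be treated as unknown, which is the conservative direction to fail in.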

I shall now implement this in #30.