spratt opened 12 years ago
This doesn't seem tractable, as indicated by the ubiquity of "C-Like". Also, if a snippet has another language embedded in it, how would you possibly resolve that? Is it HTML with JavaScript in it, or JavaScript with HTML in it?
Our first idea is a Bayesian classifier. Basically, build a score for each language, maybe based on the number of keywords matched, and turn those scores into a probability for each language. When the most likely language's probability passes a certain threshold, make that guess.
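One way the Bayesian idea could look is a multinomial naive Bayes classifier over tokens. This is a minimal sketch under several assumptions that are not from this thread: the tokenizer regex, the Laplace smoothing (`alpha`), the uniform prior, the 0.9 default threshold, and the tiny `SAMPLES` training set are all hypothetical. Counting how often each language's keywords appear plays the role of the per-language score; the scores are normalised into probabilities, and a guess is only made when the top probability clears the threshold.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical tokenizer: identifiers/keywords plus a few punctuation marks
# that tend to differ between languages (braces, semicolons, angle brackets).
TOKEN_RE = re.compile(r"[A-Za-z_]+|[{}();<>=:]")


def tokenize(code: str) -> List[str]:
    return TOKEN_RE.findall(code)


class NaiveBayesLanguageGuesser:
    """Multinomial naive Bayes over tokens, with Laplace smoothing."""

    def __init__(self, samples: Dict[str, List[str]], alpha: float = 1.0):
        self.alpha = alpha
        # Per-language token counts from the (hypothetical) training snippets.
        self.counts = {
            lang: Counter(tok for s in snippets for tok in tokenize(s))
            for lang, snippets in samples.items()
        }
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        self.vocab = set().union(*self.counts.values())
        # Uniform prior: every language is equally likely before seeing code.
        self.log_prior = {lang: -math.log(len(samples)) for lang in samples}

    def posteriors(self, code: str) -> Dict[str, float]:
        # Tokens never seen in training are skipped; they carry no evidence here.
        tokens = [t for t in tokenize(code) if t in self.vocab]
        v = len(self.vocab)
        log_scores = {}
        for lang, counts in self.counts.items():
            denom = self.totals[lang] + self.alpha * v
            score = self.log_prior[lang]
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            log_scores[lang] = score
        # Normalise via log-sum-exp so the probabilities sum to 1.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {lang: math.exp(s - top) / z for lang, s in log_scores.items()}

    def guess(self, code: str, threshold: float = 0.9) -> Optional[str]:
        # Only commit to a language once its probability clears the threshold.
        lang, p = max(self.posteriors(code).items(), key=lambda kv: kv[1])
        return lang if p >= threshold else None


# Hypothetical training snippets; a usable classifier needs far more data.
SAMPLES = {
    "python": [
        "def greet(name):\n    return f'hi {name}'",
        "import os\nfor x in range(3):\n    print(x)",
    ],
    "javascript": [
        "function greet(name) { return 'hi ' + name; }",
        "const xs = [1, 2]; let y = xs.map(x => x * 2); console.log(y);",
    ],
    "html": [
        "<div class=\"a\"><p>hi</p></div>",
        "<html><body><script src=\"app.js\"></script></body></html>",
    ],
}

guesser = NaiveBayesLanguageGuesser(SAMPLES)
print(guesser.guess("def f(x):\n    return x"))  # python
```

Returning `None` below the threshold is what lets the UI stay quiet while the evidence is weak; with this toy data, a short JavaScript function is ranked most likely JavaScript but does not yet clear 0.9.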
As the user enters code in the code submission window, we should make a reasonable guess at which language they are using, but still let the user pick their language, and stop trying to auto-detect once they've chosen.
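The "guess until the user chooses" behaviour amounts to a small piece of state. Below is a minimal sketch, assuming a hypothetical `LanguagePicker` with `on_code_changed` and `on_user_selected` hooks and an injected guess function (any classifier that returns a language name, or `None` when it is below its threshold). None of these names come from the project.

```python
from typing import Callable, Optional


class LanguagePicker:
    """Hypothetical state holder for the language dropdown in the submission window."""

    def __init__(self, guess: Callable[[str], Optional[str]]):
        self._guess = guess  # assumed: returns a language name, or None if unsure
        self.language: Optional[str] = None
        self.user_chose = False

    def on_code_changed(self, code: str) -> Optional[str]:
        # Keep auto-detecting only until the user has picked a language.
        if not self.user_chose:
            guessed = self._guess(code)
            # If the guesser is unsure, keep the previous suggestion rather than clearing it.
            if guessed is not None:
                self.language = guessed
        return self.language

    def on_user_selected(self, language: str) -> None:
        # An explicit choice always wins and switches auto-detection off for good.
        self.language = language
        self.user_chose = True
```

Keeping the last confident guess when the classifier returns `None` is a design choice: it avoids the selection flickering back and forth while the user is mid-keystroke.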