Closed samoht closed 9 years ago
Was going to report that as well. My repos are being converted to SML on new pushes. See e.g.
https://github.com/dbuenzli/rtime https://github.com/dbuenzli/mtime
Makes good jokes though... https://github.com/ocaml/ocaml
My bad, it's a side effect of #2087. The Bayesian classifier doesn't seem to be able to distinguish the two languages. Do you know which keyword we could use to distinguish them? (keywords that exist in one of the languages and not the other)
I'm not very knowledgeable in SML but this (a little bit old) page has a few hints. I'd suggest datatype, withtype, orelse, andalso, which are SML only (they could appear as identifiers in OCaml, but that is rather unlikely).
For reference here is the list of OCaml reserved keywords.
@johnwhitington may have a definitive answer.
(could appear as identifiers in OCaml though, but rather unlikely)
Well, or in comments, so I would rule out datatype. The other ones mentioned seem sufficiently peculiar.
Some thoughts.
Reserved words in Standard ML:
abstype and andalso as case datatype do else end exception fn fun handle if in infix infixr let local nonfix of op open orelse raise rec then type val with withtype while
In OCaml:
and as assert asr begin class
constraint do done downto else end
exception external false for fun function
functor if in include inherit initializer
land lazy let lor lsl lsr
lxor match method mod module mutable
new object of open or private
rec sig struct then to true
try type val virtual when while
with
So, if we remove words which might be very common value (variable) names, then good positive indicators of OCaml would be
assert class external functor match mutable struct try inherit module virtual
And good positive indicators of Standard ML might be
abstype datatype handle infixr nonfix withtype local andalso orelse
Unfortunately, many Standard ML programs may not contain any of these.
Perhaps the strongest common discriminator of OCaml would be the two-keyword sequence "let rec" appearing in a .ml file.
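A minimal Ruby sketch of the keyword idea above, for illustration only. The names `OCAML_HINTS`, `SML_HINTS`, `count_hints` and `guess_by_keywords` are hypothetical and not part of Linguist. One assumption that departs from the lists above: `functor` and `struct` are left out of the OCaml hints, because SML's module language uses those words too.

```ruby
# Hypothetical helper, not part of Linguist: compares how many indicator
# keywords from each list occur as whole words in a source string.
# `functor` and `struct` are omitted on purpose: SML modules use them too.
OCAML_HINTS = %w[assert class external match mutable try inherit module virtual].freeze
SML_HINTS   = %w[abstype datatype handle infixr nonfix withtype local andalso orelse].freeze

def count_hints(source, words)
  # \b keeps "match" from matching inside "matches".
  source.scan(/\b(?:#{words.join('|')})\b/).size
end

def guess_by_keywords(source)
  ocaml = count_hints(source, OCAML_HINTS)
  # "let rec" is the two-keyword sequence suggested as a strong OCaml hint.
  ocaml += source.scan(/\blet\s+rec\b/).size
  sml = count_hints(source, SML_HINTS)
  return nil if ocaml == sml # no evidence either way, or a tie
  ocaml > sml ? "OCaml" : "Standard ML"
end
```

Comments and string literals are counted as well, so this is a rough vote rather than a parser; a file containing none of the listed words simply yields `nil`.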
One other option: use the project context. If the project name contains the string ocaml or if the filename contains ocaml (case insensitive), or if there are some .mli files in the project then that's definitely an OCaml project.
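The project-context check could be sketched roughly like this in Ruby. `ocaml_project?`, `project_name` and `paths` are hypothetical names chosen for the example; nothing here is an existing Linguist API, and the result is best treated as a strong hint rather than proof.

```ruby
# Hypothetical sketch, not Linguist code.
# project_name: the repository name; paths: the file paths it contains.
def ocaml_project?(project_name, paths)
  # Case-insensitive match on the project name.
  return true if project_name.downcase.include?("ocaml")

  paths.any? do |path|
    # Either the file name mentions ocaml, or it is an .mli interface file.
    File.basename(path).downcase.include?("ocaml") || File.extname(path) == ".mli"
  end
end
```

For example, `ocaml_project?("parmap", ["src/parmap.mli"])` is true because of the `.mli` file, even though the name gives no hint.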
The ratio of -> vs => can be a good indicator. In OCaml, the former is very common (pattern matches, lambdas, arrow types) and the latter is neither a keyword nor a bound identifier. In SML the former is for types only and the latter is very common (pattern matches, lambdas).
It might be that the mere presence of => would be a good enough classifier for sml.
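To make the arrow comparison concrete, here is a minimal Ruby sketch. `guess_by_arrows` is a hypothetical name, and the rule "whichever arrow is more frequent wins" is an assumption made for the example, not something Linguist implements.

```ruby
# Hypothetical helper: compares how often each arrow token appears.
def guess_by_arrows(source)
  thin = source.scan("->").size # common in OCaml: matches, lambdas, types
  fat  = source.scan("=>").size # common in SML: matches, lambdas
  return nil if thin == fat     # includes the case where neither appears
  fat > thin ? "Standard ML" : "OCaml"
end
```

Arrows inside comments and string literals are counted too, so a real heuristic would probably want a threshold or a comment-stripping pass first.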
It might be that the mere presence of => would be a good enough classifier for sml.
We could use the presence of => for SML and a regular expression on a -> construction for OCaml (see heuristics.rb for examples). What do you think?
@pchaigno module, let rec and -> seem to be a good way to disambiguate OCaml code.
disambiguate "SML", "OCaml" do |data|
  if /=> /.match(data)
    Language["SML"]
  elsif /module|let rec /.match(data)
    Language["OCaml"]
  end
end
Something like that. I've never written ruby before so can't guarantee anything.
SML uses "signature" and "structure" where OCaml uses "module type" and "module". This can never be seen in OCaml:
signature Foo = sig ... end
structure Bar [ : Foo | :> Foo ] = struct ... end
Case expression and anonymous function syntax is distinct:
(* SML *)
case <pattern> of
(* OCaml *)
match <pattern> with
(* SML *)
fn x => <expr>
(* OCaml *)
fun x -> <expr>
SML "val foo = ..." binding syntax also cannot occur in OCaml. OCaml "val" keyword is only used in module signatures where it looks like
val foo : int -> int
Since virtually every ML program contains bindings, this probably can be a good indicator.
SML let expression syntax is "let ... in ... end" with multiple bindings between "let" and "in", which also never occurs in OCaml.
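The syntactic differences listed above can be turned into a small set of regular expressions. The Ruby sketch below is illustrative only: `SML_PATTERNS`, `OCAML_PATTERNS` and `syntax_votes` are hypothetical names, and the patterns are line-based approximations, not a parser. One caveat worth stating: OCaml class bodies also declare instance variables as `val x = ...`, so the `val foo = ...` pattern is a hint for SML rather than proof.

```ruby
# Hypothetical pattern tables, written for illustration only.
SML_PATTERNS = [
  /^\s*(?:signature|structure)\s+\w+/, # signature Foo = sig ... end
  /\bcase\b[^\n]*\bof\b/,              # case e of
  /\bfn\s+[^\n]*?=>/,                  # fn x => e
  /^\s*val\s+[\w']+\s*=/               # val foo = ...
].freeze

OCAML_PATTERNS = [
  /^\s*module\s+(?:type\s+)?[A-Z]/,    # module Foo / module type Foo
  /\bmatch\b[^\n]*\bwith\b/,           # match e with
  /\bfun\s+[^\n]*?->/                  # fun x -> e
].freeze

# Returns how many patterns of each language match at least once.
def syntax_votes(source)
  {
    "Standard ML" => SML_PATTERNS.count { |re| source.match?(re) },
    "OCaml"       => OCAML_PATTERNS.count { |re| source.match?(re) }
  }
end
```

Counting how many distinct patterns fire, rather than stopping at the first match, makes a single stray token in a comment less likely to decide the outcome.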
Hi @pchaigno, Curious where things stand with this. I see a PR from @samoht and if this is ok, can it be merged? It may not seem obvious right now but this issue is affecting the discovery of new repos/projects that use OCaml.
@amirmc I answered in #2227. I should also clarify that I am not from GitHub, I'm just a regular contributor.
Whoops! My bad. I saw an Octocat and just jumped to conclusions. :) Thanks for helping out with this (and labtocat is pretty cool, btw).
@johnwhitington or @kayceesrk, do you know if .ML is a usual extension for Standard ML programs?
I don't know what is standard, but a quick google suggests both mlton and SML/NJ seem to expect .sml:
.sml and .sig are the standard in MLton.
I tried to do some quick searching on GitHub (not straightforward given the current state) and found the following Standard ML project - https://github.com/HOL-Theorem-Prover/HOL.
It seems substantive and active, given the number of watchers, forks and stars. Files end in .sml
@mn200, sorry to tag you in a thread out of the blue but perhaps you can help us clarify whether .ML is a usual extension for Standard ML projects.
.sml is nearly universal among SML users (and implementations), I've never seen anyone using .ml. You can see a lot of SML code here: http://github.com/standardml
Conversely, .ml and .mli are nearly universal among OCaml users.
As others have said, .sig and .sml are standard for SML. Some do use .ML (see for example the sources in the Isabelle system), but this is less common (and certainly overlaps with OCaml usage if you ignore case).
Wonderful. Thank you, all.
I expect it might take a little while to propagate but it's already looking much better:
Great!
There are lots of repos that still seem to be wrong. For example, near and dear to my heart:
https://github.com/janestreet/core_kernel
is supposedly 77.8% SML. When should we expect this rerun to be complete?
@yminsky it seems that updates are performed when you push to the repo, see https://help.github.com/articles/my-repository-is-marked-as-the-wrong-language/
(though I also still have mislabellings after having done so e.g. https://github.com/dbuenzli/mtime, I don't know if you have to actually touch the files).
Would the regexps ^# and/or ;; be unique to OCaml?
Both are possible but unlikely in SML.
fun f (x : {fld : int}) =
#fld x;;
is weird looking but valid SML.
There are lots of repos that still seem to be wrong. For example, near and dear to my heart:
@yminsky - repository stats are only updated when there is a push event. I've manually recalculated the statistics for https://github.com/janestreet/core_kernel and things are looking much better.
@dbuenzli - any update to any file after a new version of Linguist will completely rebuild the language statistics.
I submitted #2270 which should at least catch those few OCaml files which use a shebang.
I guess ^# would only catch a few usages of OCaml toplevel directives.
Another idea: isn't it the case that module at the start of a line is fairly common in OCaml files (about 90% of them it seems), and not common in Standard ML?
See e.g. https://github.com/search?q=language%3Asml+module+-extension%3Asig+-extension%3Acache&type=Code
module is not an SML keyword, and so would be unlikely in column 1.
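A column-1 check for `module` is a one-line regular expression. The helper name `module_at_column_one?` below is hypothetical and only illustrates the idea; it is not taken from heuristics.rb.

```ruby
# Hypothetical check: true when some line begins with the word `module`.
# In Ruby, ^ anchors at the start of every line, not just the string.
def module_at_column_one?(source)
  source.match?(/^module\b/)
end
```

Indented occurrences and mentions inside comments that do not start a line are ignored, which fits the "column 1" observation.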
Here is a list of repositories that are classified by GitHub/Linguist as SML but contain the string "ocaml" in their name, description, or README. Some of these repositories are actually SML (and some of those contain files incorrectly classified as OCaml) but the most popular ones are definitely OCaml.
$ ./search.native repo ocaml --language sml --sort stars
98 results returned of 98 total
frenetic-lang/frenetic (Standard ML) [67 stars] https://github.com/frenetic-lang/frenetic The Frenetic Programming Language and Runtime System
ocaml/ocaml-re (Standard ML) [46 stars] https://github.com/ocaml/ocaml-re Pure OCaml regular expressions, with support for Perl and POSIX-style strings
zoggy/stog (Standard ML) [34 stars] https://github.com/zoggy/stog XML documents and web site compiler.
Cumulus/Cumulus (Standard ML) [27 stars] https://github.com/Cumulus/Cumulus A friendly and minimalist link sharing website
rdicosmo/parmap (Standard ML) [26 stars] https://github.com/rdicosmo/parmap Parmap is a minimalistic library allowing to exploit multicore architecture for OCaml programs with minimal modifications.
nojb/ocaml-imap (Standard ML) [25 stars] https://github.com/nojb/ocaml-imap Non-blocking IMAP4rev1 client library for OCaml
johnwhitington/cpdf-source (Standard ML) [18 stars] https://github.com/johnwhitington/cpdf-source PDF Command Line Tools Source
dbuenzli/tgls (Standard ML) [17 stars] https://github.com/dbuenzli/tgls Thin bindings to OpenGL {3,4} and OpenGL ES {2,3} for OCaml
dbuenzli/tsdl (Standard ML) [16 stars] https://github.com/dbuenzli/tsdl Thin bindings to SDL for OCaml
akabe/slap (Standard ML) [16 stars] https://github.com/akabe/slap BLAS and LAPACK binding in OCaml with type-based static size checking for matrix operations
mackwic/To.ml (Standard ML) [14 stars] https://github.com/mackwic/To.ml Implementation in OCaml of the Toml minimal langage
mjambon/biniou (Standard ML) [14 stars] https://github.com/mjambon/biniou Extensible binary data format, like JSON but faster
ocaml/opam2web (Standard ML) [13 stars] https://github.com/ocaml/opam2web A tool to generate a website from an OPAM repository
c-cube/cconv (Standard ML) [12 stars] https://github.com/c-cube/cconv combinators for type conversion (serialization/deserialization) to/from several formats. See this blog post (outdated): http://cedeela.fr/universal-serialization-and-deserialization.html
axiles/ocaml-efl (Standard ML) [11 stars] https://github.com/axiles/ocaml-efl An OCaml interface to the Enlightenment Foundation Libraries (EFL) and Elementary
pyrocat101/opal (Standard ML) [11 stars] https://github.com/pyrocat101/opal Self-contained monadic parser combinators for OCaml
mirage/ocaml-crunch (Standard ML) [10 stars] https://github.com/mirage/ocaml-crunch Convert a filesystem into a static OCaml module
modlfo/firmata (Standard ML) [10 stars] https://github.com/modlfo/firmata Ocaml library to control Firmata boards like Arduino
mirage/ocaml-fat (Standard ML) [8 stars] https://github.com/mirage/ocaml-fat Read and write FAT format filesystems from OCaml
tel/ocaml-cats (Standard ML) [8 stars] https://github.com/tel/ocaml-cats Signatures of the category theoretic style; a experiment in flattery
coccinelle/herodotos (Standard ML) [8 stars] https://github.com/coccinelle/herodotos Tracking code patterns through software versions
mirage/ocaml-pcap (Standard ML) [6 stars] https://github.com/mirage/ocaml-pcap Ocaml code for generating and analysing pcap (packet capture) files
nojb/ocaml-gsasl (Standard ML) [6 stars] https://github.com/nojb/ocaml-gsasl OCaml bindings for the GNU SASL library using Ctypes
arlencox/mlbdd (Standard ML) [6 stars] https://github.com/arlencox/mlbdd A not-quite-so-simple Binary Decision Diagrams implementation for OCaml
RobertHarper/TILT-Compiler (Standard ML) [6 stars] https://github.com/RobertHarper/TILT-Compiler Standard ML compiler based on typed intermediate languages.
hcarty/ocaml-gdal (Standard ML) [5 stars] https://github.com/hcarty/ocaml-gdal OCaml bindings to the GDAL and OGR Libraries
infidel/ocaml-mdns (Standard ML) [5 stars] https://github.com/infidel/ocaml-mdns OCaml implementation of the Multicast DNS protocol
tokenrove/shred-for-satan (Standard ML) [5 stars] https://github.com/tokenrove/shred-for-satan MIDI-driven metronome
ahrefs/ocaml-qfs (Standard ML) [4 stars] https://github.com/ahrefs/ocaml-qfs
jonsterling/ocaml-modular-typechecking (Standard ML) [4 stars] https://github.com/jonsterling/ocaml-modular-typechecking Modular type checking using open types
rgrinberg/stringext (Standard ML) [4 stars] https://github.com/rgrinberg/stringext Extra string functions for OCaml
mirage/mirage-net-unix (Standard ML) [4 stars] https://github.com/mirage/mirage-net-unix Ethernet networking interface for Unix Mirage applications using tuntap
tobiasBora/phluor_tools (Standard ML) [4 stars] https://github.com/tobiasBora/phluor_tools A framework to organise a website based on ocsigen (Ocaml)
mirage/mirage-net-xen (Standard ML) [4 stars] https://github.com/mirage/mirage-net-xen Xen Netfront ethernet device driver for Mirage
mirage/mirage-console (Standard ML) [4 stars] https://github.com/mirage/mirage-console Portable console handling for Mirage applications
mirage/mirage-block-xen (Standard ML) [4 stars] https://github.com/mirage/mirage-block-xen Client and server implementations of the xen paravirtualised block driver protocol
OCamlPro/operf-macro (Standard ML) [4 stars] https://github.com/OCamlPro/operf-macro Some macro-benchmarks for operf and an OPAM repository for them
jhckragh/SMLDoc (Standard ML) [4 stars] https://github.com/jhckragh/SMLDoc SMLDoc, detached from the SML# distribution
avsm/ocaml-dockerfile (Standard ML) [3 stars] https://github.com/avsm/ocaml-dockerfile OCaml interface for creating Dockerfiles
struktured/ocaml-prob-cache (Standard ML) [3 stars] https://github.com/struktured/ocaml-prob-cache Polymorphic probability caches in OCaml, including a distributed riak backed cache.
lpw25/compiler_eq (Standard ML) [3 stars] https://github.com/lpw25/compiler_eq Tool for comparing OCaml compilers
aluuu/frmttr (Standard ML) [3 stars] https://github.com/aluuu/frmttr Type-safe sprintf analog in OCaml
scvalex/Super-Max (Standard ML) [3 stars] https://github.com/scvalex/Super-Max A catch-all for game-related projects
tokenrove/zookicker (Standard ML) [3 stars] https://github.com/tokenrove/zookicker
mietek/et-language (Standard ML) [3 stars] https://github.com/mietek/et-language ET (IPL) language interpreters and literature
yanguango/visual_sort (Standard ML) [2 stars] https://github.com/yanguango/visual_sort Sorting Visualization based on OCaml
linerlock/featherweight-java (Standard ML) [2 stars] https://github.com/linerlock/featherweight-java An experimental implementation of (extended) featherweight-java (FJ) written in OCaml.
samoht/mirmin (Standard ML) [2 stars] https://github.com/samoht/mirmin Example of a Mirage unikernels using Irmin
whitequark/ocamlnet (Standard ML) [1 stars] https://github.com/whitequark/ocamlnet An automatically updated mirror of https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code
OCamlPro/ocaml-benchs (Standard ML) [1 stars] https://github.com/OCamlPro/ocaml-benchs Sources of the set of benchmarks distributed in OCamlPro/opam-bench-repo
bkc39/ocaml-prelude (Standard ML) [1 stars] https://github.com/bkc39/ocaml-prelude Includes the functions you need, that INRIA didn't.
tel/ocaml-collage (Standard ML) [1 stars] https://github.com/tel/ocaml-collage
tel/ocaml-abt (Standard ML) [1 stars] https://github.com/tel/ocaml-abt Abstract binding trees
choeger/modelica.ml (Standard ML) [1 stars] https://github.com/choeger/modelica.ml Modelica frontend implemented in OCaml
stephlm2dev/SchmilkaHashCode (Standard ML) [1 stars] https://github.com/stephlm2dev/SchmilkaHashCode Team Schmilka for Google Hash Code 2015
smondet/locoseq (Standard ML) [1 stars] https://github.com/smondet/locoseq Automatically exported from code.google.com/p/locoseq
melsman/sml-llvm (Standard ML) [1 stars] https://github.com/melsman/sml-llvm Standard ML Bindings for LLVM
gameboy1024/minijavac (Standard ML) [1 stars] https://github.com/gameboy1024/minijavac A school project where we tries to develop a compiler for a fictional language called minijava.
massimo-nocentini/theory-of-programming-languages (Standard ML) [1 stars] https://github.com/massimo-nocentini/theory-of-programming-languages Bag for my work during the course of Theory of Programming Languages at University of Florence
simonegasperoni/funzionale (Standard ML) [0 stars] https://github.com/simonegasperoni/funzionale ocaml
thomas-huet/coop-ocaml (Standard ML) [0 stars] https://github.com/thomas-huet/coop-ocaml coop is a cooperative threads library
MFreidank/ocaml_exercising (Standard ML) [0 stars] https://github.com/MFreidank/ocaml_exercising
Lokibes/obelisk-ocaml (Standard ML) [0 stars] https://github.com/Lokibes/obelisk-ocaml Automatically exported from code.google.com/p/obelisk-ocaml
zakhar/ocaml-onnt (Standard ML) [0 stars] https://github.com/zakhar/ocaml-onnt Automatically exported from code.google.com/p/ocaml-onnt
taquangtrung/ocaml-tools (Standard ML) [0 stars] https://github.com/taquangtrung/ocaml-tools
i-am-jd/ocaml-onnt (Standard ML) [0 stars] https://github.com/i-am-jd/ocaml-onnt Automatically exported from code.google.com/p/ocaml-onnt
suisse91/ocaml_mylist (Standard ML) [0 stars] https://github.com/suisse91/ocaml_mylist
jrrk/ocaml-for-ios (Standard ML) [0 stars] https://github.com/jrrk/ocaml-for-ios Automatically exported from code.google.com/p/ocaml-for-ios
zoggy/ocamldoc-generators (Standard ML) [0 stars] https://github.com/zoggy/ocamldoc-generators A collection of custom ocamldoc generators.
domsj/orocksdb (Standard ML) [0 stars] https://github.com/domsj/orocksdb An OCaml RocksDb binding using ocaml-ctypes
HerbertJordan/otest (Standard ML) [0 stars] https://github.com/HerbertJordan/otest OCaml testing framework
fetburner/OFold (Standard ML) [0 stars] https://github.com/fetburner/OFold fold in OCaml influenced by "Programming in OCaml"
fetburner/OCat (Standard ML) [0 stars] https://github.com/fetburner/OCat cat in OCaml influenced by "Programming in OCaml"
fetburner/owc (Standard ML) [0 stars] https://github.com/fetburner/owc wc in OCaml influenced by "Programming in OCaml
SusanHuang/MinimalistGrammarWithCoordination (Standard ML) [0 stars] https://github.com/SusanHuang/MinimalistGrammarWithCoordination Minimalist Grammar with Coordination (OCAML)
art1pirat/img_pipieline (Standard ML) [0 stars] https://github.com/art1pirat/img_pipieline Ocaml image Pipeline (using camlimages)
rcefala/pascaml (Standard ML) [0 stars] https://github.com/rcefala/pascaml A pascal interpreter written in OCaml
fpottier/pprint (Standard ML) [0 stars] https://github.com/fpottier/pprint A pretty-printing combinator library for OCaml
thomas-huet/lwt-pgocaml (Standard ML) [0 stars] https://github.com/thomas-huet/lwt-pgocaml Wrapper to use Lwt with PG'OCaml
coutar-a/My_list (Standard ML) [0 stars] https://github.com/coutar-a/My_list Alternative implementation of lists in Ocaml
sfritz/a-song-of-ones-and-zeros (Standard ML) [0 stars] https://github.com/sfritz/a-song-of-ones-and-zeros Conway's Game of Life in OCaml
antoyo/tq (Standard ML) [0 stars] https://github.com/antoyo/tq Text-User Interface Library with Widgets Written in OCaml
juster/ffp (Standard ML) [0 stars] https://github.com/juster/ffp The FFP language of John Backus in Ocaml
mohamedaf/Projet1-CompilationAvancee (Standard ML) [0 stars] https://github.com/mohamedaf/Projet1-CompilationAvancee Compilation d'un programme en OCAML en un programme C équivalent
IzzyRahaman/99MLProblems (Standard ML) [0 stars] https://github.com/IzzyRahaman/99MLProblems Attempts at solving the 99 Ocaml Problems originally derived from the 99 Prolog Problems in ML
daherb/Kreis-Kugel (Standard ML) [0 stars] https://github.com/daherb/Kreis-Kugel Trying to find a way to place points with equal distance onto a circle/sphere
cicku/stcntroll (Standard ML) [0 stars] https://github.com/cicku/stcntroll This is a "handy" but enigmatic tool to pick up the lucky dog ^_^/
pgalland/ProgProblems (Standard ML) [0 stars] https://github.com/pgalland/ProgProblems
remyzorg/ppx_comprehension (Standard ML) [0 stars] https://github.com/remyzorg/ppx_comprehension Syntax extension point for list comprhension
iraikov/pprint (Standard ML) [0 stars] https://github.com/iraikov/pprint Pretty printing library for Standard ML
iraikov/mpi-mlton (Standard ML) [0 stars] https://github.com/iraikov/mpi-mlton MPI bindings for Standard ML / MLton
BernardBeefheart/ml-games (Standard ML) [0 stars] https://github.com/BernardBeefheart/ml-games jouer avec ML (Standard ML)
spacemanaki/lexluthor (Standard ML) [0 stars] https://github.com/spacemanaki/lexluthor a library for building lexical analyzers
Alexis211/SystemeReseaux-Projet (Standard ML) [0 stars] https://github.com/Alexis211/SystemeReseaux-Projet
gfxmonk/passe (Standard ML) [0 stars] https://github.com/gfxmonk/passe
khuumi/SNL (Standard ML) [0 stars] https://github.com/khuumi/SNL
bdkoepke/pfds (Standard ML) [0 stars] https://github.com/bdkoepke/pfds Purely Functional Data Structures
velour/caml-spt (Standard ML) [0 stars] https://github.com/velour/caml-spt Automatically exported from code.google.com/p/caml-spt
It would appear that the search index has a cache that is/was out-of-date. Only some of the repositories I just reported are now misclassified. Sorry for the confusion.
Yup - I'm going through these manually now re-indexing them.
In case you are interested, some more examples of (recently pushed to) repositories with files misidentified as SML:
Everything was working fine until a few days ago: all my new projects are now being reported to be written in Standard ML instead of OCaml. See https://github.com/samoht/ocaml-huffman-code.