github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License
12.15k stars 4.21k forks source link

Ren'Py being detected as Python #2849

Closed jsmnbom closed 8 years ago

jsmnbom commented 8 years ago

Ren'Py is being recognized as python in some cases. If you go to github.com/bomjacob/repetition and click the languages bar you can see that it says it's 100% Python: 100 python However, if you click on the "Python 100%" it takes you to the search page, where nothing is shown. no code in search Since the project doesn't have any python files, only Ren'Py ones. In the sidebar it can be seen that it does see the files as Ren'Py ones. So why doesn't that show in the language bar?

pchaigno commented 8 years ago

This is happening because Ren'Py is in the Python group. Consequently, Ren'Py files are counted as Python in the statistics. I'd agree that the link to the search page is not particularly well integrated. However, it's not something that can be fixed in Linguist; you should report it to support@github.com ;)

jsmnbom commented 8 years ago

Is there a particular reason for Ren'Py being in the python group (I know it's derived from python, and that python can be written in it, but it isn't python). Cause it means that it's impossible to search for Ren'Py games globally on github: no renpy repos Which (as far as I understand) would be fixed by not declaring Ren'Py in the python group.

williamd1k0 commented 8 years ago

So I added the Ren'Py language in Python group because there are some .rpy scripts that are not exactly the Ren'Py language. I was really in doubt about that, actually.

jsmnbom commented 8 years ago

I believe I might have found the reason for that. It would seem that the python lib "Twisted" uses (or at least has used) .rpy files. https://twistedmatrix.com/documents/11.1.0/web/howto/web-in-60/rpy-scripts.html All the other .rpy files I've been able to find on github reference "twisted" in one way or another.

Would it be possible to write code that looks for twisted specific stuff in .rpy files and mark them as Python? Cause that could actually be seen as a bug too: .rpy twisted python file being falsely recognized as Ren'Py. I would think that it's got to be possible since there are many other languages that have extensions that overlap. And if you could then properly recognize what .rpy that are not twisted python files, mark those as Ren'Py (and not in the python group at all), so that it would then be shown on the github languages bar?

Here's a few examples of twisted python .rpy files I've found: 1, 2, 3. All found by searching the common python shebang line "#!/usr/bin/python" together with "extension:rpy".

williamd1k0 commented 8 years ago

So, I believe that this can be solved easily in reuristics.rb file, since all twisted .rpy files has python header. I just can not do it for myself because I do not have enough knowledge about Ruby :disappointed:

Anyway, need to remove the Ren'Py from Python group, adding .rpy extension in Python scope and add the regex for the headers:

#!/usr/bin/env python
#!/usr/bin/python
# -*- mode: python -*-
jsmnbom commented 8 years ago

The problem is that that isn't always the case: 1, 2. Found by searching "twisted extension:rpy". The only way I can see to solve this, would be to look for import twisted or something in the file.. This doesn't seem like a very stable solution though.

williamd1k0 commented 8 years ago

Well, maybe give it to use some other more specific way to check that. As an example, if there are import without a python block it is no longer Ren'Py.

# python
import foo

# renpy
python:
    import foo
jsmnbom commented 8 years ago

If I understand ruby regex properly something like this should work ^import[\s\S]+$. That would test of there's an import on a line with nothing in front (i.e. no indentation).

From looking at heuristics.rb I would believe that something like this should be sufficient:

disambiguate ".rpy" do |data|
  if /^import[\s\S]+$/.match(data)
    Language["Python"]
  else
    Language["Renpy"] #Is that the name? Without the ' and with a lowercase py?
  end
end

Any chance someone who actually understands ruby could take a took at it?

williamd1k0 commented 8 years ago

I've been doing some tests here and I think that should be enough:

/(^#!\/usr\/bin\/(env\s)?python)|(^#\s-\*-\smode:\spython\s-\*-)|(^(import|from)[\s\S])/m

The Ren'Py language is registered as "Ren'Py", so I think it would be like this:

disambiguate ".rpy" do |data|
  if /(^#!\/usr\/bin\/(env\s)?python)|(^#\s-\*-\smode:\spython\s-\*-)|(^(import|from)[\s\S])/m.match(data)
    Language["Python"]
  else
    Language["Ren'Py"]
  end
end

I just do not open a pull request for me right now because I'm not sure if that's all right, it would be nice if someone could check it, please.

arfon commented 8 years ago

@williamd1k0 I don't think we need the regex to look like this as Linguist already tries to detect a shebang and modeline before it hits the heuristic class:

https://github.com/github/linguist/blob/master/lib/linguist/language.rb#L91-L92

williamd1k0 commented 8 years ago

Well, this match with shebang will be enough if the file has the #!, but if not I think the second way would be by regex.

Anyway, I do not quite understand how it works by checking shebang ...

I do not know Ruby as well, but I wanted to be able to test it here, but I'm having problems for a while to install the gems ..

williamd1k0 commented 8 years ago

@arfon , I'm trying to test right now on Linux (Xubuntu 14 LTS) and bundle exec rake test is aborting. The traceback:

Finished in 33.044119s, 28.0837 runs/s, 346.0525 assertions/s.

  1) Failure:
TestGrammars#test_local_scopes_are_in_sync [/home/williamd1k0/Downloads/linguist/test/test_grammars.rb:71]:
script/convert-grammars failed

928 runs, 11435 assertions, 1 failures, 0 errors, 0 skips
rake aborted!
Command failed with status (1): [ruby -I"lib" -I"/home/williamd1k0/Downloads/linguist/vendor/gems/ruby/2.0.0/gems/rake-10.5.0/lib" "/home/williamd1k0/Downloads/linguist/vendor/gems/ruby/2.0.0/gems/rake-10.5.0/lib/rake/rake_test_loader.rb" "test/test*.rb" ]

Then I tried script/convert-grammars. Also aborting. Traceback:

sh: 1: curl: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
script/convert-grammars:116:in `fetch': Failed to export SVN repository: http://svn.textmate.org/trunk/Review/Bundles/BlitzMax.tmbundle: pid 73768 exit 127 (RuntimeError)
    from script/convert-grammars:186:in `load_grammars'
    from script/convert-grammars:228:in `block (2 levels) in run_thread'
    from script/convert-grammars:217:in `loop'
    from script/convert-grammars:217:in `block in run_thread'
    from /usr/lib/ruby/1.9.1/tmpdir.rb:83:in `mktmpdir'
    from script/convert-grammars:216:in `run_thread'
    from script/convert-grammars:269:in `block (2 levels) in main'

I guess I'm doing something wrong.

arfon commented 8 years ago
sh: 1: curl: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found
sh: 1: svn: not found

Looks like you need curl and svn installed on your local machine?

williamd1k0 commented 8 years ago

I installed curl and svn then script/convert-grammars finished but bundle exec rake test didn't. Now I just tried to push something into my fork and test with Travis.

Commit: https://github.com/williamd1k0/linguist/commit/bfa7eced4469926fe2fe967f32dedb0eeddb03fc Travis: https://travis-ci.org/williamd1k0/linguist/builds/111920286

I still don't know if this changes are enough.

williamd1k0 commented 8 years ago

I've been trying to understand how the strategy shebang works. If I understand right, the detection works with the "interpreter" of the language defined in languages.yml. If that's right, I think my changes are enough to solve it.

@arfon , can you confirm this for me, please? If this is ok, I'll open a pull request.

arfon commented 8 years ago

If that's right, I think my changes are enough to solve it.

Yep, that looks good. Please go ahead and open a Pull Request with this change.

arfon commented 8 years ago

Fixed in #2858