github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

Jinja templates are detected as HTML files #5265

Closed (starcraft66 closed this issue 3 years ago)

starcraft66 commented 3 years ago

Describe the bug

GitHub reports my repository as 75% HTML even though it doesn't contain a single line of HTML. I ran linguist on the repo locally, and according to the output it classifies every file with the .j2 extension as HTML:

HTML:
roles/tdude.core/templates/issue.j2
roles/tdude.elk/templates/docker-compose.yml.j2
roles/tdude.elk/templates/elasticsearch.yml.j2
roles/tdude.elk/templates/journalbeat.yml.j2
roles/tdude.elk/templates/kibana.yml.j2
roles/tdude.elk/templates/logstash.yml.j2
roles/tdude.elk/templates/pipelines/cisco-syslog.conf.j2
roles/tdude.elk/templates/pipelines/journalbeat.conf.j2
roles/tdude.elk/templates/pipelines/lancache.conf.j2
roles/tdude.elk/templates/pipelines/output.conf.j2
roles/tdude.elk/templates/pipelines/syslog.conf.j2
roles/tdude.gitlab/templates/docker-compose.yml.j2
roles/tdude.kerio/templates/docker-compose.yml.j2
roles/tdude.lolisafe/templates/config.js.j2
roles/tdude.lolisafe/templates/docker-compose.yml.j2
roles/tdude.matrix/templates/coturn/coturn.service.j2
roles/tdude.matrix/templates/coturn/turnserver.conf.j2
roles/tdude.matrix/templates/docker-compose.yml.j2
roles/tdude.matrix/templates/element/config.json.j2
roles/tdude.matrix/templates/matrix-appservice-webhooks/appservice-registration-webhooks.yaml.j2
roles/tdude.matrix/templates/matrix-appservice-webhooks/config.yml.j2
roles/tdude.matrix/templates/matrix-appservice-webhooks/database.json.j2
roles/tdude.matrix/templates/matrix-media-repo/media-repo.yaml.j2
roles/tdude.matrix/templates/mjolnir/mjolnir.yaml.j2
roles/tdude.matrix/templates/nginx/matrix.conf.j2
roles/tdude.matrix/templates/synapse/config.logging.j2
roles/tdude.mediaserver/templates/docker-compose.yml.j2
roles/tdude.mediaserver/templates/jackett.j2
roles/tdude.mediaserver/templates/radarr.j2
roles/tdude.mediaserver/templates/sonarr.j2
roles/tdude.mediaserver/templates/tautulli.j2
roles/tdude.mediaserver/templates/varken.j2
roles/tdude.minecraft-discord-bridge/templates/config.json.j2
roles/tdude.minecraft-discord-bridge/templates/docker-compose.yml.j2
roles/tdude.monitoring/templates/docker-compose.yml.j2
roles/tdude.monitoring/templates/prometheus-node-exporter.j2
roles/tdude.monitoring/templates/prometheus.yml.j2
roles/tdude.ndppd/templates/ndppd.conf.j2
roles/tdude.statusfy/templates/docker-compose.yml.j2
roles/tdude.traefik/templates/docker-compose.yml.j2
roles/tdude.traefik/templates/http.toml.j2
roles/tdude.traefik/templates/traefik.toml.j2
roles/tdude.wireguard/templates/interface.conf.j2

This is obviously not the case: all of the .j2 files in my repo are templates for YAML, JSON, and other configuration files.

Expected behaviour

Jinja templates should be classified as "Jinja templates" or something more descriptive that isn't HTML. Alternatively, linguist could look one level further into the file name and classify .html.j2 files as HTML, .yaml.j2 files as YAML, .json.j2 files as JSON, etc.
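
To make the second option concrete, here is a rough sketch of what "drilling down" into the file name could look like. It is written in Python purely for illustration and is not how linguist works internally; the extension table, the "Jinja" fallback name, and the function name `classify_template` are all assumptions of mine, not anything taken from linguist's language data.

```python
from pathlib import PurePosixPath

# Hypothetical mapping for illustration only; NOT linguist's real language data.
INNER_EXTENSION_LANGUAGES = {
    ".html": "HTML",
    ".yml": "YAML",
    ".yaml": "YAML",
    ".json": "JSON",
    ".toml": "TOML",
    ".js": "JavaScript",
}

TEMPLATE_EXTENSION = ".j2"
FALLBACK_LANGUAGE = "Jinja"  # assumed name for "a Jinja template of unknown type"


def classify_template(path: str) -> str:
    """Classify a .j2 file by the extension that precedes .j2, if any."""
    # suffixes only looks at the final path component, so dots in
    # directory names such as "tdude.elk" are ignored.
    suffixes = PurePosixPath(path).suffixes
    if not suffixes or suffixes[-1].lower() != TEMPLATE_EXTENSION:
        raise ValueError(f"not a {TEMPLATE_EXTENSION} template: {path}")
    if len(suffixes) >= 2:
        inner = suffixes[-2].lower()
        # Unknown inner extensions (.conf, .service, ...) fall back to Jinja.
        return INNER_EXTENSION_LANGUAGES.get(inner, FALLBACK_LANGUAGE)
    # A bare name like issue.j2 carries no hint about the rendered format.
    return FALLBACK_LANGUAGE


if __name__ == "__main__":
    for example in (
        "roles/tdude.elk/templates/docker-compose.yml.j2",
        "roles/tdude.matrix/templates/element/config.json.j2",
        "roles/tdude.core/templates/issue.j2",
    ):
        print(example, "->", classify_template(example))
```

With a rule like this, a file such as docker-compose.yml.j2 would count as YAML, while issue.j2 or interface.conf.j2 would fall back to a generic Jinja classification instead of HTML.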

Related discussion

@lildude pointed out in https://github.com/github/linguist/issues/5047 that jinja2 templates "are part of the "HTML+Django" language which is part of the HTML group of languages". This is obviously a bug: nothing about a file ending in .j2 makes it specific to "HTML+Django". In my case, the Ansible roles in my repo make heavy use of jinja2 in ways that have nothing to do with HTML or Django.

Nixinova commented 3 years ago

A similar problem also occurs with HTML+Django: #5167