github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License
12.23k stars 4.23k forks

Request: Add XML Heuristic for *proj$ File Pattern #7090

Closed: kasperk81 closed this 1 hour ago

kasperk81 commented 2 hours ago

query: https://github.com/search?q=path:*.*proj+language:xml&type=code

Microsoft has used XML as the format for project files for decades, and many of these files have an extension ending in proj (e.g., .csproj, .vbproj, .fsproj, .pyproj, .jsproj, .sfproj, and a couple of dozen others). These files follow a consistent XML structure, but listing every project file extension individually is cumbersome.

Suggestion:
To streamline classification and ensure proper syntax highlighting for these files, it would make sense to add a heuristic rule that detects files whose names end in proj and classifies them as XML. Specifically, applying an XML heuristic to files that match the regex pattern ^.*proj$ would allow these files to be recognized and highlighted as XML automatically, without the need to list each extension separately.
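As a rough illustration of the kind of match being proposed, here is a minimal Python sketch. It is not Linguist code: the pattern and the function name `looks_like_proj_file` are assumptions, and the pattern is narrowed to "the file extension ends in proj" so that a bare name such as `myproj` does not match.

```python
import re

# Hypothetical pattern: the final extension of the file name ends in "proj".
# This is narrower than "the whole path ends in proj".
PROJ_PATTERN = re.compile(r"^.*\.[^./]*proj$", re.IGNORECASE)


def looks_like_proj_file(path: str) -> bool:
    """Return True if the file's extension ends in 'proj' (e.g. .csproj)."""
    return PROJ_PATTERN.match(path) is not None


if __name__ == "__main__":
    for name in ["src/App.csproj", "lib.fsproj", "README.md", "myproj"]:
        print(name, looks_like_proj_file(name))
```

A pattern like this only looks at the name, so it would still need a content check before a file could safely be treated as XML.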

This improvement would cover a wide variety of project file extensions and enhance the developer experience on GitHub.

lildude commented 2 hours ago

This is not necessary. Thanks to the XML strategy, all standard XML files are already correctly detected as XML unless there is a more specific language entry for them:

https://github.com/github-linguist/linguist/blob/f164d13fa618023ecf2d8f2ed9a6ce5fae731346/lib/linguist/strategy/xml.rb#L26
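For readers who do not want to open the link, the following Python sketch shows the general shape of a declaration-based XML check. It is an approximation written for this thread, not a port of the linked Ruby file: the two-line search scope, the exact pattern, and the name `has_xml_declaration` are assumptions, and the linked source remains authoritative.

```python
import re

# Assumed values for illustration only; check the linked source for the
# real search scope and pattern.
SEARCH_SCOPE = 2  # number of leading lines to inspect
XML_DECLARATION = re.compile(r"<\?xml version=")


def has_xml_declaration(content: str) -> bool:
    """Return True if an XML declaration appears in the first few lines."""
    header = "\n".join(content.splitlines()[:SEARCH_SCOPE])
    return XML_DECLARATION.search(header) is not None
```

A check of this kind accepts any file that opens with an XML declaration, whatever its extension, and says nothing about files that omit the declaration.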

The strategies are applied in this order:

https://github.com/github-linguist/linguist/blob/f164d13fa618023ecf2d8f2ed9a6ce5fae731346/lib/linguist.rb#L62-L71
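The idea of applying strategies in a fixed order can be sketched as follows. This is a toy model in Python, not Linguist's implementation: the extension table, the two strategies, and the `detect` function are all hypothetical, and the real list of strategies is in the linked lines.

```python
from typing import Callable, List, Optional

# A strategy receives the file name, its content, and the candidates found
# so far, and returns its own list of candidate languages.
Strategy = Callable[[str, str, List[str]], List[str]]

# Toy extension table (hypothetical, far smaller than any real one).
EXTENSIONS = {".csproj": ["XML"], ".rb": ["Ruby"]}


def by_extension(name: str, content: str, candidates: List[str]) -> List[str]:
    for ext, langs in EXTENSIONS.items():
        if name.lower().endswith(ext):
            return langs
    return []


def by_xml_declaration(name: str, content: str, candidates: List[str]) -> List[str]:
    # Only act when earlier strategies found nothing.
    if candidates:
        return candidates
    return ["XML"] if content.lstrip().startswith("<?xml") else []


def detect(name: str, content: str, strategies: List[Strategy]) -> Optional[str]:
    """Run strategies in order; stop as soon as one yields a single language."""
    candidates: List[str] = []
    for strategy in strategies:
        result = strategy(name, content, candidates)
        if len(result) == 1:
            return result[0]
        if result:
            candidates = result  # narrowed, but still ambiguous
    return None
```

In this toy model a file with a listed extension is resolved before its content is ever inspected, while a file with an unlisted extension is resolved only if it carries an XML declaration.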

You can tell this is already working as expected because all of the files in the search results are already identified as XML (hence you can use the language:xml qualifier) and all have syntax highlighting.

This raises the question: why do you think we need this when it's already the case? Where are you seeing this not taking effect?

kasperk81 commented 2 hours ago

Good point. The issue is that these proj files don’t always include the <?xml> preamble, which is why they aren't automatically recognized as XML. This makes it necessary to explicitly capture the extensions in Linguist's YAML configuration.

lildude commented 2 hours ago

Oh wait, I think I understand what you're getting at.

No. We'd prefer to add support for extensions on a one-by-one basis as we require usage levels to be met before support can be added.

kasperk81 commented 1 hour ago

Do you know how fsproj or csproj is getting recognized? For example, https://github.com/blowdart/AspNetAuthenticationWorkshop/blob/fbe24ffd430bf38855e10e7e9344083e038cc269/src/Step6/Step_6_Adding_Mvc.csproj doesn't have the preamble and contains:

```xml
<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.AspNetCore.App" />
  </ItemGroup>

</Project>
```

kasperk81 commented 1 hour ago

The reason I'm asking is that a file starting with <Solution is not recognized (tests are failing in https://github.com/github-linguist/linguist/pull/7084)

lildude commented 1 hour ago

Yup, they're explicitly listed under XML...

https://github.com/github-linguist/linguist/blob/f164d13fa618023ecf2d8f2ed9a6ce5fae731346/lib/linguist/languages.yml#L8029

https://github.com/github-linguist/linguist/blob/f164d13fa618023ecf2d8f2ed9a6ce5fae731346/lib/linguist/languages.yml#L8038

kasperk81 commented 1 hour ago

Yes, but it didn't recognize slnx that way.

lildude commented 1 hour ago

Explanation: https://github.com/github-linguist/linguist/pull/7084#issuecomment-2411845740

kasperk81 commented 1 hour ago

Thanks. Let's close this one.