dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

`codepages.nlp` binary in `System.Text.Encoding.CodePages` #81693

Open premun opened 1 year ago

premun commented 1 year ago

Context

The key goal of source-build is to satisfy the official packaging rules of commonly used Linux distributions, such as Fedora and Debian. Many Linux distributions have similar rules. These rules tend to have two main principles: consistent reproducibility, and source code for everything.

In order to support the "source code for everything" requirement, binary files are not allowed in product repositories. Besides, binaries that can be created from source during the build are better left out of the repository, since one of the main goals of git is to let humans review code changes.
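To make the "no binaries in product repositories" rule concrete, here is a minimal sketch of how a checkout could be scanned for files that look binary. It is not part of source-build tooling: the function names, the `SNIFF_BYTES` value, and the NUL-byte heuristic are all assumptions chosen for illustration.

```python
import os

# Assumption: inspecting the first few kilobytes is enough for a rough check.
SNIFF_BYTES = 8000


def looks_binary(path):
    """Heuristic: treat a file as binary if its leading bytes contain a NUL."""
    with open(path, "rb") as f:
        return b"\0" in f.read(SNIFF_BYTES)


def find_binaries(root):
    """Yield paths (relative to root) of files that look binary, skipping .git."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune the .git directory in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if looks_binary(path):
                yield os.path.relpath(path, root)
```

Calling `list(find_binaries("path/to/checkout"))` would list candidates for review. A heuristic like this can misclassify some files (for example, UTF-16 text contains NUL bytes), so the output is a starting point rather than a verdict.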

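Questions

- What scenario / which RID are these files used for?
- Are these files necessary for a successful build of the .NET SDK?
- If they are, can they be removed from the repository and replaced with a source and process that synthesizes them during build?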

Goal

We should comply with the source build requirements and get rid of these binaries. The file in question is https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Encoding.CodePages/src/Data/codepages.nlp

Based on the discussion here, it seems possible to synthesize this file from source, but the current tool that does so is written in Perl.

Possible workarounds

At the moment, we only source-build Linux x64/arm64, so if this file is required only for other RIDs, it can be temporarily removed from the source build. This applies only if it is difficult to replace the file with source. Source build will support other platforms in the future, though, so the problem will resurface if we work around it this way.

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details
### Context

The key goal of source-build is to satisfy the official packaging rules of commonly used Linux distributions, such as [Fedora](https://fedoraproject.org/wiki/Packaging:Guidelines) and [Debian](https://www.debian.org/doc/manuals/maint-guide/build.en.html). Many Linux distributions have similar rules. These rules tend to have two main principles: consistent reproducibility, and source code for everything.

In order to support the "source code for everything" requirement, binary files are not allowed in product repositories. Besides, binaries that can be created from source during the build are better left out of the repository, since one of the main goals of git is to let humans review code changes.

### Questions

- What scenario / which RID are these files used for?
- Are these files necessary for a successful build of the .NET SDK?
- If they are, can they be removed from the repository and replaced with a source and process that synthesizes them during build?

### Goal

We should comply with the source build requirements and get rid of these binaries. The file in question is https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Encoding.CodePages/src/Data/codepages.nlp

Based on the discussion [here](https://teams.microsoft.com/l/message/19:977f68c19ca2422db22072560f93ae27@thread.skype/1675076942682?tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47&groupId=014ca51d-be57-47fa-9628-a15efcc3c376&parentMessageId=1675076942682&teamName=dotnet%2Fruntime%20repo&channelName=General&createdTime=1675076942682), it seems possible to synthesize this file from source, but the current tool that does so is written in Perl.

### Possible workarounds

At the moment, we only source-build Linux x64/arm64, so if this file is required only for other RIDs, it can be temporarily removed from the source build. This applies only if it is difficult to replace the file with source. Source build will support other platforms in the future, though, so the problem will resurface if we work around it this way.

Author: premun
Assignees: -
Labels: `area-System.Text.Encoding`, `untriaged`
Milestone: -

tarekgh commented 1 year ago

Some info about codepages.nlp: