[X] I searched the issues and found no similar issues.
Component
Other
Feature
Code files often have headers. These headers do not contain information relevant to LLMs and may also contain PII. We want to build a new transform that removes this header information from code files. The transform should be built so that it works across 300+ programming languages. One possible approach is for the transform to take as input a configuration file that lists programming language names and the characters used for commenting in each language. The transform would then identify the header in files written in any language listed in the configuration file and edit those files to remove it.
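To make the proposal concrete, here is a minimal Python sketch of the configuration-driven approach. Everything in it is an assumption for illustration only: the `COMMENT_CONFIG` layout, the `strip_header` name, and the rule that a "header" is the run of comments at the very top of the file (a leading shebang line is kept). It is not an existing transform or API.

```python
from typing import Dict, List, Tuple, TypedDict


class CommentSyntax(TypedDict):
    line: List[str]               # line-comment markers, e.g. "#"
    block: List[Tuple[str, str]]  # (open, close) block-comment markers


# Hypothetical configuration; a real one would be loaded from a file
# and cover many more languages.
COMMENT_CONFIG: Dict[str, CommentSyntax] = {
    "python": {"line": ["#"], "block": []},
    "c": {"line": ["//"], "block": [("/*", "*/")]},
    "html": {"line": [], "block": [("<!--", "-->")]},
}


def strip_header(source: str, language: str,
                 config: Dict[str, CommentSyntax] = COMMENT_CONFIG) -> str:
    """Remove the leading comment block from `source`.

    Assumption: the header is every comment (and blank line) that appears
    before the first line of code. Unknown languages are returned unchanged.
    """
    syntax = config.get(language.lower())
    if syntax is None:
        return source

    lines = source.splitlines(keepends=True)
    kept: List[str] = []
    i = 0
    # Keep a shebang line; it is executable metadata, not a header comment.
    if lines and lines[0].startswith("#!"):
        kept.append(lines[0])
        i = 1

    saw_comment = False
    while i < len(lines):
        stripped = lines[i].strip()
        if not stripped:  # blank line inside or after the header
            i += 1
            continue
        if any(stripped.startswith(marker) for marker in syntax["line"]):
            saw_comment = True
            i += 1
            continue
        markers = next(
            ((o, c) for o, c in syntax["block"] if stripped.startswith(o)),
            None,
        )
        if markers is None:
            break  # first line of real code
        open_marker, close_marker = markers
        text = stripped[len(open_marker):]
        j = i
        while close_marker not in text:
            j += 1
            if j >= len(lines):
                return source  # unterminated block comment: do nothing
            text = lines[j]
        if text.split(close_marker, 1)[1].strip():
            break  # code follows the comment on the same line
        saw_comment = True
        i = j + 1

    if not saw_comment:
        return source
    return "".join(kept) + "".join(lines[i:])
```

For example, `strip_header("# Copyright Jane Doe\nimport os\n", "python")` returns `"import os\n"`. Note that this sketch removes every leading comment, including ones that are not license or author headers; a real transform would probably need an additional check (for instance, keyword matching) before deleting anything.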
Are you willing to submit a PR?