WhichBrowser / Parser-PHP

Browser sniffing gone too far — A useragent parser library for PHP
http://whichbrowser.net
MIT License

Proposal: Set all names to English non-extended Latin character set range to allow automatic translation templates #632

Closed: summercms closed this issue 3 years ago

summercms commented 3 years ago

@NielsLeenheer @mariotsi

We are currently connecting this repo to translation templates that automatically translate the names into the user's language setting. This creates a better end-user experience. Below is a quick example:

English YAML file:

'browserName' => [
    'Google Chrome' => 'Google Chrome',
    'Firefox' => 'Firefox',
    ..
]

Chinese YAML file:

'browserName' => [
    'Google Chrome' => '谷歌浏览器',
    'Firefox' => '火狐浏览器',
    ..
]

etc. for any language file.
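The lookup that such templates rely on could be sketched as follows. This is only an illustration under stated assumptions: the `$translations` array layout and the `translateBrowserName()` helper are hypothetical and are not part of WhichBrowser. The point is that the name reported by the parser acts as a stable English key, and anything without a translation falls back to that key.

```php
<?php
// Hypothetical translation tables keyed by locale (not part of WhichBrowser).
// The entries mirror the language-file examples above.
$translations = [
    'en' => ['browserName' => ['Google Chrome' => 'Google Chrome', 'Firefox' => 'Firefox']],
    'zh' => ['browserName' => ['Google Chrome' => '谷歌浏览器']],
];

/**
 * Translate a browser name reported by the parser.
 * Falls back to the untranslated (English) name when the locale
 * or the entry is missing.
 */
function translateBrowserName(array $translations, string $locale, string $name): string
{
    // ?? covers both a missing locale and a missing name without raising notices.
    return $translations[$locale]['browserName'][$name] ?? $name;
}

echo translateBrowserName($translations, 'zh', 'Google Chrome'), PHP_EOL; // translated
echo translateBrowserName($translations, 'zh', 'Firefox'), PHP_EOL;       // no zh entry, falls back
echo translateBrowserName($translations, 'fr', 'Firefox'), PHP_EOL;       // unknown locale, falls back
```

With this shape, a name that is already in another language has no English key to look up, so it passes through untranslated for every locale.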

The current issue is that two lines of code contain names in languages other than English; see here:

https://github.com/WhichBrowser/Parser-PHP/blob/da24adc4f4f26002673d236e69b91a10f2fd594c/data/applications-browsers.php#L350

'name' => '冲浪浏览器',

It would be great to change it to its English name:

'name' => 'Surf Chinese browser',

Note: There is an English surf and Chinese surf browser - see https://github.com/WhichBrowser/Parser-PHP/pull/332

This would make the following three files all English:

https://github.com/WhichBrowser/Parser-PHP/blob/master/data/applications-bots.php

https://github.com/WhichBrowser/Parser-PHP/blob/master/data/applications-browsers.php

https://github.com/WhichBrowser/Parser-PHP/blob/master/data/applications-others.php

The other line is a device model name in Japanese:

'SBM204SH' => [ 'Sharp', 'シンプルスマホ 204SH', 'carrier' => 'Softbank' ],

Would be:

'SBM204SH' => [ 'Sharp', 'Simple smartphone 204SH', 'carrier' => 'Softbank' ],

Changing everything to English would give two main advantages:

  1. Allow automatic translation services to process the data.

  2. Make it easier and safer for developers to sanitize the data before saving it, for example to a database.
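As a rough sketch of the second point, a check for the non-extended Latin range can be a single regular expression over printable ASCII. The helper names `isPlainAscii()` and `findNonAsciiNames()` are made up for illustration, and the flat list of names is an assumption; the real data files are nested arrays that would need to be walked first.

```php
<?php
/**
 * True when the string is non-empty and contains only printable ASCII
 * (space through tilde). Hypothetical helper, not part of WhichBrowser.
 */
function isPlainAscii(string $value): bool
{
    // \A and \z anchor to the whole string, so a trailing newline is rejected too.
    return preg_match('/\A[\x20-\x7E]+\z/', $value) === 1;
}

/**
 * Return the names from a flat list that fall outside printable ASCII.
 */
function findNonAsciiNames(array $names): array
{
    return array_values(array_filter($names, function (string $name): bool {
        return !isPlainAscii($name);
    }));
}

// Names taken from the examples in this issue.
$names = ['Google Chrome', '冲浪浏览器', 'Surf Chinese browser', 'シンプルスマホ 204SH'];

print_r(findNonAsciiNames($names)); // lists the Chinese and Japanese names
```

A check like this could run in a test suite to flag new non-ASCII names, or on the consumer side before a name is written to storage.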

With regard to security, allowing pretty much any character in this repo is a bit of a nightmare. There is nothing to stop bad actors from adding dirty user agents containing illegal characters, such as upside-down characters or corrupted characters like this example:

Ṱ̺̺̕o͞ ̷i̲̬͇̪͙n̝̗͕v̟̜̘̦͟o̶̙̰̠kè͚̮̺̪̹̱̤ ̖t̝͕̳̣̻̪͞h̼͓̲̦̳̘̲e͇̣̰̦̬͎ ̢̼̻̱̘h͚͎͙̜̣̲ͅi̦̲̣̰̤v̻͍e̺̭̳̪̰-m̢iͅn̖̺̞̲̯̰d̵̼̟͙̩̼̘̳ ̞̥̱̳̭r̛̗̘e͙p͠r̼̞̻̭̗e̺̠̣͟s̘͇̳͍̝͉e͉̥̯̞̲͚̬͜ǹ̬͎͎̟̖͇̤t͍̬̤͓̼̭͘ͅi̪̱n͠g̴͉ ͏͉ͅc̬̟h͡a̫̻̯͘o̫̟̖͍̙̝͉s̗̦̲.̨̹͈̣
̡͓̞ͅI̗̘̦͝n͇͇͙v̮̫ok̲̫̙͈i̖͙̭̹̠̞n̡̻̮̣̺g̲͈͙̭͙̬͎ ̰t͔̦h̞̲e̢̤ ͍̬̲͖f̴̘͕̣è͖ẹ̥̩l͖͔͚i͓͚̦͠n͖͍̗͓̳̮g͍ ̨o͚̪͡f̘̣̬ ̖̘͖̟͙̮c҉͔̫͖͓͇͖ͅh̵̤̣͚͔á̗̼͕ͅo̼̣̥s̱͈̺̖̦̻͢.̛̖̞̠̫̰
̗̺͖̹̯͓Ṯ̤͍̥͇͈h̲́e͏͓̼̗̙̼̣͔ ͇̜̱̠͓͍ͅN͕͠e̗̱z̘̝̜̺͙p̤̺̹͍̯͚e̠̻̠͜r̨̤͍̺̖͔̖̖d̠̟̭̬̝͟i̦͖̩͓͔̤a̠̗̬͉̙n͚͜ ̻̞̰͚ͅh̵͉i̳̞v̢͇ḙ͎͟-҉̭̩̼͔m̤̭̫i͕͇̝̦n̗͙ḍ̟ ̯̲͕͞ǫ̟̯̰̲͙̻̝f ̪̰̰̗̖̭̘͘c̦͍̲̞͍̩̙ḥ͚a̮͎̟̙͜ơ̩̹͎s̤.̝̝ ҉Z̡̖̜͖̰̣͉̜a͖̰͙̬͡l̲̫̳͍̩g̡̟̼̱͚̞̬ͅo̗͜.̟
̦H̬̤̗̤͝e͜ ̜̥̝̻͍̟́w̕h̖̯͓o̝͙̖͎̱̮ ҉̺̙̞̟͈W̷̼̭a̺̪͍į͈͕̭͙̯̜t̶̼̮s̘͙͖̕ ̠̫̠B̻͍͙͉̳ͅe̵h̵̬͇̫͙i̹͓̳̳̮͎̫̕n͟d̴̪̜̖ ̰͉̩͇͙̲͞ͅT͖̼͓̪͢h͏͓̮̻e̬̝̟ͅ ̤̹̝W͙̞̝͔͇͝ͅa͏͓͔̹̼̣l̴͔̰̤̟͔ḽ̫.͕
Z̮̞̠͙͔ͅḀ̗̞͈̻̗Ḷ͙͎̯̹̞͓G̻O̭̗̮

Corrupt data going into this repo could help enable a DDoS attack: it could make the PHP preg_match calls take several seconds per request, and an attacker could then flood the server with requests!
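Independently of what the data files contain, an application can limit this kind of cost by cleaning a user agent string before handing it to any regex-heavy code. The sketch below is an assumption about how a consumer might do that, not something WhichBrowser does: `cleanUserAgent()` and the 512-byte limit are hypothetical. It rejects overlong or invalid input and strips the combining marks that the corrupted text above is built from.

```php
<?php
/**
 * Hypothetical pre-processing step for an incoming user agent string.
 * Returns null when the input is too long or is not valid UTF-8,
 * otherwise the string with all Unicode combining marks removed.
 */
function cleanUserAgent(string $userAgent, int $maxBytes = 512): ?string
{
    // Reject overlong input before doing any regex work on it.
    if (strlen($userAgent) > $maxBytes) {
        return null;
    }

    // \p{M} matches combining marks, the characters stacked onto base letters
    // in "Zalgo" text. With the /u flag, preg_replace returns null when the
    // subject is not valid UTF-8, which is passed on to the caller.
    return preg_replace('/\p{M}+/u', '', $userAgent);
}

var_dump(cleanUserAgent("Z\u{0316}a\u{0301}l\u{0330}go")); // combining marks removed
var_dump(cleanUserAgent(str_repeat('a', 600)));            // too long, so null
```

Callers would treat a null result as an unknown user agent instead of passing the raw string on to the parser.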

Also, if all output results came from the non-extended Latin character set only, it would be easier to sanitize the data and to protect the databases that store it.

I look forward to the admins' views on this matter.