a-tikhomirov / Linux

Some linux practice

lesson 4 improvements #6

Open mtuktarov opened 4 years ago

mtuktarov commented 4 years ago

Hi, I created some tests for the lesson 4 homework, and your regexes have a few minor issues:

Task 1: the regex matches the addresses below, but it should not:

006.66.189.184
06.66.189.184
000.66.189.184
95.034.255.40
111.05.218.217
111.005.218.217
161.148.053.172
161.148.003.172
161.148.03.172
161.148.153.072
161.148.153.02
161.148.153.002

Task 4: given the input data below:

http://site.com/1.png.bin
https://site.com.ru/some-path/2.bmp.exe
www.site-site.more.org/dir1/dir2/3.jpeg.sh
https://www.another_site.ru/4.bin.png
bit.ly/00a0sa0sasaxsdd.png
http://bit.ly/00a0sa0sasaxsdd.docx
https://bit.ly/00a0sa0sasaxsdd.gif.not
http://www.terra.es/asasa.bin.jpg
hhttps://www.terra.es.net/asasa.gif
https:///www.terra.es.com/dir/dir/picture.jpg
https://www.terra2.es.com/
http://example.com/admin/login/?next=/admin/widgets.exe5
http://example.com/admin/login/?next=/admin/animation.jpg
http://example.com/admin/login/?next=/admin/sites/site/change_form.jpg
http://example.com/admin/login/?next=/admin/sites/site/core.jpg
http://example.com/static/admin/img/Open+Sans_400_normal.gif
http://example.com/static/admin/img/Open+Sans_700_normal.gif3
http://example.com/static/admin/js/Open+Sans_700_normal.giff
https://example.com/admin/login/?next=/admin/search.phpr
https://example.com/admin/login/?next=/admin/sorting-icons.php
https://example.com/admin/login/?next=/admin/sites/site/tooltag-add.php
https://example.com/admin/login/?next=/admin/sites/site/widgets.php
https://example.com/Open+Sans_400_normal.gadget
https://example.com/login/Open+Sans_700_normal.gadget3
https://example.com/login/Open+Sans_700_normal.gadgetf

The command produces the following output:

$ grep -P '(?<!\S)(https?:\/\/)?([\w-]+\.)+[a-z]+(\/[\w.-]+)*\/[\w.-]+\.\w+(?<!\.exe|bin|sh)(?!\S)' out.txt 
https://www.another_site.ru/4.bin.png
bit.ly/00a0sa0sasaxsdd.png
http://bit.ly/00a0sa0sasaxsdd.docx
https://bit.ly/00a0sa0sasaxsdd.gif.not
http://www.terra.es/asasa.bin.jpg

However, I expect the following:

https://www.another_site.ru/4.bin.png
http://bit.ly/00a0sa0sasaxsdd.docx
https://bit.ly/00a0sa0sasaxsdd.gif.not
http://www.terra.es/asasa.bin.jpg
http://example.com/admin/login/?next=/admin/widgets.exe5
http://example.com/admin/login/?next=/admin/animation.jpg
http://example.com/admin/login/?next=/admin/sites/site/change_form.jpg
http://example.com/admin/login/?next=/admin/sites/site/core.jpg
http://example.com/static/admin/img/Open+Sans_400_normal.gif
http://example.com/static/admin/img/Open+Sans_700_normal.gif3
http://example.com/static/admin/js/Open+Sans_700_normal.giff
https://example.com/admin/login/?next=/admin/search.phpr
https://example.com/admin/login/?next=/admin/sorting-icons.php
https://example.com/admin/login/?next=/admin/sites/site/tooltag-add.php
https://example.com/admin/login/?next=/admin/sites/site/widgets.php
https://example.com/Open+Sans_400_normal.gadget
https://example.com/login/Open+Sans_700_normal.gadget3
https://example.com/login/Open+Sans_700_normal.gadgetf

a-tikhomirov commented 4 years ago

Hello! Thank you for the comment.

For task 1: I expected the regex to work like that because I assumed that the octet 006 is equal to 06 and to 6. But if we want to exclude such results, the octet regex will be: (25[0-5]|2[0-4][0-9]|1?[1-9]?[0-9])
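
As a quick check of the octet pattern above, here is a minimal Python sketch (it is not taken from the homework repository). It assumes the octet is repeated four times, joined by dots, and matched against the whole string. It then runs the addresses from the report through the result. `VARIANT_OCTET`, `ipv4_regex`, and `accepts` are names introduced here for illustration only. The variant adds an explicit `1[0-9][0-9]` branch, because `1?[1-9]?[0-9]` on its own does not cover 100 to 109.

```python
import re

# Octet pattern quoted in the comment above.
QUOTED_OCTET = r"(25[0-5]|2[0-4][0-9]|1?[1-9]?[0-9])"
# Hypothetical variant (an assumption, not from the repository):
# an explicit branch for 100-199 instead of the optional leading "1".
VARIANT_OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"


def ipv4_regex(octet):
    """Build a dotted-quad pattern from an octet sub-pattern."""
    return re.compile(rf"{octet}(\.{octet}){{3}}")


def accepts(pattern, text):
    """True if the whole string is a dotted quad under this pattern."""
    return pattern.fullmatch(text) is not None


# Addresses from the report that should NOT match.
REJECT = [
    "006.66.189.184", "06.66.189.184", "000.66.189.184",
    "95.034.255.40", "111.05.218.217", "111.005.218.217",
    "161.148.053.172", "161.148.003.172", "161.148.03.172",
    "161.148.153.072", "161.148.153.02", "161.148.153.002",
]

quoted = ipv4_regex(QUOTED_OCTET)
variant = ipv4_regex(VARIANT_OCTET)

for address in REJECT + ["192.168.0.1", "100.64.0.1", "10.0.0.109"]:
    print(f"{address:18} quoted={accepts(quoted, address)!s:5} "
          f"variant={accepts(variant, address)}")
```

Under these assumptions both patterns reject all twelve addresses from the report. They differ only on octets in the 100 to 109 range: the quoted pattern rejects `100.64.0.1` and `10.0.0.109`, while the variant accepts them.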

Added a commit for lesson 4, task 1: Lesson 4 task 1 commit

For task 4: sorry, my bad. I added the = and ? symbols to the URL path part of the regex and the + symbol to the file name part: (?<!\S)(https?:\/\/)?([\w-]+\.)+[a-z]+(\/[\w.=?-]+)*\/[\w.+-]+\.\w+(?<!\.exe|bin|sh)(?!\S)
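
To compare the two task 4 patterns on the sample input, here is a rough Python sketch rather than the original `grep -P` run. Two things in it are assumptions: Python's `re` rejects a lookbehind whose alternatives have different widths, so `(?<!\.exe|bin|sh)` is split into three separate lookbehinds with the same alternatives, and `re.ASCII` is used so that `\w` stays ASCII-only. `HEAD`, `TAIL`, and `matching_lines` are illustrative names.

```python
import re

# Shared prefix: optional scheme, dotted host, lowercase TLD.
HEAD = r"(?<!\S)(https?:\/\/)?([\w-]+\.)+[a-z]+"
# Same alternatives as (?<!\.exe|bin|sh), one fixed-width lookbehind each.
TAIL = r"(?<!\.exe)(?<!bin)(?<!sh)(?!\S)"

# Pattern from the grep command in the report.
ORIGINAL = re.compile(HEAD + r"(\/[\w.-]+)*\/[\w.-]+\.\w+" + TAIL, re.ASCII)
# Pattern from the reply: "=" and "?" in path segments, "+" in the file name.
REVISED = re.compile(HEAD + r"(\/[\w.=?-]+)*\/[\w.+-]+\.\w+" + TAIL, re.ASCII)

SAMPLE = """\
http://site.com/1.png.bin
https://site.com.ru/some-path/2.bmp.exe
www.site-site.more.org/dir1/dir2/3.jpeg.sh
https://www.another_site.ru/4.bin.png
bit.ly/00a0sa0sasaxsdd.png
http://bit.ly/00a0sa0sasaxsdd.docx
https://bit.ly/00a0sa0sasaxsdd.gif.not
http://www.terra.es/asasa.bin.jpg
hhttps://www.terra.es.net/asasa.gif
https:///www.terra.es.com/dir/dir/picture.jpg
https://www.terra2.es.com/
http://example.com/admin/login/?next=/admin/widgets.exe5
http://example.com/admin/login/?next=/admin/animation.jpg
http://example.com/admin/login/?next=/admin/sites/site/change_form.jpg
http://example.com/admin/login/?next=/admin/sites/site/core.jpg
http://example.com/static/admin/img/Open+Sans_400_normal.gif
http://example.com/static/admin/img/Open+Sans_700_normal.gif3
http://example.com/static/admin/js/Open+Sans_700_normal.giff
https://example.com/admin/login/?next=/admin/search.phpr
https://example.com/admin/login/?next=/admin/sorting-icons.php
https://example.com/admin/login/?next=/admin/sites/site/tooltag-add.php
https://example.com/admin/login/?next=/admin/sites/site/widgets.php
https://example.com/Open+Sans_400_normal.gadget
https://example.com/login/Open+Sans_700_normal.gadget3
https://example.com/login/Open+Sans_700_normal.gadgetf
""".splitlines()


def matching_lines(pattern, lines):
    """Mimic grep: keep each line that contains at least one match."""
    return [line for line in lines if pattern.search(line)]


original_hits = matching_lines(ORIGINAL, SAMPLE)
revised_hits = matching_lines(REVISED, SAMPLE)

# "+" marks lines that only the revised pattern picks up.
for line in revised_hits:
    print(" " if line in original_hits else "+", line)
```

With this translation, the revised pattern should pick up the query-string and `Open+Sans` URLs that the original missed. Because the scheme stays optional, it should also still match `bit.ly/00a0sa0sasaxsdd.png`, which is not in the expected list above. Note also that only the `exe` alternative in the lookbehind carries an escaped dot, so `bin` and `sh` exclude any name ending in those letters, not just the `.bin` and `.sh` extensions.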

Added a commit for lesson 4, task 4: Lesson 4 task 4 commit