Homas / ioc2rpz

ioc2rpz is a place where threat intelligence meets DNS.
Apache License 2.0

REGEX support #40

Closed: dmgeurts closed this issue 2 years ago

dmgeurts commented 2 years ago

Which REGEX shorthand expressions are supported? I had initially used \d for digits and \t for tabs, but these aren't working. \d is easily replaced with [0-9], but how should one match a tab?

However, I'm now wondering whether my issue is something else entirely. I can't seem to get ioc2rpz to read any entries from the list I've got. To test, I removed the tabs from the text file to see whether my REGEX was failing to match lines with tabs in them. I then simplified the REGEX, but I still can't get ioc2rpz to accept any of the domains listed in the source.

Dec  9 17:09:19 rpz2 c7735b7e2425[1525827]: Source: "Belgium_Gambling_Commission_BL", size: 17.53/KB (17951), MD5: "b3d707807d134c9e5f4e65d23e94ba50" 
Dec  9 17:09:19 rpz2 c7735b7e2425[1525827]: Source: "Belgium_Gambling_Commission_BL", got 0 indicators, clean time 0 

A sample of the records, as contained in a simple text file hosted on a nearby web server, is given below. The dates are those listed on the website I'm scraping these domains from; they are not expiry dates. Could it be that they are interpreted as such? What am I doing wrong?

bingoround.com         2012-02-16 00:00:00         # 1
myglobalgames.com         2012-02-16 00:00:00         # 2
titanpoker.com         2012-02-16 00:00:00         # 3
jackpotcity.com         2012-02-16 00:00:00         # 4
casino.com         2012-02-16 00:00:00         # 5

The regex I'm using to read these values: ^([A-Za-z0-9][A-Za-z0-9\-\._]+)\ *[12][\d-]{9}\ [\d:]{8}\ *#\ [0-9]+$

Homas commented 2 years ago

You can find the details here: http://erlang.org/documentation/doc-5.7.4/lib/stdlib-1.16.4/doc/html/re.html Don't miss the green note at the beginning. The syntax is described in the "PERL LIKE REGULAR EXPRESSIONS SYNTAX" chapter.

dmgeurts commented 2 years ago

I still had 2 other \d references in there. This REGEX works: ^([A-Za-z0-9][A-Za-z0-9\-\._]+)\ +20[0-9\-]{8}\ [0-9:]{8}\ +#\ [0-9]+$
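
For a quick offline check, the working expression can be tried against a few of the sample lines. The sketch below uses Python's `re` module as a stand-in, which is an assumption: ioc2rpz uses Erlang's `re`, and although both are Perl-style, they are not identical, so a match here only suggests that the pattern itself is sound. `SAMPLE` and `extract_domains` are hypothetical names used for illustration, and the third sample line is a made-up tab-separated variant.

```python
import re

# The working expression from the comment above, copied unchanged.
PATTERN = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9\-\._]+)\ +20[0-9\-]{8}\ [0-9:]{8}\ +#\ [0-9]+$"
)

# Two space-separated lines from the sample, plus a hypothetical
# tab-separated variant to show what happens when tabs are present.
SAMPLE = [
    "bingoround.com         2012-02-16 00:00:00         # 1",
    "casino.com         2012-02-16 00:00:00         # 5",
    "titanpoker.com\t2012-02-16 00:00:00\t# 3",
]


def extract_domains(lines):
    """Return the first capture group (the domain) of every matching line."""
    domains = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            domains.append(match.group(1))
    return domains


print(extract_domains(SAMPLE))
```

The tab-separated line is skipped because the pattern only allows spaces between the fields, which is consistent with the source returning no indicators while tabs were still in the file.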

dmgeurts commented 10 months ago

It turns out that having tab characters in source files bit me again.

There were three sources affected, but only one caused ioc2rpz to crash after restarting the Docker image. I've added a cron job to check these files for tabs and replace them with spaces, as I've not been able to work out how to match tabs with \t using Erlang re in ioc2rpz.
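
A tab-to-space cleanup of this kind could look roughly like the following. This is only a sketch and not the actual cron job from this thread: the `replace_tabs` and `clean_file` names and the command-line usage are hypothetical, and Python is used purely for illustration.

```python
import sys
from pathlib import Path


def replace_tabs(text: str) -> str:
    """Replace every tab character with a single space."""
    return text.replace("\t", " ")


def clean_file(path: Path) -> bool:
    """Rewrite the file only if it contains tabs; return True if it changed."""
    original = path.read_text(encoding="utf-8")
    cleaned = replace_tabs(original)
    if cleaned == original:
        return False
    path.write_text(cleaned, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Hypothetical usage: python clean_tabs.py source1.txt source2.txt
    for name in sys.argv[1:]:
        if clean_file(Path(name)):
            print(f"replaced tabs in {name}")
```

Rewriting a file only when it actually contains tabs leaves its modification time alone on clean runs, which may matter if anything downstream watches those files for changes.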