dmgeurts closed this 2 years ago
You can find the details here: http://erlang.org/documentation/doc-5.7.4/lib/stdlib-1.16.4/doc/html/re.html Don't miss the green note at the beginning. The syntax is described in the "PERL LIKE REGULAR EXPRESSIONS SYNTAX" chapter.
I still had 2 other \d references in there. This REGEX works: ^([A-Za-z0-9][A-Za-z0-9\-\._]+)\ +20[0-9\-]{8}\ [0-9:]{8}\ +#\ [0-9]+$
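For anyone wanting to sanity-check that pattern outside ioc2rpz, here is a minimal sketch in Python. Note the assumptions: Python's `re` is not Erlang's PCRE-based `re`, so this only checks the pattern logic, not how ioc2rpz reads it from its config. The sample lines and the `extract_domain` helper are hypothetical, made up for illustration, since the real records aren't shown here.

```python
import re

# Working pattern from the post, copied verbatim.
PATTERN = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9\-\._]+)\ +20[0-9\-]{8}\ [0-9:]{8}\ +#\ [0-9]+$"
)


def extract_domain(line):
    """Return the captured domain, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.group(1) if m else None


# Hypothetical sample lines (not taken from the real source file).
spaced = "example.com 2021-03-04 12:34:56 # 42"
tabbed = "example.com\t2021-03-04 12:34:56\t# 42"

print(extract_domain(spaced))  # space-separated line matches
print(extract_domain(tabbed))  # tab-separated line does not match
```

The tab-separated line failing is consistent with what I saw: the pattern only allows literal spaces between the fields.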
It turns out that having tab characters in source files bit me again.
Three sources were affected, but only one caused ioc2rpz to crash after restarting the docker image. I've added a cronjob to check these files for tabs and replace them with spaces, as I've not been able to work out how to match tabs with \t using Erlang re in ioc2rpz.
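The tab clean-up could look roughly like the sketch below. This is not the actual cronjob, just one way to do it in Python; the function names and the one-space-per-tab choice are assumptions for illustration.

```python
from pathlib import Path


def tabs_to_spaces(text):
    """Replace every tab character with a single space."""
    return text.replace("\t", " ")


def clean_file(path):
    """Rewrite the file only if it contains tabs; return True if it changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    cleaned = tabs_to_spaces(original)
    if cleaned != original:
        p.write_text(cleaned, encoding="utf-8")
        return True
    return False
```

A cron entry would then call `clean_file` on each source file before ioc2rpz fetches it. Since the regex accepts one or more spaces between most fields, one space per tab is enough.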
Which REGEX shorthand expressions are supported? I had initially used \d for digits and \t for tabs, but these aren't working. \d is easily replaced with [0-9], but how should one match a tab?
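One thing that might be worth trying is the hex escape \x09 inside a character class, which PCRE-style syntax accepts as a tab. The sketch below only shows that the idea works in Python's `re`; whether the ioc2rpz config passes \x09 through to Erlang re unchanged is an assumption I haven't verified. The sample line and the `matches` helper are hypothetical.

```python
import re

# Same field layout as the space-only pattern, but each separator
# accepts either a space or a tab (written as \x09).
TAB_TOLERANT = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9\-\._]+)[ \x09]+20[0-9\-]{8}[ \x09][0-9:]{8}[ \x09]+#[ \x09][0-9]+$"
)


def matches(line):
    """Return True if the line fits the tab-tolerant pattern."""
    return TAB_TOLERANT.match(line) is not None


# Hypothetical tab-separated record.
print(matches("example.com\t2021-03-04 12:34:56\t# 42"))
```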
However, I'm now wondering if my issue isn't that, but something else. I can't seem to get ioc2rpz to read any entries from the list I've got. To test, I removed the tabs from the text file to see whether the REGEX I had was failing to match lines with tabs in them. I then simplified the REGEX, but I still can't get ioc2rpz to accept any of the domains listed in the source.
A sample of the records, as contained in a simple text file hosted on a nearby web server, is given below. The dates are those listed on the website I'm scraping these domains from; they are not expiry dates. Could they be interpreted as such? What am I doing wrong?
The regex I'm using to read these values:
^([A-Za-z0-9][A-Za-z0-9\-\._]+)\ *[12][\d-]{9}\ [\d:]{8}\ *#\ [0-9]+$