Add rule function for IP addresses, only supports IPv4 for now

kemasdimas commented 2 years ago

Referring to #45

kemasdimas commented 2 years ago

Ergonomics Could it be ergonomic and flexible to leverage the existing composability of the existing functions, instead of having multiple match variants? For instance:

contains(cidr("10.0.0.0/28"), ip("10.0.0.16"))

lt(ip(env["user.ip"], ip("109.71.47.255"))

not(ip(env["user.ip"], ip("127.0.0.1"))

Of course ip and cidr would have to return something. Long and Range could be good choices, but this is related to the next point.

I agree, I haven't put more thought into this, and creating the rule to be composable seems to be a lot better. So there will be 2 new function

cidr -> return Range ip -> return Long

Performance The current proposal relies on a non-trivial regular expression that we can't pre-compile, which can be suboptimal considering FFS is designed to do thousands of evaluations per second. But IP addresses are just numbers, and IP+mask are just ranges. We could treat them like that?

In this case, the regex is redundant, it was only used to validate the IP / CIDR format. Referring to the Ergonomics, treating IP addresses and mask as numbers are to be expected.

Specifically, what if we:

Add a range type (with tests & docs). It seems useful, regardless of IPs.

Implement ip(value) as a function that takes an IP, and returns a Long. An IP is just a group of 4 bytes, so we could parse it (example) and combine it together in a single number.

Implement cidr(value) as a function that takes an IP+mast, and returns a range. It can reuse the range and ip(value) implementations from before.

This could be very efficient. What do you think?

Yes, this will be efficient to run. My only concern is the IPv6 format. It's not straightforward to convert the IP to number, even using ULong, so we'll need to find a way to represent the IPv6 in 1 single variable while keeping it comparable to the CIDR

For the Alpha milestone, will it be enough if we only support IPv4?

goncalossilva commented 2 years ago

128-bit can represent IPv6, so if this strategy works well, we can consider using a List with two Long, or representing IPs using byte arrays, os similar. But one challenge at a time, IPv4 for now sounds good! 😄

kemasdimas commented 2 years ago

First thing, thank you for taking your time to review my code, and really appreciate all your detailed writings, I learned a lot of new things and processes from this codebase 😊👍

Summary

Implement a Range function based on https://gist.github.com/goncalossilva/2406eb278eb0c8a1b89c3db76f4e0e2b, with a small tweak to allow the range to use a function, e.g. contains([datetime("2021-1-1"):datetime("2022-10-1")], now())
Create IP to decimal converter function that can be used in rule composition
Create a CIDR range function, it will return a range of min and max IP addresses, works well with rule composition
Add test and docs for these 3 features

IP Converter Implementation IP addresses (string) is converted to their decimal format, using a simple calculation, it should be very efficient to run in production

Example composition gt(ip("192.168.1.0"), ip("192.168.0.255")) will return 1f

CIDR implementation Based on https://stackoverflow.com/questions/62124452/converting-cidr-notation-into-ip-range-without-other-library

The implementation returns a Range of Long, using a similar boilerplate as the range function contains([min:max], value), so it can be strung together nicely in rule composition

Example composition not(contains(cidr("254.200.222.210/23"), ip("254.200.223.255"))) will return 0f

kemasdimas commented 2 years ago

Ooops, accidentally closed this pull request when cleaning up my repo fork. 😓

I will create a squashed commit that incorporates @goncalossilva recent comments 😊

Doist / ffs

Add rule function for IP addresses, only supports IPv4 for now #59