Open ddutt opened 5 years ago
Seems like a reasonable feature. I don’t have any plans to work on it.
On Jun 18, 2019, at 12:09, Dinesh Dutt notifications@github.com wrote:
Hi,
I'm interested in IPNetwork as well. Is there any work being done on this already? If not, is there any interest in supporting it?
Thanks,
Dinesh
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
Cool. Any recommendations on providing the pull request provided I get to doing it?
Dinesh
It seems best to create a new type called IPNetworkArray rather than fiddle with IPArray. Would you agree?
Yes, a separate ExtensionDtype and ExtensionArray makes the most sense.
Will the storage be essentially the same? An ndarray with a record dtype for the hi and lo bits?
I think you'll want to inherit from NumPyBackedExtensionArrayMixin: https://github.com/ContinuumIO/cyberpandas/blob/master/cyberpandas/base.py. I wrote that as a prototype for pandas, so it's a bit convoluted, but may be useful.
I suspect you find some duplicate functionality with IPArray. I wouldn't worry about deduplicating things too much at first.
I was planning to store the thing as a CIDR string,
Hmm I'm not sure. For performance on operations, IPArray stores elements as
a 128-bit integer, split across two columns of a NumPy array.
This gives us NumPy's fast performance for operations like isna()
,
is_ipv4()
, masking, etc. I'm not sure if you could do those kind of
operations on a string, (but again, I don't know much about networking so I
may be missing something).
On Tue, Jun 18, 2019 at 1:15 PM Dinesh Dutt notifications@github.com wrote:
I was planning to store the thing as a CIDR string, /prefixlen. Would that work? I haven't written an extensions to Pandas dtypes before.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ContinuumIO/cyberpandas/issues/40?email_source=notifications&email_token=AAKAOIQIOMUPPOOYGPMRLODP3EQ5LA5CNFSM4HZCEBW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODX7QE4A#issuecomment-503251568, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKAOIRQ747C2OIUWHIXKU3P3EQ5LANCNFSM4HZCEBWQ .
It would be convenient for a number of operations if the representation of the network could be fixed width. Packed variable length strings are likely to make things complicated.
So I have a prelim implementation working, but not sure of its efficiency. I stored the network as an IPv4Network or IPv6Network object itself instead of trying to do a string or any other fancy implementation. For things like is_ipv4, the IPArray approach is much faster and for things like is_multicast, is_link_local, the approach I've taken seems faster from a simple %timeit on a few things perspective.
I'd like some eyes on what I've done to see if there are changes to make the thing better. What do you suggest? I still need a lot of cleaning up if all looks good.
Dinesh
I've got df.a.astype('ipnetwork'), df.query('a.ipnet.is_link_local') etc. working
Dinesh
On Wed, Jun 19, 2019 at 2:39 PM Dinesh Dutt notifications@github.com wrote:
So I have a prelim implementation working, but not sure of its efficiency. I stored the network as an IPv4Network or IPv6Network object itself instead of trying to do a string or any other fancy implementation. For things like is_ipv4, the IPArray approach is much faster and for things like is_multicast, is_link_local, the approach I've taken seems faster from a simple %timeit on a few things perspective.
On IPArray many methods, including is_multicast and is_link_local, convert from the fast ndarray of integers to the slow ndarray of IPv4Address objects. So I would expect those to be roughly the same (a bit slower, since IPArray.is_link_local needs to do the conversion first). Some of these could likely be implemented without converting to objects first.
That said, as a prototype this is probably fine for now. You may just want
to make the .data
attribute private in case the internal representation
changes in the future.
I'd like some eyes on what I've done to see if there are changes to make
the thing better. What do you suggest? I still need a lot of cleaning up if all looks good.
A PR is probably best, though I likely won't have time to review until after the next distributed and pandas releases.
Dinesh
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ContinuumIO/cyberpandas/issues/40?email_source=notifications&email_token=AAKAOISPIP6IXGFB2ER3QY3P3KDPNA5CNFSM4HZCEBW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYDCAZI#issuecomment-503717989, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKAOIUVYWTPCMHHYDFKC6TP3KDPNANCNFSM4HZCEBWQ .
Thanks Tom. Since you won't have time for a bit, let me clean it up and ensure the data is private for now. I'll look at what I can do (maybe add a third byte which is the prefixlen to your ip address model) to make this more efficient. Is Cython a possibility here? I'll send a PR after I'm comfortable its cleaned up.
Does that make sense?
Dinesh
Sounds good.
Not sure about Cython, unless it's really necessary, since that complicates the build / packaging so much.
On Wed, Jun 19, 2019 at 3:00 PM Dinesh Dutt notifications@github.com wrote:
Thanks Tom. Since you won't have time for a bit, let me clean it up and ensure the data is private for now. I'll look at what I can do (maybe add a third byte which is the prefixlen to your ip address model) to make this more efficient. Is Cython a possibility here? I'll send a PR after I'm comfortable its cleaned up.
Does that make sense?
Dinesh
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ContinuumIO/cyberpandas/issues/40?email_source=notifications&email_token=AAKAOIWIVKX4IQUQL7NFYDTP3KF6DA5CNFSM4HZCEBW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYDDWJI#issuecomment-503724837, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKAOIVY2G22YMA5YSVGHL3P3KF6DANCNFSM4HZCEBWQ .
I've created a pull request. I've tried to follow the model you've followed for IPArray in terms of where the different functions live. I've also added support for max/min besides support for supernet_of and subnet_of.
Hello, is there any progress on this issue?
@ddutt , @TomAugspurger did this go anywhere? I have a use-case for this and was curious if it's still "a thing"
Not really, I'm don't have time to work on cyberpandas these days.
I don't have time to work on cyberpandas either. I found other ways to work around it. But if you have some code, I'm happy to test it.
Hi,
I'm interested in IPNetwork as well. Is there any work being done on this already? If not, is there any interest in supporting it?
Thanks,
Dinesh