WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

Support for duplicate keys #43

Closed richjames0 closed 5 years ago

richjames0 commented 8 years ago

It seems that when a key is added a second time, no duplicate is added; instead, the value associated with that key is replaced. It would be very useful for my use case and I'm sure others to store both, and I've seen this in other implementations.

I could handle this up front but it means spending quite a bit of time in Python and so would be slow. I'd be willing to help with a change, if people generally agreed it would be positive, but might need some pointers.

Thanks for an awesome, stable and blazingly fast library!

pombredanne commented 8 years ago

@richdutton Hi :)

The behaviour here is the same as for a dict: associating a value to an existing key will replace the value.
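
To illustrate the dict behaviour being referred to, here is a minimal sketch using a plain Python `dict` only (the key and values are made up for illustration; this does not touch the library itself):

```python
values = {}
values["he"] = "first"
values["he"] = "second"   # same key: the earlier value is replaced, not duplicated

print(values)        # {'he': 'second'}
print(len(values))   # 1
```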

> It would be very useful for my use case and I'm sure others to store both, and I've seen this in other implementations.

Do you have pointers to this?

> I could handle this up front but it means spending quite a bit of time in Python and so would be slow.

Are you sure this would be slow? IMHO you could try this: always assign a list object as the value. When you insert, first test whether the key exists; if it does, append to the list; otherwise add the key with the value wrapped in a list. I am not sure this would have major performance implications.
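
A rough sketch of that list-as-value idea could look like the following. The helper name `add_value` and the sample keys are hypothetical, not part of the library; the sketch assumes only the `Automaton` methods `get(key, default)`, `add_word(key, value)`, `make_automaton()` and `iter(string)`, and it has not been benchmarked:

```python
def add_value(automaton, key, value):
    """Store value under key without discarding values added earlier."""
    values = automaton.get(key, None)      # None means the key is not present yet
    if values is None:
        automaton.add_word(key, [value])   # first occurrence: wrap the value in a list
    else:
        values.append(value)               # duplicate key: extend the stored list in place


def demo():
    # Imported here so that add_value itself has no hard dependency on pyahocorasick.
    import ahocorasick

    automaton = ahocorasick.Automaton()
    for key, value in [("he", 1), ("she", 2), ("he", 3)]:
        add_value(automaton, key, value)
    automaton.make_automaton()

    # Each match is (end_index, stored_value); stored_value is the list for that key.
    return list(automaton.iter("she"))
```

Because the stored list is mutable, a duplicate key costs one lookup plus an `append`, and `add_word` is only called the first time a key is seen.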

> Thanks for an awesome, stable and blazingly fast library!

All the credit goes to @WojciechMula

WojciechMula commented 8 years ago

@richdutton: as @pombredanne wrote, you could try standard Python containers (list, tuple, set). If performance appears to be an issue, we will consider changes.

Thanks for the kind words, but without help received from many people, the lib wouldn't be as good as it is now.