codezonediitj / pydatastructs

A python package for data structures and algorithms
https://pydatastructs.readthedocs.io/en/stable/
Other
199 stars 269 forks source link

Adding measures of sortedness/disorder in arrays #448

Open HarsheetKakar opened 2 years ago

HarsheetKakar commented 2 years ago

We should provide a function that should take an array as input and suggest one of the pydatastructs APIs as output. The time complexity should be O(n) because anything more won't be worth it. The constant in O(n) should also be very small (we can benchmark for that).

References

.. [1] https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5F6D0F8B0B37F0A66AAD3100BD0459E7?doi=10.1.1.45.8017&rep=rep1&type=pdf

HarsheetKakar commented 2 years ago

@czgdp1807 feel free to edit the discription. As far as the O(n) complexity we are talking about, is this for checking the "sortedness" or sorting the array. As for the latter we will need to put constraints on the nature of elements passed in the array (e.g. I dont think we will be able to promise O(n) in case of floating point numbers IMHO).

And as far as checking the sortedness of the array, I was thinking of going for a probabilistic algorithm (some of which can go as low as O(log(n)/e) where e is the "distance" from sorted array).

czgdp1807 commented 2 years ago

As far as the O(n) complexity we are talking about, is this for checking the "sortedness" or sorting the array

I am referring to the sortedness.On the basis of the sortedness, we should suggest the sorting algorithm from pydatastructs.

I was thinking of going for a probabilistic algorithm (some of which can go as low as O(log(n)/e) where e is the "distance" from sorted array).

Is the any research work that already explores this?

N.B. https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5F6D0F8B0B37F0A66AAD3100BD0459E7?doi=10.1.1.45.8017&rep=rep1&type=pdf

HarsheetKakar commented 2 years ago

Is the any research work that already explores this?

Yes Hamming Distance

czgdp1807 commented 2 years ago

In the paper in my above comment, section I.2 lists 10 measures of disorder for arrays. I feel that all these are worth adding into pydatastructs.

mridul45 commented 1 year ago

As a GSSOC 23 member , I request you to assign issue to me.