ParticularMiner / sparse_dot_topn_for_blocks

It has the same interface as `sparse_dot_topn` but additionally allows an array to be passed which will be updated with the maximum number of nonzero elements of each row of the result matrix with values above the given lower bound. This is suitable for block-matrix multiplication. That's all!
Apache License 2.0

Python Version #1

Open marzooq-unbxd opened 2 years ago

marzooq-unbxd commented 2 years ago

I was using this package with string-grouper and encountered an error with Python 3.6.5. Although string-grouper requires >=3.7 (first release), I was able to manage with some workarounds. The latest version of string-grouper doesn't seem to work with Python 3.6.5 no matter what. I tried avoiding topn (using only n_blocks[1]=1). Hence, I have to deal with this package. I did not see a Python version requirement for sparse_dot_topn_for_blocks. Thanks

ParticularMiner commented 2 years ago

Hi @marzooq-unbxd ,

what error did you encounter?

marzooq-unbxd commented 2 years ago

https://justpaste.it/77mvs

I have linked the error; it also seems linked to PEP 517. The reason I was using this package was the same OverflowError caused by sparse-dot-topn. I was wondering whether it is possible to use just sparse-dot-topn and recursively concatenate the results for multiple blocks (using pandas?). Apart from the time difference, would there be any difference?

ParticularMiner commented 2 years ago

@marzooq-unbxd

Looks like the .pxd files in sparse-dot-topn-for-blocks cannot be found by setuptools for some reason. Do you know why? The file MANIFEST.in has the information necessary to find these files. Does Python 3.6 not support this file?

sparse-dot-topn cannot handle multiple blocks, hence the creation of sparse-dot-topn-for-blocks. These packages may be similar, but they address different issues.

marzooq-unbxd commented 2 years ago

Sorry, I am not too familiar with Cython/C code, so I have no idea. Also, is the setuptools requirement of 18 enough? Why was it bumped to 42? I was thinking of splitting left_series into 'n' left_series objects, iterating over them, and passing each of them to match strings with a constant right series. The final result would concatenate the multiple matches dataframes. This gives the same result as sparse-dot-topn-for-blocks, correct?

ParticularMiner commented 2 years ago

@marzooq-unbxd

Sure. You could do that. Though you might not get any acceleration from it. Acceleration has been observed only when the right Series is split.
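
To illustrate why splitting the left side row-wise should give the same matches, here is a minimal pure-Python sketch. It is not the code of sparse-dot-topn or string-grouper: the helper names `topn_per_row` and `topn_in_row_blocks` are hypothetical, the matrices are small dense lists, and it assumes a value is kept only when it is strictly above the lower bound. Because the top-n selection is made independently for each row, concatenating the per-block results reproduces the unsplit result.

```python
def topn_per_row(a, b, ntop, lower_bound):
    """For each row of a @ b, keep up to ntop (column, value) pairs above lower_bound."""
    n_inner = len(b)
    n_cols = len(b[0])
    result = []
    for row in a:
        # Dense dot product of this row with every column of b.
        scores = [
            (j, sum(row[k] * b[k][j] for k in range(n_inner)))
            for j in range(n_cols)
        ]
        # Assumption: only values strictly above the lower bound are kept.
        kept = [(j, v) for j, v in scores if v > lower_bound]
        # Largest values first; ties broken by column index for determinism.
        kept.sort(key=lambda pair: (-pair[1], pair[0]))
        result.append(kept[:ntop])
    return result


def topn_in_row_blocks(a, b, ntop, lower_bound, n_blocks):
    """Split a into row blocks, process each block, and concatenate the results."""
    block_size = max(1, -(-len(a) // n_blocks))  # ceiling division
    combined = []
    for start in range(0, len(a), block_size):
        block = a[start:start + block_size]
        combined.extend(topn_per_row(block, b, ntop, lower_bound))
    return combined


# Toy data standing in for the left (a) and right (b) matrices.
a = [[1, 0, 2], [0, 3, 1], [2, 2, 0], [0, 0, 1], [1, 1, 1]]
b = [[1, 0, 2, 0], [0, 1, 1, 3], [2, 1, 0, 1]]

whole = topn_per_row(a, b, ntop=2, lower_bound=1)
blocked = topn_in_row_blocks(a, b, ntop=2, lower_bound=1, n_blocks=2)
assert whole == blocked
```

Splitting the right side instead would spread each row's candidates across several blocks, so the per-row candidates would have to be merged before the final top-n cut.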

ParticularMiner commented 2 years ago

@marzooq-unbxd

> Also, is the setuptools requirement of 18 enough? Why was it bumped to 42?

This change was effected by the developers of sparse-dot-topn, so I’m not entirely sure. They probably needed to do so in order for the .pxd files to be found.

ParticularMiner commented 2 years ago

@marzooq-unbxd

Have you tried installing directly from GitHub?

pip install git+https://github.com/ParticularMiner/sparse_dot_topn_for_blocks.git@master

marzooq-unbxd commented 2 years ago

Just to know, what difference would it make?

ParticularMiner commented 2 years ago

@marzooq-unbxd

Not sure. Perhaps it might succeed where the usual command fails.

ParticularMiner commented 2 years ago

@marzooq-unbxd

Researching a bit further, it seems this is an issue that older versions of Cython have with .pxd files. See https://github.com/cython/cython/issues/2452

I would suggest upgrading to the latest version of Cython that Python 3.6 can support, if you have not done so already.
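
Before upgrading, it may help to confirm which versions are actually in use in the failing environment. The snippet below is only a small diagnostic sketch; the helper name `installed_version` is hypothetical, and it relies on the `__version__` attribute that the Cython and setuptools packages expose.

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Report the interpreter version alongside the build tools mentioned above.
python_version = "{}.{}.{}".format(*sys.version_info[:3])
print("Python:", python_version)
for name in ("Cython", "setuptools"):
    print(name + ":", installed_version(name) or "not installed")
```

Run it with the same interpreter that pip uses for the install, so the reported versions match the build environment.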