TutteInstitute / toponymy

MIT License

Possible Typo on the get_cluster_label_vector (line 100) #13

Open zackaryleady opened 3 days ago

zackaryleady commented 3 days ago

I am trying to use this incredible new library you are working on. Following your example in the arXiv notebook, I received this error: [image]

It appears that an argument may be missing (possibly a difference between the HDBSCAN and fast_hdbscan libraries?).

From cluster_trees.py in fast_hdbscan, it looks like the function needs 4 inputs, but only 3 are provided:

    @numba.njit()
    def get_cluster_label_vector(
        tree,
        clusters,
        cluster_selection_epsilon,
        n_samples,
    ):

Could you provide a reasonable value for cluster_selection_epsilon and/or n_samples?
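One way to confirm a mismatch like this is to list the parameters the installed function actually declares and compare them with what the caller passes. The sketch below is only an illustration: `parameter_names` is a hypothetical helper, the stand-in function just copies the four-parameter signature quoted above, and the count of 3 is taken from the error described in this issue. It assumes the arguments are passed positionally.

```python
import inspect


def parameter_names(func):
    """Return the parameter names of func (hypothetical helper for illustration)."""
    # numba-jitted functions expose the original Python function as .py_func;
    # plain functions are inspected directly.
    target = getattr(func, "py_func", func)
    return list(inspect.signature(target).parameters)


# Stand-in with the signature quoted in this issue; it is NOT the real
# fast_hdbscan implementation, only a placeholder with the same parameters.
def get_cluster_label_vector(tree, clusters, cluster_selection_epsilon, n_samples):
    return None


names = parameter_names(get_cluster_label_vector)

# Assumption from the issue text: the failing call supplied 3 positional arguments.
caller_arg_count = 3
missing = names[caller_arg_count:]

print("declared parameters:", names)
print("not supplied by caller:", missing)
```

With both libraries installed, the same helper could be pointed at the real function (assuming it is importable from `fast_hdbscan.cluster_trees`, as the file name suggests) to see whether the installed version matches what the calling code expects.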

My code, if it helps:

[image]

lmcinnes commented 3 days ago

It is an internal method of fast_hdbscan that got updated; I believe toponymy has updates, but potentially they haven't been pushed to GitHub yet. This library is under pretty active development right now, and for the moment you should expect random breakages to happen. Hopefully we'll get it to a more stable state soon and make an actual release to PyPI that people can rely on a little more.

zackaryleady commented 2 days ago

Hello,

I just wanted to say that the latest fix by John resolved the issue. I did need to wrap my LLM in LlamaCPPWrapper(), because otherwise llm_instruction() is not found. Once I used LlamaCPPWrapper(), I also had to remove the encode("utf-8") call from my own code, because the wrapper does this internally. If you think it would be useful, I could write down some helpful hints from all the debugging I had to do to get it working, for future users. Just let me know.
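To illustrate why the extra encode("utf-8") had to go, here is a minimal sketch of a wrapper that performs the encoding itself. Everything in it is hypothetical: `StubModel` and `EncodingWrapper` are stand-ins written for this example, not toponymy's LlamaCPPWrapper, and the `llm_instruction(prompt)` signature is an assumption.

```python
class StubModel:
    """Stand-in for a local model that expects the prompt as UTF-8 bytes."""

    def __call__(self, prompt: bytes) -> str:
        if not isinstance(prompt, bytes):
            raise TypeError("prompt must be bytes")
        return f"received {len(prompt)} bytes"


class EncodingWrapper:
    """Hypothetical wrapper (not the real LlamaCPPWrapper) that encodes internally."""

    def __init__(self, model):
        self.model = model

    def llm_instruction(self, prompt: str) -> str:
        # The wrapper owns the str -> bytes conversion, so callers pass plain str.
        return self.model(prompt.encode("utf-8"))


wrapped = EncodingWrapper(StubModel())

# Passing a plain string works: the wrapper encodes it once.
result = wrapped.llm_instruction("Name this cluster")

# Encoding before the call hands bytes to the wrapper, and bytes has no .encode().
try:
    wrapped.llm_instruction("Name this cluster".encode("utf-8"))
    double_encode_error = None
except AttributeError as exc:
    double_encode_error = exc
```

In this sketch the second call fails with an AttributeError, which is the kind of breakage that removing the caller-side encode avoids. The real wrapper may fail differently.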

~Zack

lmcinnes commented 2 days ago

We are moving away from LlamaCPP because the online services are easier to use and fairly popular, but if you have fixes that make the LlamaCPPWrapper work better, they would be greatly appreciated.