Fixed the Issue in `utis.py`

andrewtavis / kwx

BERT, LDA, and TFIDF based keyword extraction in Python

BSD 3-Clause "New" or "Revised" License

70 stars 10 forks source link

Fixed the Issue in `utis.py` #52

Open cicada0007 opened 1 year ago

cicada0007 commented 1 year ago

1 st changes

In this modified code, the spacy_version variable is used to store the version of the SpaCy library. Inside the loop, the code checks whether the SpaCy version is 3.0.0 or higher. If it is, the lemmatization is performed using nlp(combined_texts). Otherwise, for older versions, the lemmatization is done using nlp.tokenizer(combined_texts). This change accounts for potential differences in SpaCy versions and ensures the code can handle them correctly.
47 Edit spaCy

2 nd changes

Replaced the string slicing with data.endswith() to check the file extension.
Changed the error message to be raised as an exception using raise ValueError() instead of just calling ValueError().
Fixed the indentation of the function and the if-elif-else blocks.

andrewtavis commented 1 year ago

Generally these changes look ok, @cicada0007, but the tests are failing as you can see. I think that one of your changes isn't valid as 18 tests are now failing. Could you take a look at this and get back to me when you have an idea of what's wrong? Or would you prefer that I look into this?

Thanks again!

cicada0007 commented 1 year ago

hey @andrewtavis Thank you noticing the error I will look into it If any help needed to me I will ask you

andrewtavis commented 1 year ago

Thank you for checking this, @cicada0007!

cicada0007 commented 1 year ago

hi @andrewtavis
Can you help me with locating error so that i can fix it I tried but there are 2000 lines of code so you can help me with it

andrewtavis commented 1 year ago

@cicada0007, can you let me know what your Python version is? I'm thinking I'm going to update kwx to run on 3.8+ :) I might just merge this as the changes do look good, and then I can figure out the conflicts on my end.

cicada0007 commented 1 year ago

@andrewtavis My current version of python is Python 3.11.3

andrewtavis commented 1 year ago

Thank you, @cicada0007. I figured as much :) kwx being written years ago, it makes sense that some changes seem like they’re working for you and then aren’t in the repo. I’ll again merge this at some point soon and go through to update the package to work with higher Python versions. Thing is that the SpaCy check wouldn’t be needed as we’ll only be above a certain version 🤔 I’ll verify this and remove the check if so 😊

andrewtavis commented 1 year ago

If you have interest in helping with the update, then let me know. I did an initial update and test last night and to be expected there are lots of failures, but it seems like what needs to happen is that the test targets needs to be switched as the random seeds are behaving differently at this point :) So when we’re checking that the output of a model is a list of strings that’s always the same, it now needs to be different ones given that the random number generator behind the output is slightly different :)

cicada0007 commented 1 year ago

Ya that's correct

andrewtavis commented 1 year ago

Ok, @cicada0007 :) As I said I’ll merge this soon and try to figure the updates out.

andrewtavis / kwx

Fixed the Issue in `utis.py` #52

1 st changes

47 Edit spaCy

2 nd changes