MagedSaeed / farasapy

A Python implementation of Farasa toolkit
MIT License
112 stars 21 forks source link

regex for checking java version isn't supported for Java SE #10

Closed pranavsdev closed 3 years ago

pranavsdev commented 4 years ago

Issue:

I get below error when I initialise FarasaSegmenter

File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/farasa/__base.py", line 149, in _check_java_version re.search(version_pattern, version_proc_output).groups()[0] AttributeError: 'NoneType' object has no attribute 'groups'

Below are the details for java version on my machine.

$ java -version java version "13" 2019-09-17 Java(TM) SE Runtime Environment (build 13+33) Java HotSpot(TM) 64-Bit Server VM (build 13+33, mixed mode, sharing)

Possible cause: Seems that regular expression is not handling Java SE. It supports only jdk Locally I am able to run it using version_pattern = r'\"(.+?)\"'

MagedSaeed commented 3 years ago

Thanks for mentioning that @greenshrek

Regarding your solution, I think r"'(.+?)"' pattern is more than what is required? what about r"\"(\d+)*\"? This will check there is at least a version digits. What do you think?

One more issue that needs to be implemented is this should not stop the flow. Instead, the exception should be caught and returned with a proper error message.

pranavsdev commented 3 years ago

Hi @MagedSaeed

1) regex r"\"(\d+)*\" will work fine if there is only a major version involved. However, it fails where there is a minor version and a patch also in version_proc_output (example - 13.01.01). Attached is the code for reference (check.py)

check.py.zip

2) Yes, agree. The flow should not be stopped.

avriiil commented 3 years ago

Hi @MagedSaeed and @pranavsdev - has this been solved? I'm running into the same issue with:

java -version java version "16" 2021-03-16 Java(TM) SE Runtime Environment (build 16+36-2231) Java HotSpot(TM) 64-Bit Server VM (build 16+36-2231, mixed mode, sharing)

Working on MacOS Catalina (10.15.4) and working in a Jupyter Notebook

MagedSaeed commented 3 years ago

Hey @pranavsdev ,

Thanks for reporting such issue. Just give me some time. I will try to fix it as soon as possible.

avriiil commented 3 years ago

Great, thank you. Would you please notify me when you've fixed it so I know when I can continue working?

Best,

-- Richard Pelgrim +31 (0)6 374 807 31

On Thu, Apr 1, 2021 at 4:56 AM Maged Saeed @.***> wrote:

Hey @pranavsdev https://github.com/pranavsdev ,

Thanks for reporting such issue. Just give me some time. I will try to fix it as soon as possible.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MagedSaeed/farasapy/issues/10#issuecomment-811856327, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQLWMSWO7JURAZR2D6WJNWLTGRNPFANCNFSM4TV3D7TQ .

MagedSaeed commented 3 years ago

@rrpelgrim I am currently working on it. The cause of the problem is that the regex will match any java version with this pattern "d.d+" However, I was not expecting to receive patterns like "d+".

I had tested this pattern

version_pattern = r"\"(\d+(\.\d+){0,1})"

and seemed to work just fine.

@pranavsdev , I just found out that you replied to this issue with the code. Thanks for your effort. I need your thoughts both here on the proposed regex. Regarding the one you proposed, I do not feel it is restrictive. I feel this is more restrictive.

MagedSaeed commented 3 years ago

If you both agreed to this regex, I will update the code and push the changes to both, the GitHub repo and PyPi.

pranavsdev commented 3 years ago

@MagedSaeed

The regex you mentioned is not able to match a pattern like - 13.01.01

Instead, version_pattern = r"(\d+(.\d+){0,1})" works fine for me. It is able to match 13 or 13.01 or 13.01.01

I used https://regex101.com/ to test. Please check the attachments. Let me know what you think

3 2 1

MagedSaeed commented 3 years ago

Actually, this should not work. Because you are missing the " character at the beginning. I noticed that every version starts with ". So, I want to only match the version exactly excluding any other possibility for other numbers. I only matched two tokens split by '.'. This is because I want to compare it with 1.7.

image

Yours will accept any character between numbers including '_'. Because you are not escaping the '.'.

Hope you got may point!

pranavsdev commented 3 years ago

Ah, i missed that. You are right, it should work!!

ALahssini commented 3 years ago

Hi @MagedSaeed and @pranavsdev - has this been solved?

I'm trying to use Farasa segmenter on openjdk docker image with java SE.

Thank you in advance for your answer !

MagedSaeed commented 3 years ago

Hey @ALahssini and @rrpelgrim

I fixed the problem locally but figured out that I did not deploy to PyPI. I just pushed changes to GitHub and also deployed to PyPI. Please uninstall your current version, I guess 0.0.12, and install the latest one, 0.0.13 by :

pip install -U farasapy

Let me know if the problem did not resolve,.

Maged,.

MagedSaeed commented 3 years ago

Please let me know so that I will close this issue.

MagedSaeed commented 3 years ago

@ALahssini and @rrpelgrim

Any updates please, if you do not have any comments, I will close this issue.