NinoSkopac / PhpTikaWrapper

Simple PHP Wrapper for Apache Tika

Problem with UTF-8 filenames #20

Closed Abelkrown closed 4 years ago

Abelkrown commented 4 years ago

I'm trying to extract text from a file with a Russian name such as "Тест.docx" and get this error:

Exception in thread "main" java.net.MalformedURLException: no protocol: /.../web/????.docx
    at java.net.URL.<init>(URL.java:611)
    at java.net.URL.<init>(URL.java:508)
    at java.net.URL.<init>(URL.java:457)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:472)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)

PHP Version: 7.0+

NinoSkopac commented 4 years ago

ok

Abelkrown commented 4 years ago

And? Just close?

NinoSkopac commented 4 years ago

Yeah.

Abelkrown commented 4 years ago

OK. Why not resolve it?

NinoSkopac commented 4 years ago

Because it’s easier for you to just rename the file using standard characters.

Abelkrown commented 4 years ago

Meh. Of course that's one way, but what if renaming isn't an option? It's also possible with a temp file, but that's not a good approach.

NinoSkopac commented 4 years ago

You should always rename the uploaded file for security reasons.
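
For anyone landing here, the rename and temp-file suggestions above can be combined. The following is only a minimal sketch, assuming the original file must stay untouched and that a copy under a random ASCII-only name is acceptable; `copyToAsciiTempFile` is a hypothetical helper and not part of PhpTikaWrapper.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PhpTikaWrapper): copies a file to a
 * random ASCII-only name in a temp directory, keeping only the extension.
 * The original file is left untouched.
 */
function copyToAsciiTempFile(string $sourcePath, ?string $tempDir = null): string
{
    $tempDir = $tempDir ?? sys_get_temp_dir();

    // Keep the extension so the file type can still be guessed from the
    // name, but strip anything that is not a lowercase ASCII letter or digit.
    $extension = strtolower(pathinfo($sourcePath, PATHINFO_EXTENSION));
    $extension = preg_replace('/[^a-z0-9]/', '', $extension);

    // 16 random bytes -> 32 hex characters: ASCII-only and hard to guess.
    $name = bin2hex(random_bytes(16));
    if ($extension !== '') {
        $name .= '.' . $extension;
    }

    $target = rtrim($tempDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;

    if (!copy($sourcePath, $target)) {
        throw new RuntimeException("Could not copy {$sourcePath} to {$target}");
    }

    return $target;
}
```

The returned path can then be passed to whichever wrapper call you already use, and removed with `unlink()` once the text has been extracted.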