elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0

If -dbc.in does not exist, KDDCLIApplication throws an unhelpful error #75

Closed: bastian-wur closed this issue 4 years ago

bastian-wur commented 4 years ago

Hi everyone,

This time I'm not reporting a big issue. If I run ELKI as KDDCLIApplication and accidentally get the -dbc.in path wrong, so that the file does not exist, I get the following error:

ERROR: The following configuration errors prevented execution:
Error instantiating internal class: elki.workflow.InputStep Path component should be '/'
The following parameters were not processed: [/home/bastian/data/nonexisting_file]
Stopping execution because of configuration errors above.

I think it would be nice if this case were caught with a useful error message :) (it took me a while to figure out what was wrong)
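One way to get the friendlier message asked for here is to check the input before it reaches the parser. This is only a minimal sketch using plain java.nio.file; the class `InputFileCheck` and the method `requireReadableFile` are hypothetical and not part of ELKI's API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not ELKI code): validate a user-supplied input path up front
// so that a typo produces a clear message instead of an internal instantiation error.
final class InputFileCheck {
  static Path requireReadableFile(String parameterName, String value) {
    // Paths.get may itself throw InvalidPathException (an IllegalArgumentException)
    Path path = Paths.get(value);
    if (!Files.exists(path)) {
      throw new IllegalArgumentException(
          "Input file for " + parameterName + " does not exist: " + path.toAbsolutePath());
    }
    if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
      throw new IllegalArgumentException(
          "Input for " + parameterName + " is not a readable file: " + path.toAbsolutePath());
    }
    return path;
  }

  public static void main(String[] args) {
    try {
      requireReadableFile("-dbc.in", "/home/bastian/data/nonexisting_file");
    } catch (IllegalArgumentException e) {
      // Prints a message that names both the parameter and the missing path
      System.out.println(e.getMessage());
    }
  }
}
```

Because the check names the parameter and prints the absolute path, a mistyped or wrongly resolved relative path is visible immediately.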

kno10 commented 4 years ago

Thank you, fixed.

javagl commented 4 years ago

This might not be worth its own issue, so... when I start the MiniGUI and select or enter a path for the dbc.in parameter, it says

Error instantiating internal class: elki.workflow.InputStep Illegal character in opaque part at index 2: C:\Develop\elki\elki\elki-gui-minigui\iris.csv

(I would add the stack trace, but... it was not printed.)

I tried the usual "fixes" (replacing \ with / and adding the file: scheme), but neither worked.
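The error message above comes straight from java.net.URI: in "C:\...", the drive letter "C" is read as a URI scheme, so the rest must be an opaque part, and the backslash at index 2 is not a legal URI character. The small illustration below (hypothetical class name, plain JDK only) reproduces that message and also shows why switching to forward slashes alone does not help: "C:/..." parses, but still as a URI whose scheme is "C", not as a file: URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: why a raw Windows path cannot be parsed as a URI.
final class WindowsPathAsUri {
  /** Returns the index reported by the URI parser, or -1 if the string parses. */
  static int firstErrorIndex(String s) {
    try {
      new URI(s);
      return -1;
    } catch (URISyntaxException e) {
      return e.getIndex();
    }
  }

  public static void main(String[] args) {
    String windowsPath = "C:\\Develop\\elki\\elki\\elki-gui-minigui\\iris.csv";
    try {
      new URI(windowsPath);
    } catch (URISyntaxException e) {
      // "Illegal character in opaque part at index 2: C:\Develop\..."
      System.out.println(e.getMessage());
    }
    // Forward slashes parse, but the drive letter is still taken as the scheme
    System.out.println(URI.create("C:/Develop/iris.csv").getScheme()); // prints C
  }
}
```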


OT: Actually, I wanted to take a closer look at ELKI's visualization capabilities, prompted by statements along the lines of "SVG is slow but nice for printing", and see whether such statements might become obsolete, considering things like SVGGraphics2D (also used in SvgGraphics) and the importance of interactive visualizations in ML. But even after "pragmatically" doing

      //URI u = URI.create((String) obj);
      //return u.getScheme() != null ? u : Paths.get((String) obj).toUri();
      return Paths.get((String) obj).toUri();

at https://github.com/elki-project/elki/blob/cd867f52aae89e23b14c9a2bc39bb5a8ad8ac181/elki-core-util/src/main/java/elki/utilities/optionhandling/parameters/FileParameter.java#L108 "Run Task" still doesn't seem to do anything other than print the settings to the console. It's a pity.

kno10 commented 4 years ago

As I have not used Windows since about 1998, I honestly do not know whether the current URI-oriented code works on Windows. It mostly depends on how well Java works on Windows, but fixes to improve Windows support are welcome. "Illegal character in opaque part" sounds like a typical problem with Windows path names being somewhat incompatible with URIs. At the same time, we need to support loading resources from jar files, which is possible with URIs, so URIs are our best choice. As the path handling is pretty much all in one location, it should be possible to add a workaround for Windows along the lines of catching the URI parsing exceptions and then trying a path fallback. Passing a proper file:/// URI should work, however; make sure you encode all necessary characters.

If you just want to use the visualization, you still need to choose an algorithm, such as NullAlgorithm. The automatic visualization is only useful if the data is well-formed, though, and the visualizers are currently largely limited to number vector data. My guess, however, is that you do not have the visualizer addon on your class path (because it only outputs to the console).
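The "catch the URI parsing exception and fall back to a path" idea can be sketched with plain JDK calls. This is not ELKI's FileParameter code; the class `LocationParser` is hypothetical, and treating one-letter schemes as Windows drive letters is an assumption of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;

// Hypothetical sketch: accept either a URI (file:, jar:, ...) or a plain local
// path, falling back to Paths.get when the string is not a usable URI.
final class LocationParser {
  static URI parse(String value) {
    try {
      URI uri = new URI(value);
      String scheme = uri.getScheme();
      // Assumption: a one-letter "scheme" is a Windows drive letter (C:/...), not a URI
      if (scheme != null && scheme.length() > 1) {
        return uri;
      }
    } catch (URISyntaxException e) {
      // Not a valid URI (e.g. C:\data\iris.csv): fall through to the path fallback
    }
    // Relative paths are resolved against the working directory by toUri()
    return Paths.get(value).toUri();
  }

  public static void main(String[] args) {
    System.out.println(parse("file:///tmp/iris.csv"));
    System.out.println(parse("jar:file:/opt/elki.jar!/data/iris.csv"));
    System.out.println(parse("data/iris.csv"));
  }
}
```

The design choice is to try the URI interpretation first, so jar: resources keep working, and only use the platform path syntax when the string has no real scheme.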

kno10 commented 4 years ago

So the fix for #75 apparently made things even worse, unfortunately. In my opinion, these are bugs in the Java NIO layer, but we have to work around them anyway.

kno10 commented 4 years ago

The new fix hopefully works better.

javagl commented 4 years ago

The Windows \ vs. / issue was my first guess, but even knowing that, and even after manually fixing the separators and fiddling a file scheme into the path, it plainly does not work when the file is selected with the file chooser.

(I didn't dive into the details of what you are doing with the FileSystem there; I never felt the need to use this part of the API. Usually, Paths.get(string) does the trick. But there seems to be quite some magic happening during parameter validation, so maybe all of this is in fact necessary...)

I'll give it a try and report back.

kno10 commented 4 years ago

A simple Paths.get does not work with resource files (files inside jar files), which is why we need support for URIs.
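To make this concrete, here is a self-contained sketch using only the JDK (not ELKI code; the class and method names are made up). It writes a throwaway archive, shows that Paths.get on the resulting jar: URI throws FileSystemNotFoundException, and then reads the entry in two ways: through the jar: URL handler, and by first opening a zip file system, which is presumably the kind of FileSystem handling mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustration: an entry inside a jar has a jar: URI, which Paths.get cannot
// resolve unless a zip file system for that archive has been opened first.
final class JarResourceDemo {
  /** Writes a one-entry archive and returns a jar: URI pointing at that entry. */
  static URI makeJarResource(Path archive, String entry, String content) throws IOException {
    try (OutputStream out = Files.newOutputStream(archive);
        ZipOutputStream zip = new ZipOutputStream(out)) {
      zip.putNextEntry(new ZipEntry(entry));
      zip.write(content.getBytes(StandardCharsets.UTF_8));
      zip.closeEntry();
    }
    return URI.create("jar:" + archive.toUri() + "!/" + entry);
  }

  /** True if Paths.get rejects the URI because no file system is open for it. */
  static boolean pathsGetFails(URI uri) {
    try {
      Paths.get(uri);
      return false;
    } catch (FileSystemNotFoundException e) {
      return true;
    }
  }

  /** Option 1: read through the jar: URL handler, without any Path. */
  static String readViaUrl(URI uri) throws IOException {
    URLConnection conn = uri.toURL().openConnection();
    conn.setUseCaches(false); // do not keep the archive open after the stream is closed
    try (InputStream in = conn.getInputStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  /** Option 2: open a zip file system for the archive; then Paths.get(uri) resolves. */
  static String readViaFileSystem(URI uri) throws IOException {
    try (FileSystem fs = FileSystems.newFileSystem(uri, Collections.emptyMap())) {
      return new String(Files.readAllBytes(Paths.get(uri)), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    Path archive = Files.createTempFile("resources", ".jar");
    URI uri = makeJarResource(archive, "iris.csv", "5.1,3.5,1.4,0.2,Iris-setosa\n");
    System.out.println(uri);
    System.out.println("Paths.get fails without a file system: " + pathsGetFails(uri));
    System.out.print(readViaUrl(uri));
    System.out.print(readViaFileSystem(uri));
    Files.delete(archive);
  }
}
```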

javagl commented 4 years ago

Sure, there's no need to argue about that (nor about whether FileParameter, which was a PathParameter, is now a UriParameter; that's how things go... ;-) )

But the implications of such generalizations (like switching from Path to URI) can be tricky, and hard to test on a single OS. In particular, when I manually enter the string

file:///C:/Develop/elki/elki/elki-gui-minigui/iris.csv

as the dbc.in value, it is converted to

C:\Develop\elki\elki\elki-gui-minigui\iris.csv

and this, in turn, causes the "Illegal character..." issue later on. The conversion is almost certainly done by FileParameter#getValueAsString, which is called at some point during the round trip through the ParameterTable, SerializedParametrization, and TrackParameters infrastructure.
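The round-trip problem can be stated as a simple property: the string a parameter prints for its value should be accepted again by that parameter's own parser. The sketch below uses hypothetical names and plain JDK calls; it does not reproduce FileParameter#getValueAsString. It only contrasts a lossy formatting via Paths.get(uri).toString(), which drops the scheme and percent-encoding, with keeping uri.toString().

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical round-trip check: format a value, parse it back, compare.
final class RoundTripCheck {
  /** Lossy: converts to a platform path string (no scheme, no percent-encoding). */
  static String formatAsPlatformPath(URI uri) {
    return Paths.get(uri).toString();
  }

  /** Lossless: keeps the URI's own string form. */
  static String formatAsUri(URI uri) {
    return uri.toString();
  }

  static boolean roundTrips(URI uri, String formatted) {
    try {
      return URI.create(formatted).equals(uri);
    } catch (IllegalArgumentException e) {
      return false; // the formatted value is not even a valid URI
    }
  }

  public static void main(String[] args) {
    URI uri = URI.create("file:///tmp/my%20data/iris.csv");
    String lossy = formatAsPlatformPath(uri);
    String lossless = formatAsUri(uri);
    System.out.println(lossy + " round-trips: " + roundTrips(uri, lossy));
    System.out.println(lossless + " round-trips: " + roundTrips(uri, lossless));
  }
}
```

A check like this, run over a few sample values (including ones with spaces or drive letters), is one cheap way to catch such conversions without needing a second operating system.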

(The latter had caused some head-scratching for me recently: I'm using ELKI as a library to wrap some of the DM/ML classes, and to instantiate them, I wrapped the ParameterTable into something I called ElkiObjectDialog, where you can throw in some ELKI Class<?>, show the ParameterTable, and eventually obtain the ELKI instance. The *Parametrization classes have certainly gone through a few rounds of refactoring...)

However, (literally) the bottom line is:

With the latest changes, it works (for me), and after adding the "addons", I could even see the Iris visualizations dashboard.