biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

improving "Test & Scores" Widget #3233

Closed flave95 closed 6 years ago

flave95 commented 6 years ago
PrimozGodec commented 6 years ago

Can you please provide some description of what improvement you want?

flave95 commented 6 years ago

Oh, I kinda messed it up. My description was: I am working on a new function for the "Test & Score" widget, which provides more accurate prediction scores. The problem I am facing is that I have hardly ever worked with Orange at such a detailed level, and although I have already worked out the main part of the algorithm, I don't know how to implement it.

Could you give me some advice on how it should be done? Or at least tell me how I can copy the whole "Test & Score" widget and get it working (to see which parts have to be changed in order to implement my new function)?

The improvement would be a more independent validation of the test data. The problem with the existing tests, such as "Cross Validation" or "Leave One Out" (which is mathematically just CV with N folds), is that they both test participants who have previously been in the training set. In psychology this is a real problem and a violation of the test criteria for predictability.

I am writing my bachelor thesis on this topic and would love to solve this problem with Orange Canvas.

astaric commented 6 years ago

If you have your patient as a column in your data, you might be able to achieve this already.

If you move the patient variable to metas using the Select Columns widget, you should be able to select cross validation by feature in Test & Score. This option should ensure that the same patient is only used in the training part or in the testing part of the data (and not both).
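For scripting, the same idea should be available as CrossValidationFeature in Orange.evaluation. A minimal sketch of that approach (the file name and the "patient" variable are just placeholders; the feature must be discrete):

    import Orange

    # hypothetical data file with a discrete "patient" column
    data = Orange.data.Table("patients.csv")
    patient = data.domain["patient"]
    learner = Orange.classification.LogisticRegressionLearner()
    # folds are built from the values of `patient`, so each patient's rows
    # end up either in the training part or in the testing part, never both
    results = Orange.evaluation.CrossValidationFeature(data, [learner], feature=patient)
    print(Orange.evaluation.CA(results))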

flave95 commented 6 years ago

Yes, you are right. But the issue is that the resulting predictability is statistically not valid and independent. I already did a lot of work and scripted the class, but I just don't know what steps I have to take to add my function as a subclass in the Test & Score widget.

astaric commented 6 years ago

Probably the easiest way would be to start from the example add-on: https://github.com/biolab/orange3-example-addon (get the code from GitHub, then run pip install . in the top-level directory of the add-on),

then copy the Test & Score widget file (https://github.com/biolab/orange3/blob/master/Orange/widgets/evaluate/owtestlearners.py) into the add-on (to the folder orangecontrib/example/widgets),

then run Orange. If everything went according to plan, you will see a new category ("My Category") with a copy of the Test & Score widget; you can change the copy by modifying the file you added to the add-on.
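In short, something like this on the command line (the copy step assumes you have already downloaded owtestlearners.py; paths are illustrative):

    git clone https://github.com/biolab/orange3-example-addon.git
    cd orange3-example-addon
    pip install .
    copy owtestlearners.py orangecontrib\example\widgets\
    python -m Orange.canvas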

flave95 commented 6 years ago

Failing for hours now to get the add-on running -_-'.

"Command "python setup.py egg_info" failed with error code 1 in C:\Users\Admin\AppData\Local\Temp\pip-build-_qceabf0\bottleneck"

It's always the same issue. I tried it now on 2 computers and both times used the same Python that comes along with the Orange classic installer (Orange3-3.15.0-Python34-win32.exe).

ajdapretnar commented 6 years ago

Could it be that you are using just the Python of the Orange installer and not the environment where Orange is installed? It looks like you didn't use the Miniconda installer, which comes with the Orange Command Prompt. So you likely don't have the dependencies sorted. Judging from the error, you are missing bottleneck.
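If it is just that one package, installing it into the environment where Orange lives (e.g. from the Orange Command Prompt) should get you past this particular error:

    pip install bottleneck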

flave95 commented 6 years ago

Thank you all, and especially @ajdapretnar; the example add-on with the Test & Score widget is working fine.

Currently I'm working on a problem that occurred: the whole widget disappears after I add my sixth sampler class, "IndependentValidation". I could imagine that the .py just doesn't know what to do with my sampler, because I could not find an import of testing.py (https://github.com/biolab/orange3/blob/master/Orange/evaluation/testing.py) in owtestlearners.py (https://github.com/biolab/orange3/blob/master/Orange/widgets/evaluate/owtestlearners.py), through which its approach would be defined, as it is for example for the Cross Validation sampler.

When I'm back home, I will upload the edited testing.py with my added class IndependentValidation(Results). Right now my notes are:

# Notes for Orange/evaluation/testing.py, where `skl` is sklearn.model_selection.

class IndependentValidation(Results):  # by Ilja Grabovsky
    """
    Test on Independent Validation.
    First the splitter splits the data into train_data and test_data;
    train_data is sized to 0.1 of the data, to classify on it
    (explanation to be added).
    """
    def __init__(self, data, learners, k=2, stratified=True, random_state=0,
                 store_data=False, store_models=False, preprocessor=None,
                 callback=None, warnings=None, n_jobs=1):
        self.k = int(k)
        self.stratified = stratified
        self.random_state = random_state
        self.train_size = 0.1  # train on 10 % of the data
        if warnings is None:
            self.warnings = []
        else:
            self.warnings = warnings

        super().__init__(data, learners=learners, store_data=store_data,
                         store_models=store_models, preprocessor=preprocessor,
                         callback=callback, n_jobs=n_jobs)

    def setup_indices(self, train_data, test_data):
        self.indices = None
        if self.stratified and test_data.domain.has_discrete_class:
            try:
                splitter = skl.StratifiedKFold(
                    self.k, shuffle=True, random_state=self.random_state
                )
                splitter.get_n_splits(test_data.X, test_data.Y)
                self.indices = list(splitter.split(test_data.X, test_data.Y))
            except ValueError:
                self.warnings.append("Using non-stratified sampling.")
                self.indices = None
        if self.indices is None:
            splitter = skl.KFold(
                self.k, shuffle=True, random_state=self.random_state
            )
            splitter.get_n_splits(test_data.X)
            self.indices = list(splitter.split(test_data.X))
        # train_data = 0.1 of the data -> train_data_main;
        # the remaining 0.9 (test_data.X) goes through the skl splitter -> test_data_main

class IndependentValidation(Results):  # test_data split, by Ilja Grabovsky
    """
    Test on Independent Validation.
    First the splitter splits test_data.X into n test sets, where
    n = len(list(test_data.X)) - 1 and n != 0, to classify on them
    (explanation to be added).
    """
    def __init__(self, data, learners, k=None, stratified=True, random_state=0,
                 store_data=False, store_models=False, preprocessor=None,
                 callback=None, warnings=None, n_jobs=1):
        # intended: k = n = len(list(test_data.X)) - 1, with n != 0,
        # and n_jobs = n as well
        self.k = int(k) if k is not None else None
        self.stratified = stratified
        self.random_state = random_state  # unnecessary?
        if warnings is None:
            self.warnings = []
        else:
            self.warnings = warnings

        super().__init__(data, learners=learners, store_data=store_data,
                         store_models=store_models, preprocessor=preprocessor,
                         callback=callback, n_jobs=n_jobs)

    def setup_indices(self, train_data, test_data):
        self.indices = None
        if self.stratified and test_data.domain.has_discrete_class:
            try:
                splitter = skl.StratifiedKFold(
                    self.k, shuffle=True, random_state=self.random_state
                )
                splitter.get_n_splits(test_data.X, test_data.Y)
                self.indices = list(splitter.split(test_data.X, test_data.Y))
            except ValueError:
                self.warnings.append("Using non-stratified sampling.")
                self.indices = None
        if self.indices is None:
            splitter = skl.KFold(
                self.k, shuffle=True, random_state=self.random_state
            )
            splitter.get_n_splits(test_data.X)
            self.indices = list(splitter.split(test_data.X))

class IndependentValidation(Results):  # by Ilja Grabovsky
    """
    Classification with train_data sized to 0.1.
    """
    # The idea is that it tests every single data point (experimentee),
    # adjusts p(x) (the prediction) to predict the next one better, and
    # improves the prediction score with every new test_data point.
    # (translated from German:) If I leave it like this, it runs every
    # step x times, according to len(list(test_data.X)).
    def __init__(self, data, learners, feature, store_data=False, store_models=False,
                 preprocessor=None, callback=None, n_jobs=1):
        # intended: n_jobs = len(list(test_data.X))
        self.feature = feature
        super().__init__(data, learners=learners, store_data=store_data,
                         store_models=store_models, preprocessor=preprocessor,
                         callback=callback, n_jobs=n_jobs)

    # has to be adjusted
    def setup_indices(self, train_data, test_data):
        data = Table(Domain([self.feature], None), test_data)
        values = data[:, self.feature].X
        self.indices = []
        for v in range(len(self.feature.values)):
            test_index = np.where(values == v)[0]
            train_index = np.where((values != v) & (~np.isnan(values)))[0]
            if len(test_index) and len(train_index):
                self.indices.append((train_index, test_index))
        if not self.indices:
            raise ValueError("No folds could be created from the given feature.")

flave95 commented 6 years ago

I had the idea to just copy the whole Leave One Out class and add it under the name "Leave One Out 2". So I basically made a new class with just one simple adjustment. I even added the whole "evaluation" folder to the example add-on, after editing it in the same way, by copying the LOO class and renaming the necessary parts to "Leave One Out 2". But Orange still doesn't show the widget. The debugger gives me the following information:

Error while importing 'orangecontrib.example.widgets.owtestlearners'. The widget will not be shown.

Traceback (most recent call last):
  File "C:\Users\iljag\AppData\Local\Orange\lib\site-packages\Orange\canvas\registry\discovery.py", line 261, in iter_widget_descriptions
    module = asmodule(name)
  File "C:\Users\iljag\AppData\Local\Orange\lib\site-packages\Orange\canvas\registry\discovery.py", line 491, in asmodule
    return __import__(module, fromlist=[""])
  File "C:\Users\iljag\Desktop\Bachelor-PY\orange3-example-addon-master\orangecontrib\example\widgets\owtestlearners.py", line 286
    gui.appendRadioButton(rbox, "Leave one out2")

Do you have some ideas?

markotoplak commented 6 years ago

@flave95, try running the widget directly: python -m orangecontrib.example.widgets.owtestlearner. Of course, use the same Python you needed for the installation. Perhaps that will give you some nicer error messages.

It would be interesting if you described exactly what kind of validation you are trying to do. How is what you are trying different from what astaric suggested?

flave95 commented 6 years ago

@markotoplak I think it kinda helped, but I'm not sure how to place the rbox properly or what to do with the information that came up.

The general idea of the validation is:

(1) start with a small (10%) data set and train on it;
(2) one after another, take small parts of the leftover data (test_data.X), and
(2a) train the classifier on the train_data,
(2b) note the classification of x as right/wrong,
(2c) add x to our train_data;
(3) calculate the accuracy over the correctly classified x from step (2).

We do this n times, with n_jobs = n = len(list(test_data.X)) - 1 and n != 0. It is called "Independent Validation" and was brought up by Prof. Dr. Timo von Oertzen and Prof. Dr. Bommae. It is today the only known statistical method with which you can do Bayesian testing.
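In plain scikit-learn terms, the loop is roughly this (just a sketch of the idea, not my Orange code; the names model, X and y are illustrative):

    import numpy as np
    from sklearn.model_selection import train_test_split

    def independent_validation(model, X, y, train_size=0.1, random_state=0):
        # (1) start with a small (10%) training set
        X_train, X_rest, y_train, y_rest = train_test_split(
            X, y, train_size=train_size, random_state=random_state)
        hits = []
        # (2) classify the leftover points one at a time
        for i in range(len(X_rest)):
            model.fit(X_train, y_train)                      # (2a) train on the current train_data
            pred = model.predict(X_rest[i:i + 1])[0]         # classify x
            hits.append(pred == y_rest[i])                   # (2b) note right / wrong
            X_train = np.vstack([X_train, X_rest[i:i + 1]])  # (2c) add x to train_data
            y_train = np.append(y_train, y_rest[i])
        # (3) accuracy over all incremental classifications
        return np.mean(hits)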

Error message

C:\Users\iljag\Desktop\Bachelor-PY\orange3-example-addon-master>python -m orangecontrib.example.widgets.owtestlearner
Traceback (most recent call last):
  File "C:\Users\iljag\Miniconda3\lib\runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\iljag\Miniconda3\lib\runpy.py", line 153, in _get_module_details
    code = loader.get_code(mod_name)
  File "<frozen importlib._bootstrap_external>", line 781, in get_code
  File "<frozen importlib._bootstrap_external>", line 741, in source_to_code
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "C:\Users\iljag\Desktop\Bachelor-PY\orange3-example-addon-master\orangecontrib\example\widgets\owtestlearner.py", line 286
    gui.appendRadioButton(rbox, "Leave one out2")
                                                ^
TabError: inconsistent use of tabs and spaces in indentation

C:\Users\iljag\Desktop\Bachelor-PY\orange3-example-addon-master>

My added rbox, "Leave one out2":

        gui.appendRadioButton(rbox, "Random sampling")
        ibox = gui.indentedBox(rbox)
        gui.comboBox(
            ibox, self, "n_repeats", label="Repeat train/test: ",
            items=[str(x) for x in self.NRepeats], maximumContentsLength=3,
            orientation=Qt.Horizontal, callback=self.shuffle_split_changed)
        gui.comboBox(
            ibox, self, "sample_size", label="Training set size: ",
            items=["{} %".format(x) for x in self.SampleSizes],
            maximumContentsLength=5, orientation=Qt.Horizontal,
            callback=self.shuffle_split_changed)
        gui.checkBox(
            ibox, self, "shuffle_stratified", "Stratified",
            callback=self.shuffle_split_changed)
    gui.appendRadioButton(rbox, "Leave one out")

    gui.appendRadioButton(rbox, "Test on train data")
    gui.appendRadioButton(rbox, "Test on test data")

    gui.appendRadioButton(rbox, "Leave one out2")

    self.cbox = gui.vBox(self.controlArea, "Target Class")
ajdapretnar commented 6 years ago

The error means you are mixing space indentation and tab indentation. It's better not to use tabs, but 4 spaces instead. You can set your editor to automatically 'translate' tabs into spaces. Have a look at this question: https://stackoverflow.com/questions/11816147/pycharm-convert-tabs-to-spaces-automatically
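A quick way to find the offending lines is to list every line that still contains a tab character (adjust the file name to yours):

    python -c "print([i for i, line in enumerate(open('owtestlearner.py'), 1) if '\t' in line])"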