CodeSpaceHQ / MENGEL

A framework that applies machine learning algorithms and automates the process of finding the right algorithm for the job.

get_missing_ratios fails with multiple missing columns #140

Closed isaac-gs closed 7 years ago

isaac-gs commented 7 years ago

I get "cannot convert the series to <type 'float'>" when multiple columns in the dataset have missing values.

Happy Thanksgiving :)

ZakeryFyke commented 7 years ago

@ASAAR Does it fail when the dataset has multiple empty columns, or when the dataset has multiple columns containing missing values?

isaac-gs commented 7 years ago

@ZakeryFyke when a dataset has multiple columns containing empty values.

Steps to reproduce:

  1. Merge the training and test data for Titanic: data = pandas.concat([train, test])
  2. Select data that doesn't have missing values: self.complete_data = self.select_complete_data(train)
      def select_complete_data(self, data):
          data = data_splitting.remove_non_numeric_columns(data)
          return data_filler.drop_missing_data_rows(data, 0)

= FUN

P.S. I also cleaned out all the non-numeric columns first

ZakeryFyke commented 7 years ago

@ASAAR I've followed those steps, and also tried adding empty columns and emptying out values from existing columns, but I can't reproduce the issue. Also, does step 2 in your reproduction steps select data with no missing values? If so, that's a very interesting naming scheme.

isaac-gs commented 7 years ago

@ZakeryFyke my bad, fixed the instructions

isaac-gs commented 7 years ago

@ZakeryFyke Issue fixed; it turned out to be caused by something I did. Essentially, I had two datasets:

A with indices 0, 1, 2, 3, 4, 5 and B with indices 0, 1, 2. If I concat them, the resulting index is 0, 1, 2, 3, 4, 5, 0, 1, 2 unless I pass ignore_index=True to pandas.concat. The duplicated index labels are what broke your function.
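
For anyone who hits the same error, here is a minimal sketch of the index behavior described above. The frames `a` and `b` and the `age` column are made-up stand-ins for the Titanic train/test data, and the `float(...)` call is only one plausible way a label-based lookup can end up raising this error; it is an assumption, not the actual code in get_missing_ratios.

```python
import pandas as pd

# Hypothetical stand-ins for the train and test sets (not the real Titanic data).
a = pd.DataFrame({"age": [22.0, None, 26.0, 35.0, None, 54.0]})  # index 0..5
b = pd.DataFrame({"age": [2.0, None, 27.0]})                     # index 0..2

# Default concat keeps each frame's own labels, so 0, 1, 2 appear twice.
dup = pd.concat([a, b])

# ignore_index=True discards the old labels and renumbers the rows 0..8.
clean = pd.concat([a, b], ignore_index=True)

# With duplicated labels, a label lookup returns a Series instead of a scalar.
value_dup = dup.loc[0, "age"]      # Series holding both rows labelled 0
value_clean = clean.loc[0, "age"]  # single scalar value

# Assumed failure mode: converting a multi-element Series to float raises TypeError.
try:
    float(value_dup)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(list(dup.index))
print(list(clean.index))
print(error_message)
```

If the frames have already been concatenated, calling `reset_index(drop=True)` on the result gives the same unique 0..n-1 index.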