daffidwilde / edo

A library for generating artificial datasets through genetic evolution.
https://doi.org/10.1007/s10489-019-01592-4
MIT License

Conserving datatypes of columns #48

Closed: daffidwilde closed this issue 6 years ago

daffidwilde commented 6 years ago

The issue

Not a huge issue, but it would be nice to preserve the datatypes of columns during crossover. That way, the data can be analysed more easily after a run.

Example

The issue arises when a NaN is introduced into a column with an integer datatype: pandas upcasts the whole column to float64. For instance:


>>> import numpy as np
>>> from edo.individual import create_individual
>>> from edo.pdfs import Poisson

>>> np.random.seed(0)

>>> df, meta = create_individual(row_limits=[10, 10], col_limits=[1, 1], pdfs=[Poisson])

>>> print(df.dtypes)
0    int64
dtype: object

>>> print(df)
   0
0  2
1  2
2  4
3  2
4  0
5  0
6  5
7  1
8  1
9  2

>>> df.iloc[3:5, 0] = np.nan

>>> print(df.dtypes)
0    float64
dtype: object

>>> print(df)
     0
0  2.0
1  2.0
2  4.0
3  NaN
4  NaN
5  0.0
6  5.0
7  1.0
8  1.0
9  2.0

Potential solutions

  1. Return all values from categorical distributions as strings.
  2. Build the "add-on" part of the dataframe separately (possibly column by column as well) and then append it.
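
To make option 2 concrete, here is a minimal sketch in plain pandas and NumPy. It is not edo's implementation: the names `kept`, `addon` and `n_missing` are hypothetical, and the Poisson rate of 2 is an arbitrary choice. The idea is that the new rows are sampled on their own, with the same dtype as the existing column, so no NaN placeholder ever enters the integer column.

```python
import numpy as np
import pandas as pd

# Stand-in for the rows kept from a parent: a single integer column.
kept = pd.DataFrame({0: [2, 2, 4]})

# Hypothetical number of rows that crossover still has to fill in.
n_missing = 2

# Sample the add-on rows separately, casting to the dtype of the existing column.
rng = np.random.default_rng(0)
addon = pd.DataFrame(
    {0: rng.poisson(lam=2, size=n_missing).astype(kept[0].dtype)}
)

# Appending two integer blocks keeps the column as an integer column.
combined = pd.concat([kept, addon], ignore_index=True)
print(combined.dtypes)
```

Because both pieces already share a dtype when they are concatenated, no upcast to float64 takes place.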

drvinceknight commented 6 years ago

This is an interesting one. Let's have a chat about it?

drvinceknight commented 6 years ago

We discussed this and decided to track the type of each column and, once all resampling is complete, set each column back to its tracked type. (Correct me if I'm wrong @daffidwilde, just commenting to keep track :+1:)
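
A rough sketch of that approach, using only pandas and NumPy, might look like the following. It is an illustration under assumptions, not the code from the eventual PR: `restore_dtypes` is a hypothetical helper, the resampling step is a stand-in that draws from a Poisson distribution with an arbitrary rate, and the NaNs are introduced with `Series.where` purely to reproduce the upcast.

```python
import numpy as np
import pandas as pd


def restore_dtypes(df, dtypes):
    """Cast each column of `df` back to the dtype recorded in `dtypes`.

    Assumes every NaN has already been filled by resampling; casting a
    column that still holds NaN to an integer dtype raises an error.
    """
    return df.astype(dtypes)


# Stand-in for an individual's dataframe with one integer column.
df = pd.DataFrame({0: [2, 2, 4, 2, 0]})

# Record the column types before anything is changed.
dtypes = df.dtypes.to_dict()

# Blank out the last two rows; the column is upcast to float64.
df[0] = df[0].where(df.index < 3)

# Hypothetical resampling step: fill the gaps with fresh Poisson draws.
rng = np.random.default_rng(0)
missing = df[0].isna()
df.loc[missing, 0] = rng.poisson(lam=2, size=int(missing.sum())).astype(float)

# Once all resampling is complete, set the columns back to their tracked types.
df = restore_dtypes(df, dtypes)
print(df.dtypes)
```

Doing the cast once, after every gap has been filled, avoids having to worry about intermediate upcasts during crossover.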

daffidwilde commented 6 years ago

@drvinceknight I've just opened a PR implementing this fix: #51