abidlabs / contrastive

Contrastive PCA
MIT License
199 stars 47 forks

Mice protein experiment does not work #8

Closed Periodinan closed 6 years ago

Periodinan commented 6 years ago

When I try to run the experiment, I get the following error message:

classes = np.genfromtxt('/Users/Dina/Desktop/Python/Data_Cortex_Nuclear.csv',delimiter=',', skip_header=1,usecols=range(78,81),dtype=None, encoding = 'bytes')

__main__:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.

Also, the resulting cPCAs don't show any values at all, so I guess the import does not really work?

Here's the code I used:

import numpy as np

data = np.genfromtxt('/Users/Dina/Desktop/Python/Data_Cortex_Nuclear.csv',delimiter=',',skip_header=1,usecols=range(1,78),filling_values=0)

classes = np.genfromtxt('/Users/Dina/Desktop/Python/Data_Cortex_Nuclear.csv',delimiter=',', skip_header=1,usecols=range(78,81),dtype=None)

target_idx_A = np.where((classes[:,-1]==b'S/C') & (classes[:,-2]==b'Saline') & (classes[:,-3]==b'Control'))[0]

target_idx_B = np.where((classes[:,-1]==b'S/C') & (classes[:,-2]==b'Saline') & (classes[:,-3]==b'Ts65Dn'))[0]

labels = len(target_idx_A)*[0] + len(target_idx_B)*[1]


target_idx = np.concatenate((target_idx_A,target_idx_B))

target = data[target_idx]

background_idx = np.where((classes[:,-1]==b'C/S') & (classes[:,-2]==b'Saline') & (classes[:,-3]==b'Control'))

background = data[background_idx]

from contrastive import CPCA

mdl = CPCA()

projected_data = mdl.fit_transform(target, background, plot=True, active_labels=labels)

abidlabs commented 6 years ago

This seems to be an issue with np.genfromtxt, not the contrastive library. My suggestion would be to search StackOverflow for this issue. It seems like you have to pass an encoding argument to genfromtxt, and others have run into similar issues as well, e.g. https://github.com/numpy/numpy/issues/10990
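
For anyone landing here later, below is a minimal sketch of what passing an explicit encoding to genfromtxt might look like. It is not taken from the contrastive library or from the linked NumPy issue. The in-memory CSV, its column names, and its rows are made up for illustration and stand in for the last three columns of Data_Cortex_Nuclear.csv. One thing to watch, stated here as an assumption rather than a confirmed diagnosis: with a text encoding such as 'utf-8' the values come back as ordinary str, so the b'...' literals in the comparisons would most likely match nothing and would need to become plain strings.

```python
import io

import numpy as np

# Hypothetical stand-in for the three class columns of the real CSV.
# Column names and rows are invented for this sketch.
csv_text = (
    "Genotype,Treatment,Behavior\n"
    "Control,Saline,S/C\n"
    "Ts65Dn,Saline,S/C\n"
    "Control,Saline,C/S\n"
    "Control,Memantine,C/S\n"
)

# An explicit encoding avoids the VisibleDeprecationWarning and
# returns str values rather than bytes.
classes = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                        skip_header=1, dtype=None, encoding='utf-8')

# Compare against plain strings, since the array now holds str.
target_idx_A = np.where((classes[:, -1] == 'S/C')
                        & (classes[:, -2] == 'Saline')
                        & (classes[:, -3] == 'Control'))[0]
target_idx_B = np.where((classes[:, -1] == 'S/C')
                        & (classes[:, -2] == 'Saline')
                        & (classes[:, -3] == 'Ts65Dn'))[0]

# One label per target row: 0 for group A, 1 for group B.
labels = len(target_idx_A) * [0] + len(target_idx_B) * [1]
target_idx = np.concatenate((target_idx_A, target_idx_B))
```

If the index arrays come out empty, the mismatch between str and bytes in the comparisons is a likely first thing to check before suspecting the cPCA step.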