sAviOr287 closed this issue 4 years ago.
Hello, the results seem good, so the accuracy should follow. Could you attach a sample of the data?
I think the predictions are not in the expected shape.
Best,
I have attached the CSV file produced after training the model.
Thanks for your help!
Best
Here is also how I loaded the data; I added this to cdt/data/loader.py:
import os
import random

import pandas as pd

from cdt.utils.io import read_causal_pairs


def _load_ce_dataset(name, shuffle=False):
    """Load one of the CE-* benchmarks stored in cdt/data/resources/."""
    dirname = os.path.dirname(os.path.realpath(__file__))
    data = read_causal_pairs('{}/resources/{}_pairs.csv'.format(dirname, name), scale=False)
    labels = pd.read_csv('{}/resources/{}_targets.csv'.format(dirname, name)).set_index('SampleID')
    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                # Swapping A and B reverses the causal direction, so negate
                # the label instead of forcing it to -1.
                labels.iloc[i, 0] = -labels.iloc[i, 0]
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels


def load_ce_gauss(shuffle=False):
    return _load_ce_dataset('CE-Gauss', shuffle)


def load_ce_multi(shuffle=False):
    return _load_ce_dataset('CE-Multi', shuffle)


def load_ce_net(shuffle=False):
    return _load_ce_dataset('CE-Net', shuffle)
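When a pair is swapped for shuffling, its causal direction reverses, so its label should be negated rather than set to a fixed value (setting it to -1 is only correct if every original label is +1). A minimal sketch of the swap-and-negate idea on a toy DataFrame, assuming +1/-1 labels; `shuffle_pairs` is a hypothetical helper, not part of cdt, and it uses a seeded NumPy generator instead of `random.choice` so the shuffle is reproducible:

```python
import numpy as np
import pandas as pd


def shuffle_pairs(data, labels, seed=None):
    """Randomly swap the two variables of each pair and negate its label.

    Hypothetical helper (not part of cdt). Assumes `data` holds the two
    variables of each pair in its first two columns, `labels` holds a
    +1/-1 direction label in its first column, and both share row order.
    Returns new frames plus the boolean mask of swapped rows.
    """
    rng = np.random.default_rng(seed)
    data, labels = data.copy(), labels.copy()
    swap = rng.random(len(data)) < 0.5  # one fair coin flip per pair

    col_a, col_b = data.columns[:2]
    a_vals = data[col_a].to_numpy(copy=True)
    b_vals = data[col_b].to_numpy(copy=True)
    # Exchange the two variables only on the rows selected for swapping.
    data[col_a] = np.where(swap, b_vals, a_vals)
    data[col_b] = np.where(swap, a_vals, b_vals)

    # Reversing a pair reverses its causal direction: negate the label
    # instead of forcing it to -1, so pairs labelled -1 become +1.
    target = labels.columns[0]
    old = labels[target].to_numpy(copy=True)
    labels[target] = np.where(swap, -old, old)
    return data, labels, swap


# Toy usage with scalar cells for brevity.
toy_data = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
toy_labels = pd.DataFrame({"Target": [1, -1, 1, -1]})
new_data, new_labels, swapped = shuffle_pairs(toy_data, toy_labels, seed=0)
print(pd.concat([new_data, new_labels], axis=1), swapped, sep="\n")
```

The input frames are copied, so the original data and labels are left untouched and the swap mask can be inspected afterwards.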
Hello, whoops, I forgot to ask: do you have the labels as well?
Oh yes, I have them; I downloaded them from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3757KX
Thanks for the reply
Best
Thanks for getting back to me quickly,
There seems to be an issue with your accuracy computation; I got an accuracy of 0.72 on this dataset:
import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score, accuracy_score
preds = pd.read_csv('res2_gauss.csv')
labels = pd.read_csv('CE-Gauss_targets.csv')
print(labels.shape, preds.shape)
print(labels.columns, preds.columns)
# Returns: (300, 2) (300, 2)
# Returns: Index(['SampleID', 'Target'], dtype='object') Index(['SampleID', 'Predictions'], dtype='object')
average_precision_score(labels.Target, preds.Predictions)  # average precision, i.e. AUPR
# Returns: 0.8027886920926466
preds.loc[preds.Predictions > 0, 'Predictions'] = 1
preds.loc[preds.Predictions < 0, 'Predictions'] = -1
accuracy_score(labels.Target, preds.Predictions)
# Returns: 0.7233333333333334
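The snippet above compares `labels.Target` and `preds.Predictions` row by row, which assumes both CSV files list the pairs in the same order. A sketch that aligns the two files on `SampleID` first, assuming the column names shown in the output above (`SampleID`, `Target`, `Predictions`) and +1/-1 targets; `direction_accuracy` is a hypothetical helper, and it uses plain NumPy in place of `accuracy_score`:

```python
import numpy as np
import pandas as pd


def direction_accuracy(labels, preds):
    """Accuracy of predicted directions after aligning rows on SampleID.

    Hypothetical helper. Assumes `labels` has 'SampleID' and +1/-1 'Target'
    columns and `preds` has 'SampleID' and a real-valued 'Predictions'
    column whose sign is the predicted direction.
    """
    # Join on the pair identifier instead of relying on row order;
    # validate= raises if a SampleID appears more than once in either file.
    merged = labels.merge(preds, on="SampleID", how="inner", validate="one_to_one")
    predicted = np.sign(merged["Predictions"].to_numpy())  # a score of 0 gives 0, counted as wrong
    return float(np.mean(predicted == merged["Target"].to_numpy()))


# Toy files listing the same pairs in a different order.
labels = pd.DataFrame({"SampleID": ["pair1", "pair2", "pair3", "pair4"],
                       "Target": [1, -1, 1, -1]})
preds = pd.DataFrame({"SampleID": ["pair4", "pair3", "pair2", "pair1"],
                      "Predictions": [-0.3, 0.8, 0.1, 2.0]})
print(direction_accuracy(labels, preds))
```

With these toy frames the aligned accuracy is 0.75, whereas comparing the rows positionally would give 0.25. Note that an inner join silently drops pairs missing from either file, so it is worth checking that `len(merged)` matches the number of labels.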
From my point of view, however, accuracy might not be the best metric for evaluating causal algorithms: the confidence of an algorithm has to be taken into account, which makes it possible to abstain from a prediction when it is uncertain (not answering is better than giving a wrong causal direction).
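One way to take confidence into account, sketched here as an assumption rather than the metric used in the CGNN evaluation, is to let the method abstain on its least confident pairs and report accuracy only on the fraction it answers (accuracy at a given coverage). `accuracy_at_coverage` is hypothetical and treats the absolute value of the score as the confidence:

```python
import numpy as np


def accuracy_at_coverage(targets, scores, coverage):
    """Accuracy on the `coverage` fraction of pairs with the largest |score|.

    Hypothetical helper. Assumes +1/-1 targets and real-valued scores whose
    sign is the predicted direction and whose magnitude is the confidence;
    the remaining pairs are treated as "no answer" and ignored.
    """
    targets = np.asarray(targets)
    scores = np.asarray(scores, dtype=float)
    n_kept = max(1, int(round(coverage * len(scores))))
    # Indices of the most confident predictions, most confident first.
    kept = np.argsort(-np.abs(scores), kind="stable")[:n_kept]
    return float(np.mean(np.sign(scores[kept]) == targets[kept]))


targets = [1, 1, -1, -1]
scores = [2.0, -0.1, -1.5, 0.2]
for coverage in (0.5, 1.0):
    print(coverage, accuracy_at_coverage(targets, scores, coverage))
```

On this toy input the method is right on both of its confident answers (accuracy 1.0 at coverage 0.5) but only reaches 0.5 when forced to answer every pair. Sweeping the coverage from 0 to 1 traces an accuracy/coverage curve; at coverage 1 it reduces to the plain sign-based accuracy.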
Best regards, Diviyan
Thanks a lot! Sorry, I was an idiot... I forgot to increment the idx variable.
Thanks for your help
Sorry for the inconvenience
No problem, glad I could help! I'll close this issue. Have a good day!
Hi,
So I have run the CGNN pairwise experiments again.
I can confirm that I get the same AUPR results on the Multi, Gauss, Net and Tueb datasets (ensembling 12 different runs):
Multi: AUPR 0.95
Gauss: AUPR 0.80
Net: AUPR 0.90
However, when I look at the accuracy, i.e. predicting the actual direction, I get 0.43, 0.46 and 0.49 respectively.
I compute the accuracy from the score.
The same method gives me around 74% (unweighted) on the Tueb dataset.
So my question is: is this expected, should I be computing the accuracy differently, or does accuracy not matter here?
Thanks for the clarification in advance.
Best
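AUPR (average precision) only depends on how the scores rank the pairs, whereas accuracy computed by thresholding at 0 also depends on where 0 falls relative to the scores. The toy example below uses illustrative scores, not CGNN output, to show the two metrics diverging; a constant offset in the scores is one possible cause of such a gap, not a diagnosis of the numbers reported above. `sign_accuracy` is a hypothetical helper:

```python
import numpy as np
from sklearn.metrics import average_precision_score


def sign_accuracy(targets, scores):
    """Share of pairs whose score sign matches the +1/-1 target."""
    return float(np.mean(np.sign(scores) == targets))


targets = np.array([1, 1, -1, -1])
# Toy scores that rank every pair correctly but are all positive,
# e.g. because of a constant offset in the model's output.
scores = np.array([2.0, 1.5, 1.0, 0.5])
centred = scores - np.median(scores)  # same ranking, shifted around 0

print(average_precision_score(targets, scores))   # depends on the ranking only
print(average_precision_score(targets, centred))  # unchanged by the shift
print(sign_accuracy(targets, scores))             # threshold fixed at 0
print(sign_accuracy(targets, centred))
```

Both score vectors get the same average precision because the ranking is identical, while the sign-based accuracy goes from 0.5 to 1.0 once the scores are centred.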