FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License
1.08k stars, 198 forks

FileNotFoundError #24

Closed christran16 closed 5 years ago

christran16 commented 5 years ago

When I'm trying to run some examples with different parameters, I get this error:

FileNotFoundError: File b'/tmp/cdt_CAMbc4bbf1c-b80b-4e8b-9184-23ba73222cce/result.csv' does not exist

Here is the snippet of code I'm trying to run:

import networkx as nx
from cdt.causality.graph import CAM
from cdt.data import load_dataset

data, graph = load_dataset("sachs")
obj = CAM(selmethod='gam')
output = obj.predict(data)

diviyank commented 5 years ago

Hi Chris, thanks for the feedback. It seems that all variable selection methods except 'gamboost' are not working; I will investigate further.

diviyank commented 5 years ago

Hi again Chris, sorry for the delay.

This FileNotFoundError occurs when the R subprocess errors, so I should find a way to trace the error back from the R process. After investigation, the error comes down to the variable selection methods, which are not suited to small graphs: the CAM R package reports the same issue. Therefore, for small graphs you should consider setting variablesel=False, which is computationally heavier, although the difference is unnoticeable on small graphs.

Since this is a bug in CAM and in its usage, I will be closing this issue, but feel free to keep discussing.

Best, Diviyan

nullgogo commented 3 years ago

Hi Diviyan, is this error resolved? I encountered the same error.

image

diviyank commented 3 years ago

Hello, this error actually mirrors an error in the R process; could you share a code snippet that reproduces the error?
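To illustrate why a failing child process can surface only as a missing output file, and how capturing its stderr helps, here is a generic sketch. It is not how cdt launches R; the helper name run_and_surface_errors is hypothetical, and a Python child process stands in for the R script so the example runs anywhere.

```python
import subprocess
import sys


def run_and_surface_errors(cmd):
    """Run a command and raise with its stderr if it exits with a non-zero code."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"subprocess failed with code {proc.returncode}:\n{proc.stderr}"
        )
    return proc.stdout


# Stand-in for a failing R script: writes to stderr and exits with code 1.
failing_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]

try:
    run_and_surface_errors(failing_cmd)
    message = ""
except RuntimeError as err:
    # The child's own error text is now visible to the caller.
    message = str(err)

# A succeeding command simply returns its stdout.
ok_output = run_and_surface_errors([sys.executable, "-c", "print('ok')"])
```

Without the return-code check, the caller would only notice the failure later, when it tries to read a result file that was never written.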

nullgogo commented 3 years ago

import pandas as pd
import cdt
from cdt.causality.graph import GES, GIES, CCDr, LiNGAM, PC, MMPC, IAMB, Fast_IAMB, Inter_IAMB, GS, CGNN, SAM, SAMv1, CAM
from cdt.data import load_dataset
import networkx as nx
import matplotlib.pyplot as plt

print("CAM")
# data, graph = load_dataset("sachs")
data = pd.read_csv('./data/server_sample/1_example.csv')
data.drop(['Unnamed: 0'], axis=1, inplace=True)
data = data.astype(float)
obj = CAM()
output = obj.predict(data)  # No graph provided as an argument
# output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
# output = obj.predict(data, graph)  # With a directed graph
print('edge is ', output.edges())
data = pd.DataFrame({"edge": output.edges()})
data.to_csv("./data/server_sample/1/edge_CAM_alg.csv")
print('edge num', output.number_of_edges())
nx.draw_networkx(output, font_size=8)
plt.show()

Data as follows: [image]

nullgogo commented 3 years ago

Hello, I have multiple data sets, with nodes ranging from 30 to 150, and data volumes ranging from 10,000 to 500,000.

For some algorithms in CDT, such as IAMB, GS, and CGNN, the run on certain data sets has been going for a long time and has not finished yet. Can the running time be estimated, so that I roughly know when an algorithm will finish?

diviyank commented 3 years ago

Hi, I'm not able to reproduce your error. Can you share your data?

There is no direct way to estimate the running time. One option is to enable verbosity for the models by setting verbose=True.

Please note that CGNN scales quadratically in the number of nodes and the number of data points. I think CGNN might be unfit for your case, as the computational time might be too high.
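One rough way to get an estimate anyway is to time the algorithm on a few small subsamples and extrapolate. The sketch below fits a power law to such timings; the function names are hypothetical, and the timings are synthetic values generated from an exactly quadratic formula purely for illustration, not measurements of any CDT algorithm.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds ~ a * size**b in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # fitted scaling exponent
    a = math.exp(mean_y - b * mean_x)   # fitted constant factor
    return a, b


def extrapolate_runtime(sizes, seconds, target_size):
    """Predict the running time at target_size from the fitted power law."""
    a, b = fit_power_law(sizes, seconds)
    return a * target_size ** b


# Synthetic timings (assumption: exactly quadratic cost, 1e-6 * n**2 seconds).
sizes = [1000, 2000, 4000]
seconds = [1e-6 * n ** 2 for n in sizes]

a, b = fit_power_law(sizes, seconds)
predicted = extrapolate_runtime(sizes, seconds, 10000)
```

In practice, the timings would come from wrapping a call such as obj.predict(data.sample(n)) with time.perf_counter() for a few small values of n. The extrapolation is only indicative, since real running times also depend on the number of nodes, hardware, and convergence behavior.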

Best regards, Diviyan

nullgogo commented 3 years ago

Data as follows: example.zip

diviyank commented 3 years ago

Could you try again with the latest version on the dev branch? I recently fixed a bug in CAM on Windows platforms.

Best, Diviyan