christran16 closed this issue 5 years ago
Hi Chris, thanks for the feedback. It seems that none of the variable selection methods except 'gamboost' are working; I will investigate this further.
Hi again Chris, Sorry for the delay,
I still need to find a way to trace the error back from the R process. This FileNotFoundError occurs whenever the R subprocess fails. After investigation, the error comes down to the variable selection methods, which are not suited to small graphs: the CAM R package reports the same issue.
Therefore, for small graphs you should consider setting variablesel=False, which is computationally heavier, although the difference is unnoticeable on small graphs.
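A minimal sketch of that workaround, assuming the variablesel keyword of cdt's CAM constructor as named above. The helper cam_options and its small_graph_limit default are hypothetical: neither CAM nor cdt defines what counts as a small graph, so the threshold should be adjusted to your own data.

```python
def cam_options(n_variables, small_graph_limit=10):
    """Build keyword arguments for cdt's CAM.

    small_graph_limit is a hypothetical threshold chosen for illustration;
    it does not come from cdt or from the CAM R package.
    """
    # Skip variable selection on small graphs, where it is reported to fail.
    return {"variablesel": n_variables > small_graph_limit}


# Usage (needs cdt and a working R installation with the CAM package):
# from cdt.causality.graph import CAM
# model = CAM(**cam_options(data.shape[1]))
# output = model.predict(data)
```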
Since this is a bug in CAM and in how it is used, I am closing this issue, but feel free to continue the discussion.
Best, Diviyan
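As an illustration of how a subprocess failure can be surfaced directly instead of showing up later as a FileNotFoundError on a missing output file, here is a rough sketch using only the Python standard library. It is not how cdt launches R; the function name run_and_report and the script name in the usage comment are hypothetical.

```python
import subprocess


def run_and_report(command):
    """Run a command and raise with its stderr if it exits with an error."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Include the child's own error output so the real cause is visible.
        raise RuntimeError(
            f"{command[0]} exited with code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


# Usage (hypothetical script name):
# run_and_report(["Rscript", "cam_script.R"])
```

With this pattern the message printed by R would appear in the Python exception, which makes it easier to tell a variable selection failure from a genuinely missing file.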
Hi Diviyan, Is this error resolved? I also encountered the same error.
Hello, this error actually mirrors an error in the R process; could you share a code snippet that reproduces it?
import pandas as pd
import cdt
from cdt.causality.graph import GES,GIES,CCDr,LiNGAM,PC,MMPC, IAMB,Fast_IAMB,Inter_IAMB,GS,CGNN,SAM,SAMv1, CAM
from cdt.data import load_dataset
import networkx as nx
import matplotlib.pyplot as plt
print("CAM")
# data, graph = load_dataset("sachs")
data = pd.read_csv('./data/server_sample/1_example.csv')
data.drop(['Unnamed: 0'],axis=1,inplace = True)
data=data.astype(float)
obj = CAM()
output = obj.predict(data) #No graph provided as an argument
# output = obj.predict(data, nx.Graph(graph)) #With an undirected graph
# output = obj.predict(data, graph) #With a directed graph
print('edge is ', output.edges())
data = pd.DataFrame({"edge": output.edges()})
data.to_csv("./data/server_sample/1/edge_CAM_alg.csv")
print('edge num', output.number_of_edges())
nx.draw_networkx(output, font_size=8)
plt.show()
Data as follows
Hello, I have multiple data sets, with the number of nodes ranging from 30 to 150 and data volumes ranging from 10,000 to 500,000.
For some algorithms in CDT, such as IAMB, GS and CGNN, a run on one of these data sets has been going for a long time and has still not finished. Is there a way to estimate the running time, so that I can roughly know when the algorithm will finish?
Hi, I'm not able to reproduce your error. Can you share your data?
There is no direct way to estimate the running time. One option is to enable verbosity for the models by setting verbose=True.
Please note that CGNN scales quadratically in both the number of nodes and the number of data points. I think CGNN may be unfit for your case, as the computation time might be too high.
Best regards, Diviyan
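One rough way to get an estimate anyway is to time the algorithm on a few small subsamples and extrapolate. The sketch below assumes the running time follows a power law in the number of rows, which is only an approximation and may not hold for every algorithm or data set; the function names are hypothetical and nothing here is part of cdt.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds ~ c * size**k on a log-log scale."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # fitted exponent
    c = math.exp(mean_y - k * mean_x)  # fitted constant factor
    return c, k


def estimate_seconds(sizes, seconds, target_size):
    """Extrapolate the fitted power law to target_size rows."""
    c, k = fit_power_law(sizes, seconds)
    return c * target_size ** k


# Usage: measure each subsample run with time.perf_counter(), e.g. on
# data.sample(n=1000), data.sample(n=2000), data.sample(n=4000),
# then call estimate_seconds([1000, 2000, 4000], measured_seconds, len(data)).
```

A fitted exponent close to 2 would be consistent with the quadratic scaling mentioned above for CGNN. The estimate covers the number of rows only; the number of nodes would have to be varied separately.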
Data as follows example.zip
Could you try again with the latest version on the dev branch? I recently fixed a bug in CAM on Windows platforms.
Best,
Diviyan
When I'm trying to run some examples with different parameters, I get this error:
Here is the snippet of code I'm trying to run