cdecker8 closed 9 months ago
Oh, I think I know what the problem might be; it's a minor annoyance. I'll get back to you with a solution later tonight or tomorrow.
Here's some working code. I deleted some code that was there to try to solve the main issue, so you'll obviously have to add some of your own code back, sorry :-D
Also, I wasn't quite sure what you meant by the folds--is this what you meant, or did you want to do sampling with replacement?
import datetime
import sys
import jpype.imports
import pandas as pd
from sklearn.model_selection import train_test_split
# BASE_DIR = "/workspace/notebooks/Causal/py-tetrad/pytetrad/"
BASE_DIR = ""
sys.path.append(BASE_DIR)
date = datetime.datetime.now().strftime("%Y-%m-%d")
# convert to string
date = str(date)
if jpype.isJVMStarted():
    jpype.shutdownJVM()
else:
    try:  # start a new jvm to clear memory heap and avoid memory error
        jpype.startJVM(classpath=[f"{BASE_DIR}resources/tetrad-current.jar"])
        print(jpype.java.lang.System.getProperty("java.class.path"))
    except OSError:
        pass
# these packages need to be imported after starting jvm
import edu.cmu.tetrad.search as ts
import tools.translate as tr
from java.util import ArrayList
df = pd.read_csv(f"{BASE_DIR}resources/bridges.data.version211_rev.txt", sep="\t")
target = 'SPAN'
for fold in range(1, 5):
    print('reading in data from ' + str(fold))
    train, test = train_test_split(df, test_size=.1)
    dataSet = tr.pandas_data_to_tetrad(train)
    score = ts.score.BdeuScore(dataSet)
    alg = ts.FgesMb(score)
    print('running search')
    alg.setVerbose(True)
    # look up the node object for the target; FgesMb expects nodes, not names
    target_node = dataSet.getVariable(target)
    target_list = ArrayList()
    target_list.add(target_node)
    print(target_list)
    graph = alg.search(target_list)
    print(graph)
    # save edge list to file
    with open(f'example_out.{fold}.txt', 'w') as f:
        f.write(date + '\n\n')
        f.write(str(graph))
Apologies for any confusion about the folds which are specific to my analysis pipeline. The for loop is intended for reading in pre-saved datasets, which essentially represent the same data but may exhibit slight variations due to the imputation process. These folds consist of pre-saved imputed k-folds, which were generated using a modified version of Multiple Imputation by Chained Equations (MICE), resulting in 0/1 imputations for missing data within a predictive machine learning model pipeline.
Each imputation (dataset) may differ slightly based on the selection of the test set it was imputed from, and it was important for me to maintain data consistency between the predictive and causal models for my dissertation.
In discussions with Erich K., we explored the idea of running all the folds and filtering out edges that only appear in 1 or 2 of the folds, as these could potentially stem from imputation noise. I've developed another script that reads the edge lists and calculates the number of searches they appear in. I'll test this approach and provide an update later today.
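The edge-filtering idea described above could be sketched roughly as follows. This is only a minimal illustration, not the actual script mentioned in this thread. It assumes each saved file holds the printed graph with numbered edge lines such as `1. A --> B`, and the names `parse_edges`, `count_edges`, `stable_edges`, and `min_folds` are hypothetical.

```python
import re
from collections import Counter

# Assumed edge-line format in the saved output: "<index>. <node1> <edge type> <node2>"
EDGE_LINE = re.compile(r"^\s*\d+\.\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_edges(graph_text):
    """Return the set of (node1, edge_type, node2) tuples found in one printed graph."""
    edges = set()
    for line in graph_text.splitlines():
        match = EDGE_LINE.match(line)
        if match:
            edges.add(match.groups())
    return edges


def count_edges(graph_texts):
    """Count, for each edge, how many graphs (folds) it appears in."""
    counts = Counter()
    for text in graph_texts:
        # parse_edges returns a set, so an edge is counted at most once per fold
        counts.update(parse_edges(text))
    return counts


def stable_edges(graph_texts, min_folds=3):
    """Keep only edges that appear in at least min_folds of the graphs."""
    counts = count_edges(graph_texts)
    return {edge: n for edge, n in counts.items() if n >= min_folds}
```

In practice the texts would come from the saved files, for example by reading each `example_out.<fold>.txt` into a string. Note that this sketch treats `A --- B` and `B --- A` as different edges, so undirected edges may need normalizing if node order varies between runs.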
I truly appreciate your assistance and your time in addressing this matter. Thank you.
Oh that's clever, I like it!
Thanks. Occasionally, we manage to thaw out some good ideas here in Minnesota, despite the chilly weather!
Though results are pending, the addition of edges suggests that the new code is functioning as intended. I suspect the error stemmed from my oversight in failing to extract the target node from the dataset and assuming I could simply pass the variable's name to the algorithm. Thanks again for your assistance with that. It likely saved me from days of debugging.
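For anyone hitting the same problem, a small guard like the one below may make the mistake easier to spot. It is a sketch under assumptions: `resolve_target_nodes` is a hypothetical helper, and it assumes `getVariable` returns a Java null (seen as `None` in Python) when the name is not in the data set.

```python
def resolve_target_nodes(data_set, names):
    """Look up each variable name in the data set and return the node objects.

    Raises KeyError if a lookup comes back empty, instead of letting a
    missing node reach the search and fail there.
    """
    nodes = []
    for name in names:
        node = data_set.getVariable(name)  # node object, not the name string
        if node is None:
            raise KeyError(f"variable {name!r} not found in data set")
        nodes.append(node)
    return nodes
```

The returned nodes would then be added to a `java.util.ArrayList` and passed to `alg.search(...)`, as in the working code earlier in this thread.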
I'm encountering a java.lang.NullPointerException when attempting to run the FGES-Mb algorithm on a target node using JPype. Despite confirming the presence of the target node within the score object data through various print statements, the NullPointerException persists during algorithm execution.
I suspect that the issue might stem from my limited experience with JPype (I'm admittedly a JPype novice and have limited Java experience beyond occasionally popping into the tetrad javadocs) or the method I'm using to add the target node to the target_list. Despite extensive testing, the error persists, suggesting that I may be overlooking a crucial aspect in this process.
Any guidance or suggestions on troubleshooting and resolving this NullPointerException would be greatly appreciated. Thank you!
Error message and stacktrace
None
java.lang.NullPointerException
    at edu.cmu.tetrad.graph.EdgeListGraph.getAdjacentNodes(EdgeListGraph.java:555)
    at edu.cmu.tetrad.search.FgesMb.search(FgesMb.java:213)
My code