yasu-sh closed this issue 1 year ago.
I see, so it is the order of the variables you are talking about? I'll have another look.
What do you mean by bootstrapping? Did you add another parameter to the tetrad_fges module?
@felixleopoldo I am sorry, I should have made my report complete. (I forgot to add: "I will investigate later/tomorrow.")
I was wondering whether the evaluation metrics/plotting module takes a discrepancy in the order of variables into account. Let me take some time to check this; I have already spent several hours on this phenomenon.
As for the bootstrapping, I added the parameter to check its effect, following your tutorial at UAI2023. It is not the point in this case. I will check with the asia dataset.
[Preparation]
Dataset: obtained in the R console with data(alarm).
Adjacency matrix: created following the bnlearn help page (below).
> alarm_gt <- bnlearn::model2network(paste0(
+ "[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF][LVF]",
+ "[STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA][HRSA|ERCA:HR][ANES]",
+ "[APL][TPR|APL][ECO2|ACO2:VLNG][KINK][MINV|INT:VLNG][FIO2][PVS|FIO2:VALV]",
+ "[SAO2|PVS:SHNT][PAP|PMB][PMB][SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC]",
+ "[MVS][VMCH|MVS][VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG]",
+ "[ACO2|VALV][CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]"))
> bnlearn::amat(alarm_gt)
ACO2 ANES APL BP CCHL CO CVP DISC ECO2 ERCA ERLO FIO2 HIST HR HRBP HREK HRSA HYP INT KINK LVF LVV MINV MVS PAP PCWP PMB PRSS PVS SAO2 SHNT STKV TPR VALV VLNG VMCH VTUB
ACO2 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ANES 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
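For reference, the amat() convention can be reproduced outside R: a 1 in row i, column j means an arc from node i to node j, and the columns come out in alphabetical order. The following Python sketch is only an illustration; model_string_to_amat is a hypothetical helper (not part of bnlearn or benchpress), and it assumes the model string uses only the [node|parent1:parent2] syntax shown above.

```python
import re

# One "[node]" or "[node|parent1:parent2:...]" term of a bnlearn model string.
TERM = re.compile(r"\[([^\]|]+)(?:\|([^\]]*))?\]")


def model_string_to_amat(model):
    """Hypothetical helper: return (nodes, matrix) with nodes sorted
    alphabetically and matrix[i][j] == 1 for an arc nodes[i] -> nodes[j]."""
    parents = {}
    for node, plist in TERM.findall(model):
        parents[node] = plist.split(":") if plist else []
    nodes = sorted(parents)
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for child, plist in parents.items():
        for parent in plist:
            matrix[index[parent]][index[child]] = 1  # row = parent, column = child
    return nodes, matrix


ALARM = (
    "[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF][LVF]"
    "[STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA][HRSA|ERCA:HR][ANES]"
    "[APL][TPR|APL][ECO2|ACO2:VLNG][KINK][MINV|INT:VLNG][FIO2][PVS|FIO2:VALV]"
    "[SAO2|PVS:SHNT][PAP|PMB][PMB][SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC]"
    "[MVS][VMCH|MVS][VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG]"
    "[ACO2|VALV][CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]"
)

nodes, amat = model_string_to_amat(ALARM)
print(" ".join(nodes))  # same header order as the amat() output above
```

The ACO2 row then has 1s only under CCHL and ECO2, matching the first row printed by amat().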
Before checking the dataset I created from bnlearn's asia data, I reproduced a discrepancy in the results with the sachs dataset from the benchpress repository itself. I would like to understand why the discrepancy happens; the diffplots and the benchmark metrics are my concerns.
The discrepancy appears when the column-order-reversed dataset is used.
Steps, run in the resources/data/mydatasets directory:
Reversing the column order of the sachs dataset in R:
> sachs.data <- data.table::fread("2005_sachs_2_cd3cd28icam2_log_std.csv")
> head(sachs.data,2)
Akt Erk Jnk Mek P38 PIP2 PIP3 PKA PKC Plcg Raf
1: -0.6343361 -0.1117883 -0.3707515 -0.58558428 -0.06458972 0.6818205 -0.3240229 -0.04326735 -0.6878319 -0.3955337 -0.5148379
2: -3.0409103 -2.5379116 1.0548648 -0.08291055 -0.10231212 1.6658269 1.1813047 -4.07209170 0.2993658 0.6777917 -0.1101130
> data.table::setcolorder(sachs.data, sort(colnames(sachs.data), decreasing = TRUE))
> head(sachs.data,2)
Raf Plcg PKC PKA PIP3 PIP2 P38 Mek Jnk Erk Akt
1: -0.5148379 -0.3955337 -0.6878319 -0.04326735 -0.3240229 0.6818205 -0.06458972 -0.58558428 -0.3707515 -0.1117883 -0.6343361
2: -0.1101130 0.6777917 0.2993658 -4.07209170 1.1813047 1.6658269 -0.10231212 -0.08291055 1.0548648 -2.5379116 -3.0409103
> data.table::fwrite(sachs.data, "2005_sachs_2_cd3cd28icam2_log_std_colorder_dec.csv")
Running config/paper_sachs.json as usual; only data_id differs between the two runs:
Left diffplot - config/paper_sachs.json
{
"benchmark_setup": {
"data": [
{
"graph_id": "sachs.csv",
"parameters_id": null,
"data_id": "2005_sachs_2_cd3cd28icam2_log_std.csv",
"seed_range": null
}
],
Right diffplot - config/paper_sachs.json
{
"benchmark_setup": {
"data": [
{
"graph_id": "sachs.csv",
"parameters_id": null,
"data_id": "2005_sachs_2_cd3cd28icam2_log_std_colorder_dec.csv",
"seed_range": null
}
],
Thanks. I think the column order of the dataset and the columns of the adjacency matrix have to be the same. For the Sachs data, I manually reordered and renamed the data columns (as far as I remember) to match the graph from bnlearn.
Did you find an example generated by bp where there is a mismatch of orders?
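A quick way to catch this before running a benchmark is to compare the header rows of the data file and the adjacency-matrix file. A minimal Python sketch, assuming both files are comma-separated with the variable names in the first row; header and order_mismatches are hypothetical helper names, and the toy files reuse three sachs columns with placeholder graph entries.

```python
import csv
import io


def header(csv_text):
    """Return the first row (the column names) of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def order_mismatches(data_cols, graph_cols):
    """Return (position, data column, graph column) wherever the orders differ."""
    if sorted(data_cols) != sorted(graph_cols):
        raise ValueError("data and graph do not contain the same variables")
    return [(i, d, g) for i, (d, g) in enumerate(zip(data_cols, graph_cols)) if d != g]


# Toy files: three sachs variables, data columns in reverse order,
# graph entries are placeholders.
data_text = "Raf,Plcg,Akt\n-0.5148379,-0.3955337,-0.6343361\n"
graph_text = "Akt,Plcg,Raf\n0,0,0\n0,0,0\n0,0,0\n"

print(order_mismatches(header(data_text), header(graph_text)))
# [(0, 'Raf', 'Akt'), (2, 'Akt', 'Raf')]
```

An empty list means the orders agree; a ValueError means the two files do not even describe the same variables.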
Thanks for describing your data preparation.
Regarding "Did you find an example generated by bp where there is a mismatch of orders?":
Yes and no.
Yes means: bnlearn's amat() output has its columns in alphabetical order, whereas bnlearn's built-in datasets such as alarm do not have their columns in alphabetical order.
No means: in the examples prepared by bp, I have not noticed any inconsistency.
My data scenario is scenario 2: fixed graph and fixed data, both prepared by the user. https://benchpressdocs.readthedocs.io/en/latest/json_overview.html#example-data-scenarios
Additional scenario: besides scenario 2, I added a hyperparameter (bootstrapping) to the tetrad algorithms. Tetrad output with bootstrapping has its nodes in alphabetical order automatically.
This code reflects the internal node order of the tetrad graph instance.
adjmat.csv_out_tetrad_without_bootstrap.txt
Graph Nodes:
CVP;PCWP;HIST;TPR;BP;CO;HRBP;HREK;HRSA;PAP;SAO2;FIO2;PRSS;ECO2;MINV;MVS;HYP;LVF;APL;ANES;PMB;INT;KINK;DISC;LVV;STKV;CCHL;ERLO;HR;ERCA;SHNT;PVS;ACO2;VALV;VLNG;VTUB;VMCH
adjmat.csv_out_tetrad_with_bootstrap.txt
Graph Nodes:
ACO2;ANES;APL;BP;CCHL;CO;CVP;DISC;ECO2;ERCA;ERLO;FIO2;HIST;HR;HRBP;HREK;HRSA;HYP;INT;KINK;LVF;LVV;MINV;MVS;PAP;PCWP;PMB;PRSS;PVS;SAO2;SHNT;STKV;TPR;VALV;VLNG;VMCH;VTUB
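To see which order a Tetrad output follows, the two node lists above can be compared directly. A small Python sketch; parse_graph_nodes is a hypothetical helper and assumes the list is the semicolon-separated line right after "Graph Nodes:".

```python
def parse_graph_nodes(line):
    """Split a semicolon-separated node list (the line after 'Graph Nodes:')."""
    return [name for name in line.strip().split(";") if name]


without_bootstrap = parse_graph_nodes(
    "CVP;PCWP;HIST;TPR;BP;CO;HRBP;HREK;HRSA;PAP;SAO2;FIO2;PRSS;ECO2;MINV;MVS;"
    "HYP;LVF;APL;ANES;PMB;INT;KINK;DISC;LVV;STKV;CCHL;ERLO;HR;ERCA;SHNT;PVS;"
    "ACO2;VALV;VLNG;VTUB;VMCH"
)
with_bootstrap = parse_graph_nodes(
    "ACO2;ANES;APL;BP;CCHL;CO;CVP;DISC;ECO2;ERCA;ERLO;FIO2;HIST;HR;HRBP;HREK;"
    "HRSA;HYP;INT;KINK;LVF;LVV;MINV;MVS;PAP;PCWP;PMB;PRSS;PVS;SAO2;SHNT;STKV;"
    "TPR;VALV;VLNG;VMCH;VTUB"
)

print(set(without_bootstrap) == set(with_bootstrap))   # True: same 37 variables
print(without_bootstrap == sorted(without_bootstrap))  # False: not alphabetical
print(with_bootstrap == sorted(with_bootstrap))        # True: alphabetical
```

Comparing without_bootstrap against the data file's header row (instead of sorted()) tells whether the output simply follows the dataset's column order.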
[Information] Adjacency matrix files: adjmat_tetrad_fges_estimated_withoutbootstrap.csv, alarm_gt_amat.csv, adjmat_tetrad_fges_estimated_withbootstrap.csv
Even in tetrad's case without bootstrapping (the normal benchpress case), the output node order is the same as the dataset's column order. The conclusion is the same. [Fix] The dataset needs to have the same column order as the ground-truth adjacency matrix.
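Applied to a user-prepared dataset, the fix amounts to rewriting the data CSV so that its columns follow the header of the ground-truth adjacency matrix. A minimal sketch with Python's standard csv module; align_columns is a hypothetical helper, not a benchpress function, and it assumes both files are comma-separated with variable names in the first row.

```python
import csv
import io


def align_columns(data_text, graph_text):
    """Return data_text (CSV) with its columns reordered to follow the
    header of graph_text (the ground-truth adjacency matrix CSV)."""
    target = next(csv.reader(io.StringIO(graph_text)))
    reader = csv.DictReader(io.StringIO(data_text))
    if sorted(reader.fieldnames) != sorted(target):
        raise ValueError("data and graph do not contain the same variables")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=target, lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)  # rows are dicts, so only the column order changes
    return out.getvalue()


# Toy input: three sachs columns; graph entries are placeholders.
data_text = "Raf,Plcg,Akt\n-0.5148379,-0.3955337,-0.6343361\n"
graph_text = "Akt,Plcg,Raf\n0,0,0\n0,0,0\n0,0,0\n"
print(align_columns(data_text, graph_text), end="")
# Akt,Plcg,Raf
# -0.6343361,-0.3955337,-0.5148379
```

For real files, read both with open(path, newline="") and write the returned text to a new data file; values are copied as strings, so no numeric precision is lost.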
Then I think it's OK; looking at the data file and the graph file will also be less confusing when the columns are consistent. But there should be some text clarifying this for scenario II, somewhere here.
I am glad to hear that. It's good.
Beginners may first refer to the 'file format' section, so it might be a good idea to add a link from the page below to the scenario II page. https://benchpressdocs.readthedocs.io/en/latest/data_formats.html#observational-data
Yes, that might be good as well.
Thanks for telling me this. Should I close this issue now or after the documentation is updated?
Sure, maybe you can have a look here first. I added a note in just one place so as not to make it more confusing.
I did. I am satisfied with the consistency, and I no longer have to see garbage (terrible SHD or diffplots).
This seems to be true: I noticed that large SHD values are obtained without bootstrapping, even though, judging by eye from the plots, I get reasonable results with bootstrapping in tetrad. If this is true, it is important for users.
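A toy illustration of why a column-order mismatch can produce terrible SHD values, assuming the estimated graph is compared to the ground truth position by position instead of by node name: even a perfect estimate then looks wrong. The shd function uses one common definition (each node pair whose edge status differs, including a reversal, counts once); it is a sketch, not benchpress's implementation, and the three-node graph is made up.

```python
def shd(a, b):
    """SHD between two adjacency matrices over the same node order: each node
    pair whose edge status differs (missing, extra or reversed) counts once."""
    n = len(a)
    return sum(
        (a[i][j], a[j][i]) != (b[i][j], b[j][i])
        for i in range(n) for j in range(i + 1, n)
    )


def reorder(matrix, labels, target):
    """Permute rows and columns of `matrix` (labelled by `labels`) into `target` order."""
    pos = [labels.index(name) for name in target]
    return [[matrix[r][c] for c in pos] for r in pos]


# Ground truth A -> B -> C, columns in order A, B, C.
true_labels = ["A", "B", "C"]
true_m = [[0, 1, 0],
          [0, 0, 1],
          [0, 0, 0]]

# The same graph, estimated from data whose columns were reversed (C, B, A).
est_labels = ["C", "B", "A"]
est_m = [[0, 0, 0],   # C has no children
         [1, 0, 0],   # B -> C
         [0, 1, 0]]   # A -> B

naive = shd(true_m, est_m)                                      # by position
aligned = shd(true_m, reorder(est_m, est_labels, true_labels))  # by node name
print(naive, aligned)  # 2 0
```

The estimate is identical to the truth, yet the positional comparison reports every edge as reversed.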
Dataset: alarm (made from bnlearn by me). Left: ground truth / center: without bootstrapping / right: with bootstrapping = 5.
[Figure: Diffplot]
[Figure: Graph structure]