Open Al-Dailami opened 2 years ago
Hello! Thanks for the detailed comment. I’ll work through the code and tell you what outputs I should get. I won’t have time this weekend though, so I’ll get back to you after that!
Thanks for your reply,
One more error, this is the first one I got.
  File ".../eICU-GNN-LSTM/src/dataloader/convert.py", line 75, in convert_into_mmap
    write_file[n : n+arr_len, :] = arr # write into mmap
ValueError: could not broadcast input array from shape (62385,92) into shape (62385,57)
These numbers are different from the ones I got after running the preprocessing code: diagnosis=356, labels=5, and flat=93.
Fixed by:
df = pd.read_csv(csv_path)
n_cols = (df.shape[1] - 1) if n_cols is None else n_cols
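The idea behind this fix is to read the column count from the CSV itself, so that the memory map and the array always have the same width. A minimal sketch of that idea follows, assuming the first column is an identifier that is not written out; `infer_n_cols` and the toy CSV are hypothetical and not taken from the repository.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd


def infer_n_cols(df, n_cols=None):
    """Use the caller's n_cols if given, else every column except the first (assumed id)."""
    return (df.shape[1] - 1) if n_cols is None else n_cols


# Toy CSV: one id column plus three feature columns (hypothetical data).
csv_text = "patient,a,b,c\n1,0.1,0.2,0.3\n2,0.4,0.5,0.6\n"
df = pd.read_csv(io.StringIO(csv_text))
n_cols = infer_n_cols(df)

# Drop the assumed id column so the array width equals n_cols.
arr = df.iloc[:, 1:].to_numpy(dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.dat")
    write_file = np.memmap(path, dtype=np.float32, mode="w+", shape=(len(df), n_cols))
    write_file[0 : len(df), :] = arr  # widths agree, so no broadcast error
    write_file.flush()
    written = np.array(write_file)  # copy out before the temp dir is removed
    del write_file
```

With flat=93 in the post above, the same rule gives 92 columns, which matches the array shape in the error message.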
Thank you for sharing! I will be looking into this soon. Apologies for the incredibly long delay in getting back, and I appreciate you sharing the solution here.
Hello
First, I would like to thank you for sharing the code of your awesome projects. I am trying to run your code and reproduce your experiments. Currently, I'm facing a problem. Here are the errors and my fixes:
[0]
  File ".../eICU-GNN-LSTM/graph_construction/create_bert_graph.py", line 19, in make_graph_bert
    distances = torch.cdist(batch, bert, p=2.0, compute_mode='use_mm_for_euclid_dist_if_necessary')
RuntimeError: cdist only supports floating-point dtypes, X1 got: Byte
Fix: changed the dtype from ByteTensor to FloatTensor in File ".../eICU-GNN-LSTM/graph_construction/create_graph.py", line 15:
dtype = torch.cuda.sparse.FloatTensor if device.type == 'cuda' else torch.sparse.FloatTensor
https://github.com/EmmaRocheteau/eICU-GNN-LSTM/blob/5167eea88bfe7a3146ccb6194f54e8e57f52128b/graph_construction/create_graph.py#L15
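The dtype requirement can be shown on a small scale. This is only a sketch: it uses dense toy tensors rather than the sparse tensor types in create_graph.py, and the values are made up. The names `batch` and `bert` are borrowed from the traceback.

```python
import torch

# Toy binary indicator rows stored as uint8 (Byte), the dtype named in the error.
batch = torch.tensor([[1, 0, 1], [0, 1, 1]], dtype=torch.uint8)
bert = torch.tensor([[1, 0, 0], [0, 1, 1]], dtype=torch.uint8)

# torch.cdist needs floating-point inputs, so cast both operands first.
distances = torch.cdist(batch.float(), bert.float(), p=2.0)
```

Casting at the call site leaves the stored tensors compact, while changing the dtype at construction, as in the fix above, avoids repeated conversions.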
  File "/home/sale/eICU-GNN-LSTM/graph_construction/create_graph.py", line 65, in make_graph_penalise
    s_pen = 5 * s - total_combined_diags # the 5 is fairly arbitrary but I don't want to penalise not sharing diagnoses too much
RuntimeError: The size of tensor a (89123) must match the size of tensor b (1000) at non-singleton dimension 1
Fix: in File ".../eICU-GNN-LSTM/graph_construction/create_graph.py", line 194, pass debug=False:
u, v, vals, k = make_graph_penalise(all_diagnoses, scores, debug=False, k=args.k) # debug=False fixes the problem
https://github.com/EmmaRocheteau/eICU-GNN-LSTM/blob/5167eea88bfe7a3146ccb6194f54e8e57f52128b/graph_construction/create_graph.py#L194
[1]
  File "../projects/eICU-GNN-LSTM/src/models/pyg_ns.py", line 241, in inference
    edge_attn = torch.cat(edge_attn, dim=0) # [no. of edges, n_heads of that layer]
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Fix:
if i == 1 and get_attn:
    edge_index_w_self_loops = torch.cat(edge_index_w_self_loops, dim=1) # [2, n. of edges]
if get_attn:
    edge_attn = torch.cat(edge_attn, dim=0) # [no. of edges, n_heads of that layer]
    all_edge_attn.append(edge_attn)
https://github.com/EmmaRocheteau/eICU-GNN-LSTM/blob/5167eea88bfe7a3146ccb6194f54e8e57f52128b/src/models/pyg_ns.py#L241
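The error comes from calling `torch.cat` on an empty list, which the fix avoids by only concatenating when attention weights were requested. An alternative, sketched below, is to guard on the list itself; `cat_or_none` is a hypothetical helper and not part of the repository.

```python
import torch


def cat_or_none(chunks, dim=0):
    """Concatenate a list of tensors, or return None if nothing was collected."""
    return torch.cat(chunks, dim=dim) if len(chunks) > 0 else None


edge_attn = []  # stays empty when attention weights are not collected
collected = [torch.ones(2, 4), torch.zeros(3, 4)]  # toy per-batch chunks

empty_result = cat_or_none(edge_attn)
full_result = cat_or_none(collected)
```

Callers then have to handle the `None` case, so the `get_attn` guard in the fix above may be the simpler choice when the flag is already available.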
[2]
  File "../eICU-GNN-LSTM/train_ns_lstmgnn.py", line 94, in validation_step
    out = out[self.dataset.data.val_mask]
TypeError: only integer tensors of a single element can be converted to an index
Fix:
out = out[0][self.dataset.data.val_mask]
https://github.com/EmmaRocheteau/eICU-GNN-LSTM/blob/5167eea88bfe7a3146ccb6194f54e8e57f52128b/train_ns_lstmgnn.py#L94
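This TypeError is what Python raises when a tuple, rather than a tensor, is indexed with a mask tensor, which would explain why `out[0]` helps. A small sketch of that situation, assuming the model returns a tuple whose first element holds the predictions; the tuple contents here are made up.

```python
import torch

# Hypothetical model output: (predictions, something_else).
out = (torch.arange(4.0), None)
val_mask = torch.tensor([True, False, True, False])

try:
    out[val_mask]  # indexing the tuple itself with a mask tensor
    tuple_index_failed = False
except TypeError:
    tuple_index_failed = True

# Select the predictions first, then apply the boolean mask.
selected = out[0][val_mask]
```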
[3]
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Fix: in the same file, "../eICU-GNN-LSTM/train_ns_lstmgnn.py", line 96, I added the following lines, because printing those matrices showed some NaN values:
out[out != out] = 0
out_lstm[out_lstm != out_lstm] = 0
https://github.com/EmmaRocheteau/eICU-GNN-LSTM/blob/5167eea88bfe7a3146ccb6194f54e8e57f52128b/train_ns_lstmgnn.py#L94
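The `x != x` idiom works because NaN is the only floating-point value that does not compare equal to itself. A short sketch on a toy tensor, with made-up values:

```python
import torch

out = torch.tensor([0.2, float("nan"), 0.7])

# NaN != NaN, so this comparison is True exactly at the NaN positions.
nan_mask = out != out

patched = out.clone()
patched[nan_mask] = 0

# torch.isnan builds the same mask more explicitly.
same_mask = torch.isnan(out)
```

Note that this only hides NaNs at the point of evaluation; it does not address where they originate.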
After this, the code starts training, BUT with weird training progress (the loss is always nan)! Printing the output matrices shows that they are always NaN!
acc: 0.9049 prec0: 0.9049 prec1: nan rec0: 1.0000 rec1: 0.0000 auroc: 0.5000 auprc: 0.5476 minpse: 0.0951 f1macro: 0.4750
Epoch 1: 92%|█████████████████████████████████████████████████████████████████████████████████▎ | 452/489 [00:35<00:02, 12.78it/s, loss=nan, v_num=83]
I tried to trace the source of the error, and the NaNs first appear after the LSTM layer, at this line:
https://github.com/EmmaRocheteau/eICU-GNN-LSTM/blob/5167eea88bfe7a3146ccb6194f54e8e57f52128b/src/models/lstm.py#L39
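One way to confirm where NaNs first appear is to attach forward hooks and record the first submodule whose output contains a NaN. The sketch below is an assumption-laden illustration: `find_first_nan_module` and the `Toy` model are hypothetical and unrelated to the repository's classes, and here the NaN is injected into the input on purpose.

```python
import torch
import torch.nn as nn


def find_first_nan_module(model, *inputs):
    """Run one forward pass; return the name of the first submodule
    (in completion order) whose output contains NaN, or None."""
    first = []

    def make_hook(name):
        def hook(module, args, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and torch.isnan(t).any() and not first:
                    first.append(name)
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n]
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for h in handles:
            h.remove()
    return first[0] if first else None


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=4, batch_first=True)
        self.head = nn.Linear(4, 1)

    def forward(self, x):
        seq, _ = self.lstm(x)
        return self.head(seq[:, -1])


torch.manual_seed(0)
model = Toy()
clean = torch.zeros(2, 5, 3)
bad = clean.clone()
bad[0, 0, 0] = float("nan")  # inject a NaN into the input

culprit_bad = find_first_nan_module(model, bad)
culprit_clean = find_first_nan_module(model, clean)
input_has_nan = bool(torch.isnan(bad).any())
```

If the first flagged module is the LSTM, it is worth checking whether its inputs or its parameters already contain NaNs before concluding that the layer itself is at fault.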
Please correct me if I'm wrong ... Thanks a lot in advance...
Note: I used the same package versions as listed in the requirements.txt file.