Closed — sAviOr287 closed this issue 4 years ago
Dear @sAviOr287 ,
Thanks for pointing that out: we will fix this point on the next update !
Best regards, Diviyan
Thanks,
I was also wondering about the run times. The paper quotes 24 min on GPU, but when I tried your code it takes at least 10+ hours with 6 GPUs on the Multi dataset, for example, since it has 300 datasets and each potentially has to be trained 32 × 2 times. Could you clarify the number in the CGNN paper, or am I doing something wrong?
Thanks for the quick reply.
I thought I fixed the performance issue; is your package up to date? It's the GNN, right (and not the CGNN)? Could you check that cdt actually uses the 6 GPUs (via nvidia-smi)? One GNN is executed on one GPU; it's when you launch multiple runs that all the GPUs are used.
Depending on the memory consumption, you could squeeze multiple GNNs onto one GPU. How many examples do you have? (What is the size of your datasets?)
Yeah, I am using GNN.
I just cloned it a week ago. When I start the code it says I have 8 GPUs, but I only use 6 to run, and with nvidia-smi I can see that all 6 are being used. I am just running it on the Multi/Gauss/Net datasets, each of which has 300 datasets of 1,500 points.
where `nb_max_runs=32, n_runs=6, train_epochs=1000, dataloader_workers=0`. So is the 24 min figure for maxing out every single GPU, i.e., running around 32 models at once?
Ok great,
The 24 min were obtained for a single dataset (of 500 points), maxing out one GPU (Kepler architecture), for its 32 × 2 runs.
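A quick back-of-envelope check shows the two numbers are actually consistent. This assumes the paper's 24 min/dataset figure and perfect parallelism across GPUs; real throughput will vary.

```python
# Hypothetical estimate: scale the paper's single-dataset time to a
# 300-dataset benchmark spread over 6 GPUs.
minutes_per_dataset = 24   # paper's figure: one dataset, one GPU, 32x2 runs
n_datasets = 300           # e.g. the Multi benchmark
n_gpus = 6

total_minutes = minutes_per_dataset * n_datasets / n_gpus
print(f"{total_minutes / 60:.0f} hours")  # 20 hours
```

So 10+ hours of wall-clock time on 6 GPUs is roughly what the paper's per-dataset figure predicts.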
I noticed that there is no specific code for dataset prediction, and the memory consumption might be suboptimal, but this is a minor issue. (Keep in mind to run the pairs one by one.)
However, something is strange here: 7728c3d should have removed the `nb_max_runs` argument; it should be deprecated. This comes from an old version of the code; could you execute:

```python
import cdt
print(cdt.__version__)
```
Otherwise, you could get some more performance by monitoring the GPU compute consumption (the % on the right in nvidia-smi) and the memory consumption. Setting `n_jobs` to `6*k` and `n_runs=32` is a nice trick to cut down computation time.
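The packing trick above can be sketched as a round-robin mapping from jobs to devices. This is an illustrative sketch, not the cdt implementation: the names `N_GPUS`, `K`, and `assign_gpu` are hypothetical, and each worker would restrict itself to its assigned device via `CUDA_VISIBLE_DEVICES` before initializing CUDA.

```python
import os

N_GPUS = 6
K = 2                  # jobs packed per GPU; tune from nvidia-smi utilization
n_jobs = N_GPUS * K    # the "6*k" value passed as n_jobs

def assign_gpu(job_index, n_gpus=N_GPUS):
    # Round-robin: job i trains on GPU i % n_gpus, so K jobs share each GPU.
    return job_index % n_gpus

# Each worker process would pin itself before touching CUDA, e.g.:
# os.environ["CUDA_VISIBLE_DEVICES"] = str(assign_gpu(i))
print([assign_gpu(i) for i in range(n_jobs)])  # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
```

The point is that as long as one GNN fits in a fraction of the GPU's memory, `K` concurrent jobs per device raise utilization without changing the results.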
Best,
Oh yeah, you were right: somehow I didn't get the latest version. Could you tell me what you changed besides removing the testing bit? Training for 1000 epochs is a lot faster now somehow.
I am also curious how people still got results with the wrong loss function, lol.
Thanks a lot for your help btw.
The dataset management is done differently: the PyTorch Dataset feature might be a bit tricky to use, so I simplified it a lot and used the provided `TensorDataset` class, which eases the use of this feature (thus cutting down computation time in most cases).
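For readers unfamiliar with it, here is a minimal plain-Python stand-in illustrating what `torch.utils.data.TensorDataset` does (the real class wraps torch tensors): it zips equally-sized arrays and indexes them together, so no custom `Dataset` subclass is needed.

```python
class TinyTensorDataset:
    """Toy analogue of torch.utils.data.TensorDataset."""

    def __init__(self, *arrays):
        # All arrays must agree on the first (sample) dimension.
        assert all(len(a) == len(arrays[0]) for a in arrays)
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays[0])

    def __getitem__(self, i):
        # Index every array at once, returning one aligned sample tuple.
        return tuple(a[i] for a in self.arrays)

x = [0.1, 0.2, 0.3]
y = [1.0, 2.0, 3.0]
ds = TinyTensorDataset(x, y)
print(len(ds), ds[1])  # 3 (0.2, 2.0)
```

Because the pairing logic lives in one generic class, there is no per-dataset boilerplate to maintain, which is presumably where the simplification (and speedup) comes from.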
There were possibly some (unconfirmed) performance issues on GNN (#44); could you check whether the performance is still good (values around the ones from the paper)? I was afraid that I broke something during the refactoring.
I'll be closing this issue; don't hesitate to open a new one if a performance issue on GNN arises. Fingers crossed that it was the bug you noticed!
Hi,
I just read through the CGNN code, mainly interested in the pairwise version.
It looks like the criterion computes MMD(y, y_pred): https://github.com/FenTechSolutions/CausalDiscoveryToolbox/blob/32200779ab9b63762be3a24a2147cff09ba2bb72/cdt/causality/pairwise/GNN.py#L111
However, in the original paper they compute MMD([x, y], [x, y_pred]): https://github.com/GoudetOlivier/CGNN/blob/e3fcfc570e30fb8dad8bf00f619ef3c21998bb90/Code/cgnn/GNN.py#L70
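The difference matters: the marginal MMD(y, y_pred) only compares distributions of y, while the joint MMD([x, y], [x, y_pred]) also penalizes a y_pred whose dependence on x is wrong. A small NumPy sketch (not the toolbox's implementation; a biased RBF-kernel MMD estimate with a hand-picked bandwidth) makes this visible:

```python
import numpy as np

def mmd2(a, b, gamma=1.0):
    """Biased estimate of squared MMD between samples a (n, d) and b (m, d),
    using a Gaussian RBF kernel exp(-gamma * ||u - v||^2)."""
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = x ** 2 + 0.1 * rng.normal(size=(200, 1))
y_pred = x ** 2 + 0.1 * rng.normal(size=(200, 1))  # respects the x-y link
y_shuf = rng.permutation(y)                        # same marginal, link destroyed

# Marginal criterion: cannot distinguish a shuffled y from a good fit.
print(mmd2(y, y_pred), mmd2(y, y_shuf))            # both near 0
# Joint criterion from the paper: the broken dependence is penalized.
print(mmd2(np.hstack([x, y]), np.hstack([x, y_pred])),
      mmd2(np.hstack([x, y]), np.hstack([x, y_shuf])))
```

The shuffled sample has exactly the same marginal distribution as y, so only the joint criterion separates the two candidates.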
Thanks a lot for the repo and the reply. It helped me understand a lot of new things.