Vanint / SADE-AgnosticLT

This repository is the official PyTorch implementation of Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition (NeurIPS 2022).
MIT License

GPU #1

Closed: m1996 closed this issue 3 years ago

m1996 commented 3 years ago

Hello. Very sorry for commenting here; my question is about the PPN portfolio project, whose issues are closed, so I had no other way to ask you. I tried to run that project on Google Colab, but it took far longer than Colab allows. I figured out that TensorFlow 1.4.0 doesn't use the GPU. Is there any solution for that? I tried a lot but found no answer. Please help me. And if there is anything I should consider when using Colab for that project, please let me know.

Regards

Vanint commented 3 years ago

Hi, I have not used Google Colab, so I am not familiar with it. Which part is time-consuming: data processing or optimization?

m1996 commented 3 years ago

Thank you so much for your reply. All of it is time-consuming. I changed the code to print the number of each step, and it took more than 4 hours for 137 training steps (and 'steps' in the config is 10000!). I also saw that tf.device_name() shows nothing, which means it doesn't use the GPU. Should I install something special for that? I installed all the requirements as you said in the README file (for example, you said tensorflow>=1.4.0 and I installed tensorflow=1.4.0). Would you please tell me a good config to train faster? Or give me a brief explanation of each config parameter so I can change them properly.

Regards
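One way to check whether the installed TensorFlow build can see a GPU is to list the devices it registers. The sketch below is not part of the repository: the helper names `list_tf_devices` and `has_gpu` are hypothetical, and it relies on `device_lib.list_local_devices()` from TensorFlow itself. It returns an empty list when TensorFlow cannot be imported.

```python
def list_tf_devices():
    """Return (device_type, name) pairs for every device TensorFlow registers.

    Hypothetical helper for illustration; returns [] if TensorFlow is missing.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [(d.device_type, d.name) for d in device_lib.list_local_devices()]


def has_gpu(devices):
    """True if any listed device is a GPU."""
    return any(device_type == "GPU" for device_type, _ in devices)


if __name__ == "__main__":
    devices = list_tf_devices()
    for device_type, name in devices:
        print(device_type, name)
    print("GPU visible:", has_gpu(devices))
```

If only CPU entries are printed, training will run on the CPU regardless of the hardware the runtime offers.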


Vanint commented 3 years ago

In fact, I did not encounter this problem; I ran the experiments on a server with four Titan X GPUs. I think the cause may lie in the GPU platform or the Python packages. Can you find a private server to verify the first factor? Moreover, have you tried installing tensorflow-gpu?

m1996 commented 3 years ago

I installed tensorflow-gpu=1.4 with CUDA 8 and cuDNN 6, but it still doesn't work :( Did I install the right versions?
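For reference, the tensorflow-gpu 1.4 binaries were built against CUDA 8 and cuDNN 6, so the runtime has to be able to load those exact library versions. A minimal sketch for inspecting the installed build follows; the helper name `describe_tf_build` is hypothetical, while `tf.test.is_built_with_cuda()` and `tf.test.gpu_device_name()` are TensorFlow functions.

```python
def describe_tf_build():
    """Report the TensorFlow version and whether the build has CUDA support.

    Hypothetical helper; returns None if TensorFlow cannot be imported
    (a GPU build also fails to import when its CUDA libraries are missing).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_device_name": tf.test.gpu_device_name(),  # empty string if no GPU
    }


if __name__ == "__main__":
    print(describe_tf_build())
```

A result with `built_with_cuda` set to true but an empty `gpu_device_name` would suggest that the package is a GPU build and that the driver or CUDA libraries on the host are the part that does not match.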


Vanint commented 3 years ago

I guess so. Since I have left my previous school and the server I used has been reconfigured, I cannot access it to confirm the details. Besides finding a private server, one more suggestion is to first try the EIIE code (https://github.com/ZhengyaoJiang/PGPortfolio), since our method was developed on top of that repository. If it works, you can then replace the method part with our method.

m1996 commented 3 years ago

Aha, thanks a lot. Can I use other versions of TensorFlow? Does the tensorflow-gpu version have to be 1.4?


Vanint commented 3 years ago

Sure, if it works.

m1996 commented 3 years ago

Thanks a lot for your guidance. I wish you happiness and good health :)


m1996 commented 2 years ago

Hello. Again, very sorry for commenting here; my question is about the PPN portfolio project, whose issues are closed, so I had no other way to ask you. I fixed the GPU problem and trained the agent, but only 1000 steps at a time (due to Google Colab limits), resuming each run from the restore dir. The results were very strange, though: the APV was 49000! I figured out that omega (the portfolio weights) took negative values, which is unusual. Do you have any idea why this happens? I thought it was because of the decision-making module (as you commented: "three strategies to make decision, where the leverage operation is beyond the version of the paper"), which uses a 2-head leverage operation. So I changed it to a fully connected layer, and after 5000 steps the result does not change and stays at a very small value. Can you help me? Where have I gone wrong?

Regards

Vanint commented 2 years ago

Hi, the performance depends on many factors, such as the data split (sometimes the network learns a shortcut, which is hard to explain in deep RL), the training scheme, and so on. Therefore, I cannot tell what the real cause is, since training at least does not collapse. If you cannot resolve it in the end, one option is to download another period of data to find a more reasonable benchmark (i.e., one where modifying operations leads to a more significant change).

Yes, the negative portfolio weight is due to the leverage operation; removing it leads to all positive values.
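To illustrate the difference, here is a minimal sketch in plain Python. The repository's actual leverage operation is not shown in this thread, so `leveraged_weights` below is a hypothetical two-head form chosen only to show how a leverage-style combination can produce negative weights, while a plain softmax head cannot.

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative outputs that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leveraged_weights(long_scores, short_scores, leverage=1.0):
    """Hypothetical two-head leverage: (1 + leverage) * long - leverage * short.

    Both heads are softmaxed, so the result still sums to 1, but individual
    entries can be negative (a short position).
    """
    long_w = softmax(long_scores)
    short_w = softmax(short_scores)
    return [(1.0 + leverage) * l - leverage * s for l, s in zip(long_w, short_w)]


if __name__ == "__main__":
    plain = softmax([2.0, 0.0, -1.0])
    levered = leveraged_weights([2.0, 0.0, -1.0], [-1.0, 0.0, 2.0])
    print("softmax :", plain)    # all entries >= 0
    print("leverage:", levered)  # sums to 1, last entry is negative
```

With a softmax-only head every weight lies in [0, 1], so the portfolio is long-only; under the assumed leverage form the weights still sum to 1, but an asset favoured by the short head ends up with a negative weight.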

m1996 commented 2 years ago

Thank you so much. Actually, it seemed strange to me that the results were so different from those reported in the paper. I will check more closely again, and if I run into anything, I will bother you again :) Regards

m1996 commented 2 years ago

Hi again 😅 I tried the code with another time period of S&P500 prices, but I still got only 1.5 for the portfolio value :( And it converges very fast, in about 500 steps! Is that normal? Do you have any suggestions, please?


Vanint commented 2 years ago

I cannot remember the convergence speed exactly, since it is quite an old project.

m1996 commented 2 years ago

Thanks. The paper says 100000 steps and 15 hours!
