USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems
Creative Commons Zero v1.0 Universal
21 stars 15 forks

NAN loss on gpu #170

Closed matiiin closed 2 years ago

matiiin commented 2 years ago

Have you solved your "NAN loss on GPU" problem? I have the same problem, while everything works fine on CPU.

janetrbarclay commented 2 years ago

We were never able to resolve the issue of getting NaNs while using the TensorFlow FFT function on a GPU. We recently switched to PyTorch, and that seems to be working.
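For anyone hitting the same symptom, a small check like the following can help pin down the first training step at which the loss stops being finite, which makes it easier to compare a CPU run against a GPU run. This is a minimal, framework-agnostic sketch and is not taken from river-dl; the function name `first_non_finite` and the example loss values are hypothetical.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite.

    `losses` is any iterable of Python floats (for example, per-step loss
    values converted to floats and collected during training).
    """
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss histories, for illustration only.
cpu_history = [0.92, 0.71, 0.66, 0.65]
gpu_history = [0.92, 0.71, float("nan"), float("nan")]

cpu_bad_step = first_non_finite(cpu_history)
gpu_bad_step = first_non_finite(gpu_history)
print("CPU:", cpu_bad_step, "GPU:", gpu_bad_step)
```

Logging the inputs to the suspect operation at the reported step is usually the next thing to try.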

jdiaz4302 commented 2 years ago

Nice! Out of curiosity, did switching to PyTorch resolve this issue for all model runs (i.e., even your larger tries on GPU)?

I'm also curious about any experiences with the newer RGCN on your project, but that is perhaps off topic for this issue.

janetrbarclay commented 2 years ago

Yes, we've had no NaNs since switching to PyTorch (we have run the full DRB with the larger input set ~6 times so far, all with the rgcn_v1). With a sequence length of 365 and an offset of 1, pretraining times are ~30 min and finetuning times are 35-45 min.
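To illustrate the sequence settings mentioned above, here is a minimal sketch of how a daily time series might be cut into training windows. It assumes that `offset` is a fraction of the sequence length, so an offset of 1 gives back-to-back, non-overlapping 365-day windows; that interpretation, the function name `window_bounds`, and the three-year series length are assumptions for illustration, not details confirmed in this thread.

```python
def window_bounds(n_steps, seq_len=365, offset=1.0):
    """Return (start, end) index pairs for windows of length `seq_len`.

    ASSUMPTION: `offset` is a fraction of `seq_len`, so the stride between
    window starts is int(seq_len * offset). Trailing steps that do not fill
    a whole window are dropped.
    """
    if seq_len <= 0 or offset <= 0:
        raise ValueError("seq_len and offset must be positive")
    stride = max(1, int(seq_len * offset))
    return [(start, start + seq_len)
            for start in range(0, n_steps - seq_len + 1, stride)]


# Three years of daily data (hypothetical length), non-overlapping windows.
non_overlapping = window_bounds(n_steps=3 * 365, seq_len=365, offset=1.0)

# The same series with half-overlapping windows, for comparison.
half_overlapping = window_bounds(n_steps=3 * 365, seq_len=365, offset=0.5)

print(non_overlapping)
print(len(half_overlapping))
```

Under this assumption, a smaller offset produces more (overlapping) training sequences from the same record, which would lengthen each epoch.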



jdiaz4302 commented 2 years ago

Wow, that's really awesome! I'm glad it's been useful 😄