Training a Model - GitHub Issues

ldoyle4 commented 3 years ago

Hi Adina,

I have been trying to follow the tutorial on creating and training a CNN, but I'm having issues when using cnn.train_models(). The error appears to come from tensorflow and suggests that the loss function receives two tensors with different shapes (my Google attempt is here: https://stackoverflow.com/questions/58609967/tensorflow-2-0-condition-x-y-did-not-hold-element-wise).

here is the error: tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (sequential/dense_1/Sigmoid:0) = ] [[0.0393062606][0.0676076859][0.0730998814]...] [y (metrics/precision/Cast_2/x:0) = ] [0]

I was wondering if you could help? I am trying to train my CNN using all of the flares from Guenther et al. 2020, because when I used the models in your example, some of the flares in my light curve were missed.

Thanks, Lauren

afeinstein20 commented 3 years ago

Hey Lauren,

I haven't seen this issue before, so thanks for bringing it to my attention! Can you please provide me with the versions you are using for Python, tensorflow, and lightkurve so I can dig into this a bit more? Thanks!

ldoyle4 commented 3 years ago

Hi Adina,

Thanks for getting back to me and sorry for not having this information posted earlier! The versions are Python 3.6, tensorflow 2.1.0, lightkurve 1.9.0. If you need anything else then let me know :)

afeinstein20 commented 3 years ago

Hey Lauren,

Just wanted to let you know I'm still looking into this! I haven't been able to reproduce the issue yet. Can you please provide what versions of numpy and scipy you're using as well?
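For anyone else asked for the same information, the versions can be collected in one go. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); the helper name `report_versions` is made up for illustration and is not part of stella. On older interpreters such as the 3.6 and 3.7 setups in this thread, `pip freeze` gives the same information.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to the string "not installed".
    `report_versions` is a hypothetical helper, not a stella function.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    wanted = ["stella", "tensorflow", "lightkurve", "numpy", "scipy"]
    for name, version in report_versions(wanted).items():
        print(name, version)
```

Pasting the printed lines into the issue saves a round trip per package.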

Thanks!

ldoyle4 commented 3 years ago

Not a problem! Thanks for letting me know :) Sorry about my delayed response, I had a proposal deadline yesterday!

My versions are numpy = 1.19.5 and scipy = 1.4.1. If it helps, I installed stella with pip.

Thanks, Lauren

ldoyle4 commented 3 years ago

Hi Adina,

Just an update: I have now tried this with both the pip install of stella in Python 3.6 and the git install with Python 3.7, and when I get to cnn.train_models() I get the same error in both versions!

Thanks!

pjstanley1 commented 3 years ago

Hello Adina and Lauren,

Any updates regarding this? I was also trying to train the CNN using all the flares from Guenther et al. 2020 when I got the same error at cnn.train_models()!

I'm using the pip install of stella in Python 3.7. My versions are: tensorflow 2.1.0, lightkurve 2.0.0, numpy 1.19.2, and scipy 1.4.1

I also noticed that tensorflow 2.1.0 requires scipy 1.4.1, while stella's requirements.txt has scipy!=1.4.1. Is this a typo? If not, then maybe it's causing the error?

Thanks, Patrick

afeinstein20 commented 3 years ago

Hey @ldoyle4 and @pjstanley1

I think this may be a tensorflow issue. Could you try updating stella with pip install stella==0.2.0rc2 and seeing if that works?

pjstanley1 commented 3 years ago

Hello Adina, That seemed to fix it for me. Thanks!

afeinstein20 commented 3 years ago

Hey @ldoyle4, just checking in to see if there's any update on whether this new version fixed your issue as well?

ldoyle4 commented 3 years ago

Hi Adina,

Sorry for the delayed response! I'm still getting the same error when I update stella with the command you gave above! I'm not sure if it has something to do with tensorflow or my setup even. I've tried in both python 3.6 and 3.7.

For 3.7 my versions are: tensorflow==2.4.1, lightkurve==2.0.9, numpy==1.19.5, scipy==1.6.0. It also looks like calling pip install stella updated some of these!

Sorry for the issues!

Thanks, Lauren

pjstanley1 commented 3 years ago

Hello again,

I reproduced a similar error that was caused by NaN values showing up in ds.train_data. The error is:

InvalidArgumentError: assertion failed: [predictions must be >=0] [Condition x>= y did not hold element-wise:] [x (sequential/dense_1/Sigmoid:0) = ] [[0.497978717][0.520682096][0.519788146]...] [y (Cast_8/x:0) = ] [0] [[{{node assert_greater_equal/Assert/AssertGuard/else/_1/assert_greater_equal/AssertGuard/Assert}}]] [0p: __inference_train_function_12245] Function call stack: train_function

I traced it back to the light curve download: some of the flux values were NaN. I wrote a script that removed these NaNs, and that seemed to fix the error. Another thing of note: if len(ds.train_data) == len(ds.train_labels) is True, then the training data and labels contain the same number of examples. Lauren, I hope this helps!
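The check described above can be sketched with numpy alone. This is not the script mentioned in the comment (which was not posted); it is a minimal illustration that assumes the training data can be viewed as an array whose first axis indexes examples, as `ds.train_data` and `ds.train_labels` are used in this thread. The function name `drop_nan_examples` is hypothetical. It drops whole training examples that contain a non-finite value, which is one possible fix; removing NaN cadences from the light curves before building the data set is another.

```python
import numpy as np


def drop_nan_examples(train_data, train_labels):
    """Drop every training example that contains a NaN or inf value.

    Hypothetical helper: assumes axis 0 of `train_data` indexes examples
    and that `train_labels` has one entry per example.
    """
    data = np.asarray(train_data, dtype=float)
    labels = np.asarray(train_labels)
    if len(data) != len(labels):
        raise ValueError("train_data and train_labels differ in length")
    # Reduce over every axis except the first, so the mask has one
    # boolean per example regardless of the per-example shape.
    other_axes = tuple(range(1, data.ndim))
    finite = np.isfinite(data).all(axis=other_axes)
    return data[finite], labels[finite]


# Small synthetic demonstration; the second example contains a NaN.
demo_data = np.array([[1.0, 2.0, 3.0],
                      [1.0, np.nan, 3.0],
                      [4.0, 5.0, 6.0]])
demo_labels = np.array([0, 1, 1])
clean_data, clean_labels = drop_nan_examples(demo_data, demo_labels)
print(clean_data.shape, clean_labels)

# With a stella data set (assumption: the attributes can be reassigned):
# ds.train_data, ds.train_labels = drop_nan_examples(ds.train_data,
#                                                    ds.train_labels)
```

Running `np.isnan(ds.train_data).any()` before training is a quick way to see whether this is the cause of the assertion failure in the first place.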

aniruddha7599 commented 2 years ago

Hello @pjstanley1 ,

I just wanted to know how you cleared all the NaN values from the light curve .npy files in one go. I downloaded all the light curves using the built-in download function in the stella package itself. I was also trying to reproduce the stella results from the tutorial provided in the git folder, and I encountered the same error.

Thanks.

ansony1691 commented 1 year ago

Hello @pjstanley1, I second the request from @aniruddha7599.

pjstanley1 commented 1 year ago

Hello @aniruddha7599 and @ansony1691,

lightkurve has a function to remove NaNs from a light curve, but you can also drop the NaNs with pandas.

Hope this helps!
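The calls referred to are presumably `LightCurve.remove_nans()` in lightkurve and `DataFrame.dropna()` in pandas. For the "in one go" case asked about above, a numpy-only sketch that rewrites every .npy file in a directory is shown below. It rests on an assumption that should be checked against the actual files: each file holds a 2D float array of shape (n_columns, n_cadences), for example rows of time, flux, and flux error. The function name `clean_npy_files` is hypothetical, and the files are overwritten in place, so work on a copy of the directory.

```python
import glob
import os

import numpy as np


def clean_npy_files(directory):
    """Remove cadences containing NaN/inf from every .npy file in `directory`.

    ASSUMPTION: each file stores a 2D float array shaped
    (n_columns, n_cadences). Files are overwritten in place.
    Returns a dict mapping each file path to the number of cadences removed.
    """
    removed = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.npy"))):
        arr = np.load(path)
        # Keep a cadence only if every column is finite at that cadence.
        keep = np.isfinite(arr).all(axis=0)
        np.save(path, arr[:, keep])
        removed[path] = int((~keep).sum())
    return removed
```

Calling `clean_npy_files("path/to/copy_of_light_curves")` (a placeholder path) returns the per-file counts, which makes it easy to see which downloads contained NaNs.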

afeinstein20 / stella

Training a Model #13