dhammack / DSB2017

Code for 2nd place solution to the 2017 National Data Science Bowl

data generator failure #7

Open DongJT1996 opened 7 years ago

DongJT1996 commented 7 years ago

Hi Daniel,

Last time I sent you an email to report this problem. I tried to run /DSB2017-master/training_code/aws/nodule_des_v37b.py, but it fails with this error:

  File "nodule_des_v37b.py", line 121, in get_generator_static
    ixs1 = np.random.choice(range(X1.shape[0]),size=n1,replace=False)
ValueError: Cannot take a larger sample than population when 'replace=False'

You suggested that the data was not generated correctly. However, in /DSB2017-master/training_code/aws/data_generator_fn3.py I only modified the corresponding directories, so I wonder whether one of the input files I am using is the problem:

annotations_enhanced.csv: I use the file in DSB2017-master/training_code/DLung.
candidates_V2.csv: I did not find this file in your folder, so I used the candidates_V2.csv from the sources provided by Julian. I don't know whether that is a problem.
LUNA: I use the whole LUNA16 data set.

Secondly, I am confused about these 'None' outputs: [image]

Thirdly, about the error:

  File "nodule_des_v37b.py", line 121, in get_generator_static
    ixs1 = np.random.choice(range(X1.shape[0]),size=n1,replace=False)
ValueError: Cannot take a larger sample than population when 'replace=False'

When I change it to replace=True, I get the following results: [image] There are also some 'None' outputs and 'MetaImage: M_ReadElementsData: data not read completely' messages. I don't know what's wrong.
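For reference, the error can be reproduced outside the training script. The sketch below is only an illustration that assumes NumPy: X1 here is a made-up array with 3 rows, n1 is a made-up sample size of 5, and sample_indices is a hypothetical helper that mirrors the call on line 121. It shows that replace=False fails when the array has fewer rows than requested, while replace=True succeeds but returns repeated indices, so it hides the small dataset rather than fixing it.

```python
import numpy as np

# Made-up stand-ins for the real values: only 3 samples available,
# but 5 are requested.
X1 = np.zeros((3, 4))
n1 = 5


def sample_indices(n_population, n_samples, replace):
    """Hypothetical helper that mirrors the np.random.choice call on line 121."""
    return np.random.choice(range(n_population), size=n_samples, replace=replace)


# Sampling without replacement cannot return more indices than exist.
try:
    sample_indices(X1.shape[0], n1, replace=False)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Sampling with replacement succeeds, but some indices must repeat.
ixs1 = sample_indices(X1.shape[0], n1, replace=True)
n_unique = len(set(ixs1.tolist()))
print("drawn:", len(ixs1), "unique:", n_unique)
```

If the real X1 behaves like this, the underlying issue is the number of rows in X1, not the replace flag.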

dhammack commented 7 years ago

Hi!

The "None" outputs are from the data generator. It prints None whenever it finishes regenerating the data. I could turn this off but it never hurt me so I kept it in.

The fact that it is printing None so often in your first picture tells me that it is not regenerating the whole dataset. I found that each data generation step typically took longer than an epoch of training.

Can you run the data generator by itself (import it and call data_generator_fn.main()) and let me know the size of the generated dataset and any errors that you see? I suspect it is generating an empty or nearly empty dataset.
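One way to surface an undersized dataset earlier is a size check before sampling. This is a rough sketch, not code from the repository: check_sample_size is a hypothetical helper, and the arrays in the usage lines are placeholders rather than real generator output. It assumes the generated data is a NumPy array with one sample per row.

```python
import numpy as np


def check_sample_size(X, n_requested, name="X"):
    """Hypothetical helper: fail with a descriptive message if X has too few rows."""
    n_available = X.shape[0]
    if n_available < n_requested:
        raise ValueError(
            "%s has only %d samples but %d were requested; "
            "the generated dataset may be empty or incomplete"
            % (name, n_available, n_requested)
        )
    return n_available


# Placeholder array standing in for a generated dataset.
X_ok = np.zeros((10, 2))
print(check_sample_size(X_ok, 5, name="X_ok"))

# A nearly empty dataset triggers the descriptive error.
X_small = np.zeros((2, 2))
try:
    check_sample_size(X_small, 5, name="X_small")
    message = None
except ValueError as err:
    message = str(err)
    print(message)
```

Called just before the np.random.choice line, a check like this would report how many samples are actually available, which is the number asked for above.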