Incorrect session dates for subjects

botsunny commented 7 months ago

Hi there! Thank you for this repository.

I was playing around with the inference code when I realised that the participants.tsv file in the dataset downloaded from OpenNeuro contains session dates that do not match those in all_test_subs_inhouse.pkl in the extra_files directory of this repository.

For example, the session date for sub-000 in the OpenNeuro dataset is 20110101, but in the given pkl file it is 20101230. The same goes for the other pkl files in extra_files, such as patients_with_voxelwise_labels.pkl and patients_with_weak_labels.pkl, and for the pkl files in Forced_CV_for_reproducibility.

As a result, when I ran the inference step of the pipeline as described in the README, using the train outputs/weights provided in extra_files, inference results were produced for only 6 patients across the 5 folds, because the majority of the test subject-sessions in each fold do not exist in the OpenNeuro dataset at all.
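A quick way to quantify the mismatch is to compare the subject-session names stored in a pkl file against what is actually on disk. The sketch below is only an illustration: it assumes the pickle holds a flat list of strings such as `sub-000_ses-20101230` and that the downloaded dataset follows a `sub-*/ses-*` folder layout. Both are assumptions about the data, and the helper names are hypothetical, not taken from the repository.

```python
import pickle
from pathlib import Path


def load_sub_ses(pkl_path):
    """Load subject-session names from a pickle (assumed: a flat list of str)."""
    with open(pkl_path, "rb") as f:
        return list(pickle.load(f))


def sub_ses_on_disk(dataset_root):
    """Collect 'sub-XXX_ses-YYYYMMDD' names from an assumed sub-*/ses-* layout."""
    root = Path(dataset_root)
    return {
        f"{ses.parent.name}_{ses.name}"
        for ses in root.glob("sub-*/ses-*")
        if ses.is_dir()
    }


def split_by_availability(expected, available):
    """Return (present, missing) lists, preserving the order of `expected`."""
    present = [name for name in expected if name in available]
    missing = [name for name in expected if name not in available]
    return present, missing
```

Calling `split_by_availability(load_sub_ses(pkl_path), sub_ses_on_disk(dataset_root))` with your own paths would then list which test subject-sessions the pipeline can actually find.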

I tried running the pipeline from step 1 and training the network myself, but the result is the same, because Forced_CV_for_reproducibility is used to restore the subject-session names.

(Notice how the session date for sub-005 in the output of self_trained is different from that in participants.tsv)

Are the contents of extra_files currently out of date relative to the OpenNeuro dataset? I am not super experienced with this and have not worked on a project of this scale before, so apologies if I am misunderstanding anything!

tommydino93 commented 7 months ago

Hi @botsunny, thank you for your interest and for pointing out the inconsistency! You are right, the sessions don't match. The ones in the GitHub repo are outdated; you should use the ones from OpenNeuro (from the dataset itself and from participants.tsv). The subjects should match, however: for instance, the subjects you find in all_test_subs_inhouse.pkl are correct, but you need to change the sessions. I don't have much time to correct this at the moment. If you manage to create a .pkl file with the correct sessions, it would be super useful so I can upload it to the GitHub repo :) Leaving this open for now as a reminder for the future.
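For anyone attempting the update, here is a minimal sketch of one possible approach. It assumes each .pkl entry is a string like `sub-000_ses-20101230` and that a subject-to-sessions mapping has already been built from the OpenNeuro download; the entry format and the function names are assumptions, not the repository's actual code. Subjects with no session or with several sessions are reported rather than guessed, since old and new dates cannot be paired automatically in that case.

```python
import pickle


def remap_sessions(old_names, sessions_by_subject):
    """Swap the outdated session of each entry for the one found in OpenNeuro.

    old_names: list of 'sub-XXX_ses-YYYYMMDD' strings (assumed format).
    sessions_by_subject: dict like {'sub-000': ['ses-20110101']}, built by the
        caller from the downloaded dataset (assumed input).
    Returns (updated, unresolved); unresolved holds entries whose subject has
    zero or multiple sessions and therefore needs a manual decision.
    """
    updated, unresolved = [], []
    for name in old_names:
        subject, _, _old_session = name.partition("_")
        candidates = sessions_by_subject.get(subject, [])
        if len(candidates) == 1:
            updated.append(f"{subject}_{candidates[0]}")
        else:
            unresolved.append(name)
    return updated, unresolved


def save_pickle(obj, out_path):
    """Write the corrected list to a new .pkl file."""
    with open(out_path, "wb") as f:
        pickle.dump(obj, f)


updated, unresolved = remap_sessions(
    ["sub-000_ses-20101230"], {"sub-000": ["ses-20110101"]}
)
print(updated)     # ['sub-000_ses-20110101']
print(unresolved)  # []
```

Writing the result to a new file with `save_pickle` instead of overwriting the original keeps the old lists available for comparison.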

botsunny commented 6 months ago

Hi there, thank you for the reply!

I had some time to update the session dates within the .pkl files and it seems to be working (with the exception of a couple of subjects that are no longer available in the OpenNeuro dataset). However, I am running into a (possibly unrelated) issue where tf.image.per_image_standardization in the sliding window fails on the patches of some subjects due to "invalid dimension reduction". I will try to confirm whether the .pkl files are responsible for this before sharing them.

Additionally, on an unrelated note, does the model always produce results even if no aneurysms are detected? Would the resulting nii.gz masks simply be empty? Do let me know if I should open another issue for this question.

Thanks again!

tommydino93 commented 6 months ago

Ok, please let me know if the problem persists after checking the .pkl files.

Regarding your second question: YES, the model returns an empty array for patients without aneurysms.

connectomicslab / Aneurysm_Detection

Incorrect session dates for subjects #6