dorianps / LINDA

Lesion Identification with Neighborhood Data Analysis
Apache License 2.0

Training LINDA for small lesions #29

Open jct0024 opened 4 years ago

jct0024 commented 4 years ago

Good morning, I am working on a neuroimaging project where LINDA could be essential, and for this I would like to be able to train LINDA on small lesions. I was reading the previous issue but did not quite understand how it would be done. Could you explain where the relevant functions are? Kind regards, Jesus.

dorianps commented 4 years ago

The script published on the other issue is supposed to be a starting point, but you will need to understand a bit of R and the concepts behind it. In the end, you need a matrix with one row per voxel. The columns are the features that you think can help decide whether that voxel is lesioned. That matrix is the input used to train a random forest model. LINDA also has a step-up refinement from low to higher resolution, where previous predictions are fed back in as additional features. Because of this, the somewhat larger lesions I used trained the model to rely more on the low-resolution pick-up of the lesion. That is what you can try to improve.
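To make that concrete, here is a minimal sketch of how such a voxel-by-feature matrix could be assembled and passed to a random forest. The object names (`feats`, `truth`, `mask`) and the tree count are illustrative assumptions, not LINDA's actual code:

```r
library(ANTsR)
library(randomForest)

# assumptions for this sketch: `feats` is a list of feature images (antsImage
# objects), `truth` is the manual lesion mask, `mask` is a brain mask, all in
# the same template space
X <- t(imageListToMatrix(feats, mask))                 # one row per voxel, one column per feature
y <- factor(imageListToMatrix(list(truth), mask)[1, ]) # 1 = lesioned voxel, 0 = not

rf <- randomForest(x = X, y = y, ntree = 200)          # illustrative tree count
```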

jct0024 commented 4 years ago

I understand R, but I do not quite understand the feature matrix, or where it is built in the program. I will try to study the previous issue thoroughly to see if I can understand it well.

jct0024 commented 4 years ago

Hello Dorian, I have been studying the previous issue and the functions getLesionFeatures, mrvnrfs_predict_chunks, and the model, and I have a few doubts I cannot resolve.

Starting with the model: I understand it receives 6 features for training, and those features are supposed to be delivered by getLesionFeatures, but I cannot see where it "collects" them. If I wanted to extend the model with the images of small lesions that have been made available to me, I would pass the T1 image and a mask to getLesionFeatures; is that the structural mask or the lesion mask? Then I would truncate those data to obtain the 6 features, if I am not mistaken. Up to here I would not modify anything, I would only pass the data. But how can I change the model so that LINDA is "updated"?

I hope I explained myself well, and above all, thank you very much for your time and your responses. If you wish, I will also keep you informed of any progress, so that together we can give LINDA the same precision for small lesions as for large ones. Again, thank you very much. Sincerely, Jesus.

dorianps commented 4 years ago

Check this updated file. I added some more comments to the code to help you. PublishablePennModel_extraComments.zip

getLesionFeatures is a function and comes with LINDA. You can copy it from the repository here. It takes the 3 inputs (images) you see there and gives you the 6 outputs/images/features you need for LINDA.
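For orientation, a call could look roughly like the sketch below. The argument order and file names are assumptions, so check the function source in the repository for the exact signature:

```r
library(ANTsR)
source("getLesionFeatures.R")   # the function file copied from the LINDA repository

# hypothetical file names; the T1 should already be registered to the template
t1_in_template <- antsImageRead("sub01_to_pennTemplate.nii.gz")
template       <- antsImageRead("pennTemplate.nii.gz")
brainmask      <- antsImageRead("templateBrainMask.nii.gz")

feats <- getLesionFeatures(t1_in_template, template, brainmask)  # list of 6 feature images
```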

jct0024 commented 4 years ago

Dorian, first of all, thank you for all the help you are giving me; thanks to you I understand LINDA much better and I more or less know how it works. I have to tell you that I am a medical engineering student, all this is new for me, and I am somewhat lost, so I apologize in advance for the trouble this may cause. Below are a series of questions, to try to clarify things without taking up too much of your time.

  1. In the model, the features are taken from getLesionFeatures, but where does this function "leave" the features? I've seen that there is a featX folder for each feature, but I don't understand where exactly that folder comes from (other than that the features are created by getLesionFeatures).
  2. I do not understand why the features should already be in template space, or where "_to_pennTemplate_lesion.nii.gz", "antsMalfLabels_6class.nii.gz" and "templateBrainMask.nii.gz" come from.
  3. If I train the model with my data, is the part of the model trained with yours lost? That is, if I pass small lesions, is its accuracy maintained for large lesions?
  4. From what I have seen and read, I run getLesionFeatures with all my images (correct me if I'm wrong), and this returns the features for each one. (This ties back to the first question of where these features go.)
  5. Finally, the script calls "mrvnrfs_chunks.R", but in the github repository there is "mrvnrfs_predict_chunks.R"; this is where I get lost the most. Within this, once mrvnrfs is executed the model would be created and LINDA would be ready, right?

Again, thank you very much for your time and help. If you are interested, I could document and keep you informed of the progress of the project, because if we make it work for small lesions, LINDA would be a practically perfect application. I would be happy to help however I can.

dorianps commented 4 years ago


> 1. In the model, the features are taken from getLesionFeatures, but where does this function "leave" the features? I've seen that there is a featX folder for each feature, but I don't understand where exactly that folder comes from (other than that the features are created by getLesionFeatures).

The output is returned as a list of antsImages. They are objects you can manipulate in R, and they can be saved as nifti files with antsImageWrite.
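For example, saving the returned list to disk could look like this (file names are illustrative):

```r
library(ANTsR)
# `feats` is the list returned by getLesionFeatures for one subject
for (i in seq_along(feats)) {
  antsImageWrite(feats[[i]], paste0("sub01_feature_", i, ".nii.gz"))
}
```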

> 2. I do not understand why the features should already be in template space, or where "_to_pennTemplate_lesion.nii.gz", "antsMalfLabels_6class.nii.gz" and "templateBrainMask.nii.gz" come from.

Have the T1w registered in template space, then pass it to getLesionFeatures, and all the features will be in that same template space. The template we use is the Penn template, which comes with LINDA, but you can use any template. The 6-class label is only a sampling mask used by mrvnrfs to know where to pick the samples from. I see it is not online with LINDA, but you can either run a 6-tissue segmentation or I can try to find it for you.
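A rough sketch of that registration step with ANTsR; the transform type and file names are assumptions, adjust them to your own pipeline:

```r
library(ANTsR)
t1       <- antsImageRead("sub01_T1w.nii.gz")        # native-space T1, hypothetical name
template <- antsImageRead("pennTemplate.nii.gz")

# deformable registration of the subject T1 to the template
reg <- antsRegistration(fixed = template, moving = t1, typeofTransform = "SyN")
t1_in_template <- reg$warpedmovout

# bring the manual lesion mask along with the same transforms
# (nearest neighbor so the mask stays binary)
lesion <- antsImageRead("sub01_lesionmask.nii.gz")
lesion_in_template <- antsApplyTransforms(fixed = template, moving = lesion,
                                          transformlist = reg$fwdtransforms,
                                          interpolator = "nearestNeighbor")
```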

> 3. If I train the model with my data, is the part of the model trained with yours lost? That is, if I pass small lesions, is its accuracy maintained for large lesions?

Hmmm, in principle yes. Your model will be trained with your lesions. In theory one can concatenate models, or think of ways to build a model with two parallel random forest predictions. That would be an interesting extension, but it is quite a feat to try and would need deeper knowledge of how LINDA works.
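If you want to experiment with merging forests, the randomForest package can combine ensembles trained on the same feature columns. This is only an illustration of the idea, not something LINDA currently does:

```r
library(randomForest)
# rf_large and rf_small are hypothetical randomForest objects trained on
# identical feature matrices (e.g. large-lesion and small-lesion training sets)
rf_combined <- randomForest::combine(rf_large, rf_small)
```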

> 4. From what I have seen and read, I run getLesionFeatures with all my images (correct me if I'm wrong), and this returns the features for each one. (This ties back to the first question of where these features go.)

Yes, you run that in a for loop.
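Something along these lines, with placeholder subject names:

```r
# collect features for every training subject
subjects <- c("sub01", "sub02", "sub03")
allfeats <- list()
for (s in subjects) {
  t1 <- antsImageRead(paste0(s, "_to_pennTemplate.nii.gz"))  # T1 already in template space
  allfeats[[s]] <- getLesionFeatures(t1, template, brainmask)
}
```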

> 5. Finally, the script calls "mrvnrfs_chunks.R", but in the github repository there is "mrvnrfs_predict_chunks.R"; this is where I get lost the most. Within this, once mrvnrfs is executed the model would be created and LINDA would be ready, right?

Yes.

jct0024 commented 4 years ago

Dorian, I'm sorry but I have two more questions:

dorianps commented 4 years ago

Not sure I follow, but yes, you need the lesion masks too. As I mentioned before, LINDA is trained on a 4-tissue segmentation image, which improves the detection of the lesion. You run a 3-tissue segmentation with atropos (I think that's what you get as one of the features from getLesionFeatures), then mark the lesioned voxels with value 4; that's your training image.
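A minimal way to build that 4-class training image, assuming you already have the 3-tissue atropos segmentation and the lesion mask in the same space (file names are illustrative):

```r
library(ANTsR)
seg    <- antsImageRead("sub01_atropos_3tissue.nii.gz")      # 1 = CSF, 2 = GM, 3 = WM
lesion <- antsImageRead("sub01_lesionmask_template.nii.gz")  # binary lesion mask

# mark lesioned voxels with label 4 to obtain the 4-class training image
segArr <- as.array(seg)
segArr[as.array(lesion) == 1] <- 4
seg4 <- as.antsImage(segArr)
seg4 <- antsCopyImageInfo(seg, seg4)   # copy origin/spacing/direction from the original

antsImageWrite(seg4, "sub01_training_4class.nii.gz")
```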

mrvnrfs = train the model
mrvnrfs_predict = predict new cases

Back at the time, there wasn't enough memory to train the model all at once with all voxels, so it was done in chunks of about 5000 voxels at a time. This is why you have chunks.
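For intuition, the chunking boils down to splitting the in-mask voxel indices into fixed-size groups; this is only an illustration, not the actual mrvnrfs code:

```r
# split the in-mask voxel indices into chunks of roughly 5000 for training
idx    <- which(as.array(brainmask) == 1)
chunks <- split(idx, ceiling(seq_along(idx) / 5000))
length(chunks)   # number of chunks the training loop iterates over
```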

Lastly, to train for smaller lesions you need to avoid downsampling everything to 2mm as I did. You really want more spatial resolution to capture small lesions.
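For example, with ANTsR you could keep or resample the template-space images at 1mm instead of 2mm before extracting features; the voxel size here is just an example:

```r
library(ANTsR)
# resample the template-space T1 to 1mm isotropic (interpType = 0 is linear)
t1_1mm <- resampleImage(t1_in_template, c(1, 1, 1), useVoxels = FALSE, interpType = 0)
```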