Atlases in nitorch - Githubissues

brudfors commented 3 years ago

It would be good to have access to some atlases in the nitorch package; initially, I was thinking of a few MR contrasts. Currently I am using MB on the IXI dataset to generate such atlases. This gives quite nice T1, T2 and PD weighted atlases (and T2 and PD has neck, to boot!).

The atlases are in 1 mm space. I created them by, for each subject, for each modality:

Picking the template space INU corrected intensity image (with predicting missing voxels enabled).
Adding it to an atlas image, and also adding the non-zero, finite voxels to a count image.
Dividing the atlas image with the count image (avoiding zero division).
Normalise between 0 and 1, by dividing with max intensity value.
Make uint8.

@balbasty , my thoughts/questions are:

How should we include this data in the repo? Just add straight into the GitHub account? Each atlas is around 20 mb, not zipped.
What access should we give to other people who wants to use the atlases? Should we publish them somewhere? Maybe if you are writing up some paper you could publish the atlases with the paper?
Should I put them in MNI space?
So far I have used 218 IXI subjects (because that is what I had uploaded, at the time, to the FIL machine). Maybe I should build one using the entire IXI dataset?
Should I also include its MRA data?
A CT atlas would be cool as well, but then the issue is to find good data.

balbasty commented 3 years ago

They look pretty good!

How should we include this data in the repo? Just add straight into the GitHub account? Each atlas is around 20 mb, not zipped.

GitHub has a 100MB file limit, so these files would fit. I don't know if it's the best thing to do long term, though.
We could also use figshare (20GB of private space per repo, unlimited public space, max 5TB per file), and have nitorch download the data on-demand.
If we do not use github, we can either download it from the code when needed, or download it from the setup script. Maybe the latter is nicer? (we could use the extras options to let the user choose between an install with or without data)

What access should we give to other people who want to use the atlases? Should we publish them somewhere? Maybe if you are writing up some paper you could publish the atlases with the paper?

I imagine that they should be public or it would be a bit useless. We might struggle to publish them on their own, indeed...

Should I put them in MNI space?

I would say yes, at least rigidly. Don't know about the scaling. Maybe we can use qform/sform to store two different intents?

So far I have used 218 IXI subjects (because that is what I had uploaded, at the time, to the FIL machine). Maybe I should build one using the entire IXI dataset?

Would it make a massive difference?

Should I also include its MRA data?

What's MRA?

A CT atlas would be cool as well, but then the issue is to find good data.

Yes. I don't have a solution for that. We could ask PF, he might be up for it (and has lots of head/neck CTs)

Final thought: it would be nice to also have skull-stripped versions :)

brudfors commented 3 years ago

If we do not use github, we can either download it from the code when needed, or download it from the setup script. Maybe the latter is nicer? (we could use the extras options to let the user choose between an install with or without data)

Hosting it somewhere, with a download option in the setup file, sounds like the best option I agree. Putting it directly on GitHub is risky, as it might grow to take up more than 100 MB.

I imagine that they should be public or it would be a bit useless. We might struggle to publish them on their own, indeed...

Okay.

I would say yes, at least rigidly. Don't know about the scaling. Maybe we can use qform/sform to store two different intents?

I will make them MNI then.

What is MRA?

Magnetic Resonance Angiography (MRA), shows vessels, the IXI dataset also has MRA for each subject.

Yes. I don't have a solution for that. We could ask PF, he might be up for it (and has lots of head/neck CTs)

Yes, let's see if I find the energy to ask him.

Final thought: it would be nice to also have skull-stripped versions :)

Good idea, not sure how to best do the skull-stripping though. The model was trained unsupervised, so might be hard to do good skull-stripping using the template, I will have a think.

brudfors commented 3 years ago

Below is code that downloads and unzips the nitorch data. My idea is that we have a folder in the top directory called data, where all atlases, etc. are stored. Do you know how to give a flag to setup.py, that if true, runs the below code? I have found bits and pieces via Google, but wanted to know if you already had some ideas.

import os
import wget
import gzip
import pathlib
import shutil

# nitorch data directory
dir_data = os.path.join(cdir,'data')

# Stores nitorch URLs and filenames
data = {}
data['atlas_t1'] = ['https://ndownloader.figshare.com/files/25438340','mb_mni_avg218T1.nii.gz']
data['atlas_t2'] = ['https://ndownloader.figshare.com/files/25438343','mb_mni_avg218T2.nii.gz']
data['atlas_pd'] = ['https://ndownloader.figshare.com/files/25438337','mb_mni_avg218PD.nii.gz']

def get_nitorch_data():
    # Directory of this file
    cdir = pathlib.Path(__file__).parent.absolute()

    # Data directory    
    pathlib.Path(dir_data).mkdir(parents=True, exist_ok=True)

    # wget progress bar
    def bar(current, total, width=80):
        print("Downloading: %d%% [%d / %d] bytes" % (current / total * 100, current, total))

    # Download and gunzip nitorch data
    for k in data:
        url = data[k][0]
        pgz = os.path.join(dir_data, data[k][1])
        pnii = pgz[:-3]
        if not os.path.exists(pgz):
            wget.download(url, pgz, bar=bar)
        if os.path.exists(pgz):        
            with gzip.open(pgz, 'rb') as f_in:            
                with open(pnii, 'wb') as f_out:
                    shutil.copyfileobj(f_in, f_out)
            os.remove(pgz)

balbasty commented 3 years ago

Just found this package that seems quite suited to what we need: https://pypi.org/project/pooch/

brudfors commented 3 years ago

@balbasty , that does look interesting. Do you theirs is a good approach, as to where to store the data:

# Folder where the data will be stored. For a sensible default, use the
# default cache folder for your OS.
path=pooch.os_cache("mypackage"),

balbasty / nitorch

Atlases in nitorch #24