Exception in fitting due to regions in json hierarchy but not in the annotations volume

drodarie commented 8 months ago

On master branch, when running the cell atlas pipeline with the CCFv3 annotation atlas, the following command:

atlas-densities cell-densities fit-average-densities                                            \
    --hierarchy-path=data/1.json                                                                \
    --annotation-path=data/ccfv3/annotation_25.nrrd                                             \
    --neuron-density-path=data/ccfv3/density_volumes/neuron_density.nrrd                        \
    --average-densities-path=data/ccfv3/measurements/lit_densities.csv                          \
    --homogenous-regions-path=data/ccfv3/measurements/homogeneous_regions.csv                   \
    --gene-config-path=atlas_densities/app/data/markers/fit_average_densities_ccfv2_config.yaml \
    --fitted-densities-output-path=data/ccfv3/first_estimates/first_estimates.csv               \
    --fitting-maps-output-path=data/ccfv3/first_estimates/fitting.json

This fails with the following exception:

Traceback (most recent call last):
  File "Workspace/venv/bin/atlas-densities", line 8, in <module>
    sys.exit(cli())
  File "Workspace/abt/atlas-densities/atlas_densities/app/cli.py", line 24, in cli
    app()
  File "Workspace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "Workspace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "Workspace/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "Workspace/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "Workspace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "Workspace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "Workspace/venv/lib/python3.10/site-packages/atlas_commons/app_utils.py", line 47, in wrapper
    function(*args, **kw)
  File "Workspace/abt/atlas-densities/atlas_densities/app/cell_densities.py", line 906, in fit_average_densities
    fitted_densities_df, fitting_maps = linear_fitting(
  File "Workspace/abt/atlas-densities/atlas_densities/densities/fitting.py", line 624, in linear_fitting
    _check_average_densities_sanity(average_densities)
  File "Workspace/abt/atlas-densities/atlas_densities/densities/fitting.py", line 514, in _check_average_densities_sanity
    raise AtlasDensitiesError(
atlas_densities.exceptions.AtlasDensitiesError: `average_densities` has a NaN measurement or a NaN standard deviation for the following entries:                                  brain_region  ...       specimen_age
3050  Entorhinal area, lateral part, layer 2a  ...  8- to 14-week-old
3051  Entorhinal area, lateral part, layer 2b  ...  8- to 14-week-old

These lines are part of the lit_densities.csv and contain no mean values. These lines have been generated when creating the lit_densities.csv (command atlas-densities cell-densities measurements-to-average-densities) from gaba_papers.xlsx.

The regions associated with these lines exist in the brain region hierarchy but do not appear in the CCFv3 annotation atlas:

import numpy as np
from voxcell import RegionMap, VoxelData
region_map = RegionMap.load_json("data/1.json")
annotation = VoxelData.load_nrrd("data/ccfv3/annotation_25.nrrd").raw

# Same for the other layer.
region_map.find("Entorhinal area, lateral part, layer 2a", "name")  # {715}
region_map.is_leaf_id(715)  # True
np.count_nonzero(annotation==715)  # 0

The regions having no volume, every literature values based on them will be stored as NaNs in the final lit_densities.csv.

The question is when/how should we solve this issue:

When preparing the literature data? Every literature value that do not appear in the annotations should not be stored in the literature file (test with a set of unique ids from the annotations)
When writing in the literature file? Lines with NaNs should not be stored.
When reading the literature file during fitting? Lines with NaNs should be ignored.

mgeplf commented 8 months ago

The question is when/how should we solve this issue:

When preparing the literature data? Every literature value that do not appear in the annotations should not be stored in the literature file (test with a set of unique ids from the annotations)

When writing in the literature file? Lines with NaNs should not be stored.

When reading the literature file during fitting? Lines with NaNs should be ignored.

To me the best fit is that everything is consistent; if is a mismatch between the any of the datasets, it should fail until the inputs can be corrected - that forces someone to look at why it's failing, rather than missing it in a log message if something either not stored or ignored.

drodarie commented 8 months ago

Currently, when regions in the literature files do not appear in the hierarchy, they are simply ignored (no warning): See measurement_to_average_density in atlas_densities.densities.measurement_to_density. The user should know about them too. I will suggest a patch for that too. IMHO, I find that raising an error is a bit too strong, I would prefer to raise a warning to the user for each region ignored. However, if you think that it is a better practice, I'll follow you.

drodarie commented 8 months ago

Especially because some files that come directly from papers (e.g. Kim et al. data) have values with no current equivalent in the hierarchy. This implies that specific data sheets have to be created for each version of the atlas that use a different version of the hierarchy or annotation.

Sebastien-PILUSO commented 8 months ago

I suggest to clearly identify the region ensemble defined by hierarchy ∩ annotation from the beginning and only write those regions in the CSV/JSON files produced by the atlas_densities pipeline. A WARNING could be raised to inform the user that such regions in the hierarchy AND the annotation are not considered by the pipeline, as:

WARNING: NoRegionFound: The following regions are ignored because they do not belong to the hierarchy AND the annotation file given as input:
[regionName1, regionName2, ..., regionNameN]

mgeplf commented 8 months ago

IMHO, I find that raising an error is a bit too strong, I would prefer to raise a warning to the user for each region ignored.

Ok, that works too, I'm fine with that if you think it's better, I trust your judgment.

Sebastien-PILUSO commented 8 months ago

The changes were tested, many WARNING messages were raised, as expected. This issue is now corrected.

mgeplf commented 8 months ago

@Sebastien-PILUSO > This issue is now corrected.

Great, thanks for checking. I will now merge.

BlueBrain / atlas-densities

Exception in fitting due to regions in json hierarchy but not in the annotations volume #60