Closed drodarie closed 8 months ago
The question is when/how should we solve this issue:
- When preparing the literature data? Every literature value that do not appear in the annotations should not be stored in the literature file (test with a set of unique ids from the annotations)
- When writing in the literature file? Lines with NaNs should not be stored.
- When reading the literature file during fitting? Lines with NaNs should be ignored.
To me the best fit is that everything is consistent; if is a mismatch between the any of the datasets, it should fail until the inputs can be corrected - that forces someone to look at why it's failing, rather than missing it in a log message if something either not stored or ignored.
Currently, when regions in the literature files do not appear in the hierarchy, they are simply ignored (no warning): See measurement_to_average_density
in atlas_densities.densities.measurement_to_density
.
The user should know about them too. I will suggest a patch for that too.
IMHO, I find that raising an error is a bit too strong, I would prefer to raise a warning to the user for each region ignored. However, if you think that it is a better practice, I'll follow you.
Especially because some files that come directly from papers (e.g. Kim et al. data) have values with no current equivalent in the hierarchy. This implies that specific data sheets have to be created for each version of the atlas that use a different version of the hierarchy or annotation.
I suggest to clearly identify the region ensemble defined by hierarchy ∩ annotation from the beginning and only write those regions in the CSV/JSON files produced by the atlas_densities
pipeline.
A WARNING could be raised to inform the user that such regions in the hierarchy AND the annotation are not considered by the pipeline, as:
WARNING: NoRegionFound: The following regions are ignored because they do not belong to the hierarchy AND the annotation file given as input:
[regionName1, regionName2, ..., regionNameN]
IMHO, I find that raising an error is a bit too strong, I would prefer to raise a warning to the user for each region ignored.
Ok, that works too, I'm fine with that if you think it's better, I trust your judgment.
The changes were tested, many WARNING messages were raised, as expected. This issue is now corrected.
@Sebastien-PILUSO > This issue is now corrected.
Great, thanks for checking. I will now merge.
On master branch, when running the cell atlas pipeline with the CCFv3 annotation atlas, the following command:
This fails with the following exception:
These lines are part of the
lit_densities.csv
and contain no mean values. These lines have been generated when creating thelit_densities.csv
(commandatlas-densities cell-densities measurements-to-average-densities
) fromgaba_papers.xlsx
.The regions associated with these lines exist in the brain region hierarchy but do not appear in the CCFv3 annotation atlas:
The regions having no volume, every literature values based on them will be stored as NaNs in the final
lit_densities.csv
.The question is when/how should we solve this issue: