Closed FrancescoCasalegno closed 2 years ago
This issue seems to be related to the fact that different feature extraction methods produce a different number of output cells for the same dataset. That is, feature extraction probably fails for some morphologies.
After fixing some bugs (see in particular #73), we now have only two datasets where we see a different number of cells. Indeed, running (on feed8f456499b0cc1c234e000054a294d23668ef) the following snippet (run inside dvc/extract-features)
from pathlib import Path

print("dataset n_cells")
print("----------------------------------------------")
for dataset in sorted(Path.cwd().glob("*")):
    ss = set()  # distinct cell counts seen across all neurite/method directories
    for dendrite in sorted(dataset.glob("*")):
        for method in sorted(dendrite.glob("*")):
            if method.is_file():
                continue
            n_cells_new = len(sorted(method.glob("*")))
            ss.add(n_cells_new)
    ss = ", ".join(str(s) for s in sorted(ss))
    print(f"{str(dataset.name):<25s} {ss}")
gives the following output
dataset n_cells
----------------------------------------------
in-L1 105
in-L23 156
in-L4 113
in-L5 111
in-L6 62
lida-in-merged 423
lida-in-merged-bc-merged 423
lida-janelia-L5 58
pc-L2 40, 41, 43
pc-L3 44
pc-L4 89
pc-L5 160
pc-L6 125, 128, 129
More specifically, here's what we see for pc-L2 and pc-L6:
from pathlib import Path

for dataset in ["pc-L2", "pc-L6"]:
    print("-------", dataset, "-------")
    p = Path("/workdir/dvc/extract-features") / dataset
    for dendrite in sorted(p.glob("*")):
        for method in sorted(dendrite.glob("*")):
            if method.is_file():
                continue
            n_cells = len(list(method.glob("*")))
            print(f"[{n_cells}] {method}")
    print()
------- pc-L2 -------
[43] /workdir/dvc/extract-features/pc-L2/all/diagram-deepwalk
[43] /workdir/dvc/extract-features/pc-L2/all/diagram-tmd-proj
[43] /workdir/dvc/extract-features/pc-L2/all/graph-proj
[43] /workdir/dvc/extract-features/pc-L2/all/image-deepwalk
[43] /workdir/dvc/extract-features/pc-L2/all/image-tmd-proj
[43] /workdir/dvc/extract-features/pc-L2/apical/diagram-deepwalk
[43] /workdir/dvc/extract-features/pc-L2/apical/diagram-tmd-proj
[43] /workdir/dvc/extract-features/pc-L2/apical/graph-proj
[43] /workdir/dvc/extract-features/pc-L2/apical/image-deepwalk
[43] /workdir/dvc/extract-features/pc-L2/apical/image-tmd-proj
[41] /workdir/dvc/extract-features/pc-L2/axon/diagram-deepwalk
[41] /workdir/dvc/extract-features/pc-L2/axon/diagram-tmd-proj
[41] /workdir/dvc/extract-features/pc-L2/axon/graph-proj
[41] /workdir/dvc/extract-features/pc-L2/axon/image-deepwalk
[40] /workdir/dvc/extract-features/pc-L2/axon/image-tmd-proj
[43] /workdir/dvc/extract-features/pc-L2/basal/diagram-deepwalk
[43] /workdir/dvc/extract-features/pc-L2/basal/diagram-tmd-proj
[43] /workdir/dvc/extract-features/pc-L2/basal/graph-proj
[43] /workdir/dvc/extract-features/pc-L2/basal/image-deepwalk
[43] /workdir/dvc/extract-features/pc-L2/basal/image-tmd-proj
------- pc-L6 -------
[129] /workdir/dvc/extract-features/pc-L6/all/diagram-deepwalk
[129] /workdir/dvc/extract-features/pc-L6/all/diagram-tmd-proj
[129] /workdir/dvc/extract-features/pc-L6/all/graph-proj
[129] /workdir/dvc/extract-features/pc-L6/all/image-deepwalk
[129] /workdir/dvc/extract-features/pc-L6/all/image-tmd-proj
[129] /workdir/dvc/extract-features/pc-L6/apical/diagram-deepwalk
[129] /workdir/dvc/extract-features/pc-L6/apical/diagram-tmd-proj
[129] /workdir/dvc/extract-features/pc-L6/apical/graph-proj
[129] /workdir/dvc/extract-features/pc-L6/apical/image-deepwalk
[129] /workdir/dvc/extract-features/pc-L6/apical/image-tmd-proj
[128] /workdir/dvc/extract-features/pc-L6/axon/diagram-deepwalk
[128] /workdir/dvc/extract-features/pc-L6/axon/diagram-tmd-proj
[128] /workdir/dvc/extract-features/pc-L6/axon/graph-proj
[128] /workdir/dvc/extract-features/pc-L6/axon/image-deepwalk
[125] /workdir/dvc/extract-features/pc-L6/axon/image-tmd-proj
[129] /workdir/dvc/extract-features/pc-L6/basal/diagram-deepwalk
[129] /workdir/dvc/extract-features/pc-L6/basal/diagram-tmd-proj
[129] /workdir/dvc/extract-features/pc-L6/basal/graph-proj
[129] /workdir/dvc/extract-features/pc-L6/basal/image-deepwalk
[128] /workdir/dvc/extract-features/pc-L6/basal/image-tmd-proj
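The counts above say how many cells each method kept, but not which ones are missing. A minimal sketch of how one could list them, assuming every method directory holds one entry per cell and that the entry's stem is the same across methods (the helper name `missing_cells` is made up for this example and is not part of morphoclass):

```python
from pathlib import Path


def missing_cells(dataset_dir):
    """Map each "neurite/method" directory to the cell names it lacks.

    The reference set is the union of all cell names found in any
    method directory of the dataset (assumption: one entry per cell,
    same stem in every method directory).
    """
    found = {}
    for method_dir in sorted(Path(dataset_dir).glob("*/*")):
        if method_dir.is_dir():
            key = f"{method_dir.parent.name}/{method_dir.name}"
            found[key] = {p.stem for p in method_dir.iterdir()}
    all_cells = set().union(*found.values())
    # Keep only the directories where at least one cell is missing.
    return {k: all_cells - v for k, v in found.items() if all_cells - v}
```

Calling `missing_cells(Path("/workdir/dvc/extract-features") / "pc-L2")` would then return only the neurite/method pairs that dropped some cell, together with the dropped names.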
Running stage 'features-pc-L2-diagram-deepwalk-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L2/dataset.csv axon diagram-deepwalk extract-features/pc-L2/axon/diagram-deepwalk
11:46:30 morphoclass.console.main (I) Running them morphoclass entrypoint
11:46:30 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
11:46:34 morphoclass.console.cmd_extract_features (I) Starting feature extraction
11:46:34 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
11:46:34 morphoclass.console.cmd_extract_features (I) Loading data
11:46:34 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L2/IPC/mtC110800E_idA.h5
* data/final/pyramidal-cells/L2/TPC_B/C090905B.h5
11:46:34 morphoclass.console.cmd_extract_features (I) Extracting features
11:46:39 morphoclass.console.cmd_extract_features (I) Setting the path attributes
11:46:39 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
11:46:39 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
--
Running stage 'features-pc-L2-diagram-tmd-proj-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L2/dataset.csv axon diagram-tmd-proj extract-features/pc-L2/axon/diagram-tmd-proj
11:48:07 morphoclass.console.main (I) Running them morphoclass entrypoint
11:48:07 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
11:48:11 morphoclass.console.cmd_extract_features (I) Starting feature extraction
11:48:11 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
11:48:11 morphoclass.console.cmd_extract_features (I) Loading data
11:48:12 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L2/IPC/mtC110800E_idA.h5
* data/final/pyramidal-cells/L2/TPC_B/C090905B.h5
11:48:12 morphoclass.console.cmd_extract_features (I) Extracting features
11:48:12 morphoclass.console.cmd_extract_features (I) Setting the path attributes
11:48:12 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
11:48:12 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
--
Running stage 'features-pc-L2-graph-proj-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L2/dataset.csv axon graph-proj extract-features/pc-L2/axon/graph-proj
11:49:37 morphoclass.console.main (I) Running them morphoclass entrypoint
11:49:37 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
11:49:41 morphoclass.console.cmd_extract_features (I) Starting feature extraction
11:49:41 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
11:49:41 morphoclass.console.cmd_extract_features (I) Loading data
11:49:41 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L2/IPC/mtC110800E_idA.h5
* data/final/pyramidal-cells/L2/TPC_B/C090905B.h5
11:49:41 morphoclass.console.cmd_extract_features (I) Extracting features
11:49:41 morphoclass.console.cmd_extract_features (I) Setting the path attributes
11:49:41 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
11:49:41 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
--
Running stage 'features-pc-L2-image-deepwalk-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L2/dataset.csv axon image-deepwalk extract-features/pc-L2/axon/image-deepwalk
11:51:22 morphoclass.console.main (I) Running them morphoclass entrypoint
11:51:22 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
11:51:26 morphoclass.console.cmd_extract_features (I) Starting feature extraction
11:51:26 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
11:51:26 morphoclass.console.cmd_extract_features (I) Loading data
11:51:26 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L2/IPC/mtC110800E_idA.h5
* data/final/pyramidal-cells/L2/TPC_B/C090905B.h5
11:51:26 morphoclass.console.cmd_extract_features (I) Extracting features
11:51:31 morphoclass.console.cmd_extract_features (I) Converting diagrams to images
11:51:31 morphoclass.console.cmd_extract_features (I) Setting the path attributes
11:51:31 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
11:51:31 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
--
Running stage 'features-pc-L2-image-tmd-proj-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L2/dataset.csv axon image-tmd-proj extract-features/pc-L2/axon/image-tmd-proj
11:53:01 morphoclass.console.main (I) Running them morphoclass entrypoint
11:53:01 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
11:53:05 morphoclass.console.cmd_extract_features (I) Starting feature extraction
11:53:05 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
11:53:05 morphoclass.console.cmd_extract_features (I) Loading data
11:53:05 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L2/IPC/mtC110800E_idA.h5
* data/final/pyramidal-cells/L2/TPC_B/C090905B.h5
11:53:05 morphoclass.console.cmd_extract_features (I) Extracting features
11:53:05 morphoclass.console.cmd_extract_features (I) Converting diagrams to images
11:53:05 morphoclass.console.cmd_extract_features (E) Some diagrams had fewer than 3 points. This is too few toto compute persistence images and we'll therefore remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L2/TPC_B/sm100617a1-4_idC.h5
11:53:06 morphoclass.console.cmd_extract_features (I) Setting the path attributes
11:53:06 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
11:53:06 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
Running stage 'features-pc-L6-diagram-deepwalk-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L6/dataset.csv axon diagram-deepwalk extract-features/pc-L6/axon/diagram-deepwalk
12:26:33 morphoclass.console.main (I) Running them morphoclass entrypoint
12:26:33 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
12:26:37 morphoclass.console.cmd_extract_features (I) Starting feature extraction
12:26:37 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
12:26:37 morphoclass.console.cmd_extract_features (I) Loading data
12:26:37 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/TPC_A/Fluo58_right.h5
12:26:37 morphoclass.console.cmd_extract_features (I) Extracting features
12:26:49 morphoclass.console.cmd_extract_features (I) Setting the path attributes
12:26:49 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
12:26:49 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
--
Running stage 'features-pc-L6-diagram-tmd-proj-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L6/dataset.csv axon diagram-tmd-proj extract-features/pc-L6/axon/diagram-tmd-proj
12:28:28 morphoclass.console.main (I) Running them morphoclass entrypoint
12:28:28 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
12:28:32 morphoclass.console.cmd_extract_features (I) Starting feature extraction
12:28:32 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
12:28:32 morphoclass.console.cmd_extract_features (I) Loading data
12:28:33 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/TPC_A/Fluo58_right.h5
12:28:33 morphoclass.console.cmd_extract_features (I) Extracting features
12:28:33 morphoclass.console.cmd_extract_features (I) Setting the path attributes
12:28:33 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
12:28:33 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
--
Running stage 'features-pc-L6-graph-proj-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L6/dataset.csv axon graph-proj extract-features/pc-L6/axon/graph-proj
12:30:05 morphoclass.console.main (I) Running them morphoclass entrypoint
12:30:05 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
12:30:09 morphoclass.console.cmd_extract_features (I) Starting feature extraction
12:30:09 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
12:30:09 morphoclass.console.cmd_extract_features (I) Loading data
12:30:10 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/TPC_A/Fluo58_right.h5
12:30:10 morphoclass.console.cmd_extract_features (I) Extracting features
12:30:10 morphoclass.console.cmd_extract_features (I) Setting the path attributes
12:30:10 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
12:30:10 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
--
Running stage 'features-pc-L6-image-deepwalk-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L6/dataset.csv axon image-deepwalk extract-features/pc-L6/axon/image-deepwalk
12:31:33 morphoclass.console.main (I) Running them morphoclass entrypoint
12:31:33 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
12:31:37 morphoclass.console.cmd_extract_features (I) Starting feature extraction
12:31:37 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
12:31:37 morphoclass.console.cmd_extract_features (I) Loading data
12:31:38 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/TPC_A/Fluo58_right.h5
12:31:38 morphoclass.console.cmd_extract_features (I) Extracting features
12:31:48 morphoclass.console.cmd_extract_features (I) Converting diagrams to images
12:31:50 morphoclass.console.cmd_extract_features (I) Setting the path attributes
12:31:50 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
12:31:50 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
--
Running stage 'features-pc-L6-image-tmd-proj-axon':
> morphoclass -v extract-features data/final/pyramidal-cells/L6/dataset.csv axon image-tmd-proj extract-features/pc-L6/axon/image-tmd-proj
14:48:42 morphoclass.console.main (I) Running them morphoclass entrypoint
14:48:42 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
14:48:46 morphoclass.console.cmd_extract_features (I) Starting feature extraction
14:48:46 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
14:48:46 morphoclass.console.cmd_extract_features (I) Loading data
14:48:47 morphoclass.console.cmd_extract_features (E) Some morphologies had neurites with a total neurite node count less than 3. This is too little for feature extraction and we'll therefor remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/TPC_A/Fluo58_right.h5
14:48:47 morphoclass.console.cmd_extract_features (I) Extracting features
14:48:47 morphoclass.console.cmd_extract_features (I) Converting diagrams to images
14:48:47 morphoclass.console.cmd_extract_features (E) Some diagrams had fewer than 3 points. This is too few toto compute persistence images and we'll therefore remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/UPC/tkb060128_a1-a2_idD.h5
* data/final/pyramidal-cells/L6/TPC_A/tkb060510b2_ch5_ct_n_db_100x_1.h5
* data/final/pyramidal-cells/L6/IPC/C291101C2.h5
14:48:48 morphoclass.console.cmd_extract_features (I) Setting the path attributes
14:48:48 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
14:48:48 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
To track the changes with git, run:
--
Running stage 'features-pc-L6-image-tmd-proj-basal':
> morphoclass -v extract-features data/final/pyramidal-cells/L6/dataset.csv basal image-tmd-proj extract-features/pc-L6/basal/image-tmd-proj
15:30:26 morphoclass.console.main (I) Running them morphoclass entrypoint
15:30:26 morphoclass.console.cmd_extract_features (I) Loading modules and libraries
15:30:30 morphoclass.console.cmd_extract_features (I) Starting feature extraction
15:30:30 morphoclass.console.cmd_extract_features (I) Setting up pre-transforms
15:30:30 morphoclass.console.cmd_extract_features (I) Loading data
15:30:31 morphoclass.console.cmd_extract_features (I) Extracting features
15:30:32 morphoclass.console.cmd_extract_features (I) Converting diagrams to images
15:30:32 morphoclass.console.cmd_extract_features (E) Some diagrams had fewer than 3 points. This is too few toto compute persistence images and we'll therefore remove these morphologies from the dataset. Consider inspecting the data to find the cause. The morphologies to remove are:
* data/final/pyramidal-cells/L6/BPC/rp101228_L5-1_idA.h5
15:30:32 morphoclass.console.cmd_extract_features (I) Setting the path attributes
15:30:32 morphoclass.console.cmd_extract_features (I) Saving extracted features to disk
15:30:32 morphoclass.console.cmd_extract_features (I) Done.
Updating lock file 'dvc.lock'
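To avoid reading these logs by eye, the removed morphologies can be collected per stage. A rough sketch, assuming the log layout shown above (a `Running stage '...'` header followed by `* <path>.h5` bullet lines); the function name `removed_by_stage` is hypothetical:

```python
import re
from collections import defaultdict

STAGE_RE = re.compile(r"^Running stage '(?P<stage>[^']+)':")
REMOVED_RE = re.compile(r"^\* (?P<path>\S+\.h5)$")


def removed_by_stage(log_text):
    """Return {stage name: [removed morphology paths]} parsed from a log."""
    removed = defaultdict(list)
    stage = None
    for line in log_text.splitlines():
        line = line.strip()
        if m := STAGE_RE.match(line):
            stage = m["stage"]
        elif (m := REMOVED_RE.match(line)) and stage is not None:
            # A bullet line after a stage header is a removed morphology.
            removed[stage].append(m["path"])
    return dict(removed)
```

Note that this does not distinguish between the two error messages (node count vs. diagram points); both kinds of removal end up in the same list for a stage.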
Here are the problematic morphologies. Notice that we are displaying the final morphologies, but the raw ones look exactly the same. Plotted using Morphology Viewer.
Based on the results above (https://github.com/BlueBrain/morphoclass/issues/68#issuecomment-1172346264) we can say the following.

The morphologies look the same before (raw) as well as after (final) the data prep.

- data/final/pyramidal-cells/L2/IPC/mtC110800E_idA.h5: axon has no bifurcations
- data/final/pyramidal-cells/L2/TPC_B/C090905B.h5: axon has no bifurcations
- data/final/pyramidal-cells/L6/TPC_A/Fluo58_right.h5: axon has no bifurcations
- data/final/pyramidal-cells/L2/TPC_B/sm100617a1-4_idC.h5: axon has only 1 bifurcation
- data/final/pyramidal-cells/L6/UPC/tkb060128_a1-a2_idD.h5: axon has only 1 bifurcation
- data/final/pyramidal-cells/L6/TPC_A/tkb060510b2_ch5_ct_n_db_100x_1.h5: axon has only 1 bifurcation
- data/final/pyramidal-cells/L6/IPC/C291101C2.h5: axon has only 1 bifurcation
- data/final/pyramidal-cells/L6/BPC/rp101228_L5-1_idA.h5: basal has only 1 bifurcation

[^1]: On the other hand, no error is raised when computing the TMD diagram! So we are doing this check, e.g., for image-tmd-proj but not for diagram-tmd-proj, see here:
https://github.com/BlueBrain/morphoclass/blob/ebd177df20ac49a482b1dda6466f82401534c669/src/morphoclass/console/cmd_extract_features.py#L235
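One possible way to get the same cells for every method would be to apply the same minimum-size filter to the diagrams regardless of whether they are later converted to images. This is only a sketch of the idea: `split_by_min_points` and the dict-of-diagrams representation are hypothetical, and the threshold of 3 simply mirrors the "fewer than 3 points" message in the logs.

```python
def split_by_min_points(diagrams, min_points=3):
    """Split {name: diagram} into (kept diagrams, sorted dropped names).

    A diagram is any sequence of (birth, death) pairs; diagrams with
    fewer than `min_points` pairs are dropped.
    """
    kept = {name: d for name, d in diagrams.items() if len(d) >= min_points}
    dropped = sorted(set(diagrams) - set(kept))
    return kept, dropped
```

Running this once per dataset and neurite type, before any method-specific step, would make the cell counts agree across methods at the cost of dropping a few more cells for the diagram-based features.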
@lidakanari
scipy.stats.gaussian_kde()[^1] fails with the exception
ValueError: array must not contain infs or NaNs
while with 2 points it fails with the exception
LinAlgError: singular matrix

This happens for pyramidal-cells morphologies when the neurite type is axon or basal. This is not a real issue, because we are anyway considering the default[^2] **neurite=apical for pyramidal-cells**.

[^1]: This function is used by the TMD util tmd.Topology.analysis.get_persistence_image_data() to compute the TMD Image from the TMD Diagram. And this TMD util is being called by our morphoclass extract-features command.
[^2]: Because it's what works best. As a reminder, the default neurite=apical is also used for janelia (which are also pyramidal cells!) while neurite=axon is used for interneurons.
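The "singular matrix" error and the threshold of 3 points are consistent with how a sample covariance behaves for very few points. As a sketch, assuming the KDE bandwidth is derived from the unbiased sample covariance of the 2D diagram points (this is our understanding of SciPy, not something verified in this thread), two points always give a rank-1 covariance:

```latex
% Two diagram points x_1, x_2 in R^2, with difference \delta = x_1 - x_2 = (a, b)^\top
\bar{x} = \tfrac{1}{2}(x_1 + x_2), \qquad
x_1 - \bar{x} = \tfrac{1}{2}\delta, \qquad
x_2 - \bar{x} = -\tfrac{1}{2}\delta

\Sigma = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top}
       = \tfrac{1}{2}\,\delta\delta^{\top}
       = \tfrac{1}{2}\begin{pmatrix} a^2 & ab \\ ab & b^2 \end{pmatrix},
\qquad
\det\Sigma = \tfrac{1}{4}\,(a^2 b^2 - a^2 b^2) = 0
```

So with 2 points the covariance cannot be inverted, whatever the points are. With a single point the deviation from the mean is zero and the factor 1/(n-1) is 1/0, so the entries are undefined (NaN), which would match the "infs or NaNs" error. With 3 points that do not lie on one line the covariance has full rank.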
Context
When looking at the results of the performance table it looks like, for some datasets, the chance accuracy depends on the choice of the neurite!

So either there is a bug in how we compute chance accuracy or the ground truths used for each experiment are different. After some inspection, it looks like this is indeed due to different ground truths, because running
gives
Actions