ZauggGroup / DeePiCt

Pipeline for the automatic detection and segmentation of particles and cellular structures in 3D Cryo-ET data, based on deep learning (convolutional neural networks).
Apache License 2.0

metadata.csv missing #19

Closed: RodenLuo closed 9 months ago

RodenLuo commented 9 months ago

Hi, I followed the instructions to install and test the repo. After installation, I got the error below. Line 2 of 2d_cnn/config.yaml sets training_data: meta/metadata.csv, but this file does not seem to be in the repo. Could you please help here? Thanks!

$ bash 2d_cnn/deploy_local.sh 2d_cnn/config.yaml 
FileNotFoundError in line 39 of /home/luod/DeePiCt/2d_cnn/snakefile.py:
[Errno 2] No such file or directory: 'meta/metadata.csv'
  File "/home/luod/DeePiCt/2d_cnn/snakefile.py", line 39, in <module>
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 229, in _open_handles
  File "/home/luod/miniconda3/envs/deepict/lib/python3.7/site-packages/pandas/io/common.py", line 707, in get_handle

frosinastojanovska commented 9 months ago

Hi @RodenLuo, do you have the metafile at the path meta/metadata.csv for your data? That file should follow the example shown here.

RodenLuo commented 9 months ago

Hi Frosina @frosinastojanovska, thanks very much for your quick response! I don't have my own data yet. I'm trying to reproduce and learn the pipeline, especially the training part. Could I do so with the dataset you made public here: https://www.nature.com/articles/s41592-022-01746-2#data-availability? If so, which one should I download, and how should I structure the input data folder? Many thanks.

frosinastojanovska commented 9 months ago

Hi @RodenLuo, yes, you can use the tomograms available in the dataset. Download one tomogram from there and set its local path in the metadata file. The tomograms are here: [image]

RodenLuo commented 9 months ago

The example metadata.csv looks like this:

data,labels,flip_y,id
meta/tomo_1.mrc,meta/tomo_1_labels.mrc,0,id_123
meta/tomo_2.rec,meta/tomo_2_anno.mrc,1,id_456

I understand the data column, which I believe should be set to the file names under tomograms, such as TS_0001.rec. Under the labels folder, I see files such as TS_0001_FAS.mrc, but there are also files such as TS_0001_cytosol.mrc.

Should I put them as two records in the following way? And since there are more labels, should I add each label as a new line?

data,labels,flip_y,id
meta/TS_0001.rec,meta/TS_0001_FAS.mrc
meta/TS_0001.rec,meta/TS_0001_cytosol.mrc

And then, how should I set the flip_y and id columns?

Many thanks!

frosinastojanovska commented 9 months ago

Hi Roden @RodenLuo, for prediction you only need the data field, so your metafile should look like this:

data
meta/TS_0001.rec

If you also want to train, the labels should be provided as well, in one .mrc file. Here the 2D CNN is used to predict the cellular compartments, and if you want to train it from scratch, you should provide the ground truth label .mrc file. Otherwise, you only need the tomogram for prediction with an already trained model. FAS, ribosomes and membrane can be predicted with the 3D version of the CNN.

Having a tomogram and a trained model is enough for prediction on that tomogram, so I assume you already have the trained model downloaded as well.
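
A minimal sketch of writing the prediction-only metafile described above and checking that every listed tomogram exists before launching the pipeline, which would catch a FileNotFoundError like the one reported earlier. Only the data column and the meta/metadata.csv path come from this thread; `write_prediction_metadata` and `missing_tomograms` are hypothetical helper names, not part of DeePiCt.

```python
import csv
from pathlib import Path


def write_prediction_metadata(metadata_path, tomogram_paths):
    """Write a metafile with only the `data` column (enough for prediction)."""
    metadata_path = Path(metadata_path)
    # Create the parent folder (e.g. meta/) if it does not exist yet.
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    with metadata_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["data"])
        for tomogram in tomogram_paths:
            writer.writerow([str(tomogram)])
    return metadata_path


def missing_tomograms(metadata_path):
    """Return the `data` entries whose files do not exist on disk.

    Relative paths are resolved against the current working directory,
    so run this from the folder where the pipeline will be started.
    """
    with Path(metadata_path).open(newline="") as handle:
        return [row["data"] for row in csv.DictReader(handle)
                if not Path(row["data"]).is_file()]
```

For example, `write_prediction_metadata("meta/metadata.csv", ["meta/TS_0001.rec"])` reproduces the two-line metafile above, and `missing_tomograms("meta/metadata.csv")` should return an empty list once the tomogram has been downloaded to that path.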

RodenLuo commented 9 months ago

Hi Frosina, Thanks very much for the explanations! I'm also very much interested in the training to learn the whole pipeline. If you can give some hints about how to set the folder structure and the metadata.csv for training from scratch for both 2D CNN and 3D CNN, that would be great. Thanks!

frosinastojanovska commented 9 months ago

Hi Roden, for training the 2D CNN you can set up the metadata as in the provided example, using as labels the file whose name ends with organelles. The flip_y column indicates whether the y-axis needs to be flipped to match the tomogram (different software sometimes produces output with a flipped y-axis, so check by eye whether the ground truth overlaps with the tomogram or needs flipping). For training the 3D CNN, you can again use the provided metadata file, where the labels are in one file (for example, ribosomes). If you want to train on more than one class at a time, the labels for all classes must be in a single ground truth file (for example, you have to combine the ribosome and FAS classes into one labels file). I hope this helps.
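
A minimal sketch of building a training metafile with the four columns from the example metadata.csv (data, labels, flip_y, id). It assumes one row per tomogram and treats a tomogram listed with two labels files as an error; that check is a reading of the advice above about combining classes into one file, not a documented DeePiCt rule. The helper names, the file name meta/TS_0001_organelles.mrc, the id TS_0001 and the flip_y value 0 are hypothetical placeholders, and merging the label volumes themselves is not shown.

```python
import csv
import io

COLUMNS = ["data", "labels", "flip_y", "id"]


def training_rows(entries):
    """Build one metadata row per tomogram.

    `entries` is an iterable of (tomogram, labels, flip_y, id) tuples.
    Raises ValueError if the same tomogram appears twice, on the assumption
    that multi-class ground truth must live in a single labels file.
    """
    seen = set()
    rows = []
    for data, labels, flip_y, tomo_id in entries:
        if data in seen:
            raise ValueError(
                f"{data} is listed more than once; combine its classes "
                "into one labels file instead"
            )
        seen.add(data)
        rows.append({
            "data": data,
            "labels": labels,
            "flip_y": int(bool(flip_y)),  # written as 0/1, as in the example file
            "id": tomo_id,
        })
    return rows


def to_csv_text(rows):
    """Serialize the rows as CSV text with the expected header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = training_rows([
        ("meta/TS_0001.rec", "meta/TS_0001_organelles.mrc", 0, "TS_0001"),
    ])
    print(to_csv_text(rows), end="")
```

The printed text can be saved as meta/metadata.csv. Whether flip_y should be 0 or 1 still has to be decided by overlaying the labels on the tomogram, as described above.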

RodenLuo commented 9 months ago

Thanks very much, Frosina! I will take it from here.