Closed RodenLuo closed 9 months ago
Hi @RodenLuo , do you have the metafile at the path meta/metadata.csv for your data? That file should follow the example shown here.
Hi Frosina @frosinastojanovska, Thanks very much for your quick response! I don't have my data yet. I'm trying to reproduce and learn the pipeline, especially the training part. Is it possible that I do so with the dataset you made public here https://www.nature.com/articles/s41592-022-01746-2#data-availability? If so, which one should I download, and how should I structure the input data folder? Many thanks.
Hi @RodenLuo yes, you can use the available tomogram from the dataset. Download one tomogram from there and set the local path of it in the metadata file. The tomograms are here:
The example metadata.csv looks like
data,labels,flip_y,id
meta/tomo_1.mrc,meta/tomo_1_labels.mrc,0,id_123
meta/tomo_2.rec,meta/tomo_2_anno.mrc,1,id_456
I get the data column, which I believe should be set to the file names under tomograms, such as TS_0001.rec. I also notice that under the labels folder there are files such as TS_0001_FAS.mrc. But then there are also files such as TS_0001_cytosol.mrc.
Should I put them as two records in the following way? And as there are more labels, should I put each label as a new line?
data,labels,flip_y,id
meta/TS_0001.rec,meta/TS_0001_FAS.mrc
meta/TS_0001.rec,meta/TS_0001_cytosol.mrc
And then, how should I set the flip_y and id columns?
Many thanks!
Hi Roden @RodenLuo , for prediction you only need the data field, so your metafile should look like this:
data
meta/TS_0001.rec
If you also want to train, the labels must be provided as well, in a single .mrc file. The 2D CNN is used to predict the cellular compartments; if you want to train it from scratch, you need to provide the ground-truth label .mrc file. Otherwise, the tomogram alone is enough for prediction with an already trained model. FAS, ribosomes, and membranes can be predicted with the 3D version of the CNN.
Having a tomogram and a trained model is enough for prediction on that tomogram, so I assume you already have the trained model downloaded as well.
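For reference, a minimal sketch of how a prediction-only metafile like the one above could be generated with the Python standard library. The function name `write_prediction_metafile` is made up for illustration and is not part of the repository; the only thing taken from this thread is that the file needs a single `data` column with one tomogram path per row.

```python
import csv
from pathlib import Path


def write_prediction_metafile(tomogram_paths, metafile_path):
    """Write a metafile with a single `data` column, one tomogram per row.

    Hypothetical helper for illustration; not part of the repository.
    """
    metafile_path = Path(metafile_path)
    # Create the parent folder (for example `meta/`) if it does not exist yet.
    metafile_path.parent.mkdir(parents=True, exist_ok=True)
    with metafile_path.open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["data"])  # header expected for prediction
        for tomogram in tomogram_paths:
            writer.writerow([str(tomogram)])
    return metafile_path
```

Calling `write_prediction_metafile(["meta/TS_0001.rec"], "meta/metadata.csv")` would produce the two-line file shown above.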
Hi Frosina, Thanks very much for the explanations! I'm also very much interested in the training to learn the whole pipeline. If you can give some hints about how to set the folder structure and the metadata.csv for training from scratch for both 2D CNN and 3D CNN, that would be great. Thanks!
Hi Roden, for training the 2D CNN you can set up the metadata as in the provided example, using as labels the file whose name ends with organelles. The flip_y column indicates whether the y-axis of the labels needs to be flipped to match the tomogram (different software sometimes writes its output with a flipped y-axis, so check visually whether the ground truth overlaps with the tomogram or needs flipping). For training the 3D CNN, you can again use the provided metadata file, with the labels in one file (for example, ribosomes). If you want to train on more than one class at a time, the labels for all classes must be combined into a single ground-truth file (for example, you have to combine the ribosome and FAS classes into one labels file). I hope this helps.
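For anyone following along, here is a rough sketch of what combining several per-class label volumes into one ground-truth volume, and flipping along y, could look like with NumPy. Everything here is an assumption rather than the repository's own code: the function names are hypothetical, the class IDs (0 for background, then 1, 2, ... in the order the masks are passed) are just one possible encoding and should be checked against what the training config expects, and the (z, y, x) axis order is assumed. Reading and writing the .mrc files themselves (for example with the mrcfile package) is left out.

```python
import numpy as np


def merge_label_volumes(masks):
    """Combine per-class binary masks into one integer label volume.

    ASSUMPTION: the i-th mask gets class ID i + 1 and 0 stays background.
    Where masks overlap, the later mask overwrites the earlier one.
    """
    if not masks:
        raise ValueError("at least one label volume is required")
    arrays = [np.asarray(mask) for mask in masks]
    merged = np.zeros(arrays[0].shape, dtype=np.uint8)
    for class_id, mask in enumerate(arrays, start=1):
        if mask.shape != merged.shape:
            raise ValueError("all label volumes must have the same shape")
        merged[mask > 0] = class_id
    return merged


def flip_y(volume):
    """Flip a volume along the y-axis, ASSUMING (z, y, x) axis order."""
    return np.flip(np.asarray(volume), axis=1)
```

With a ribosome mask and a FAS mask of the same shape, `merge_label_volumes([ribosomes, fas])` would give a volume where ribosome voxels are 1 and FAS voxels are 2. Comparing the labels and `flip_y(labels)` against the tomogram is one way to decide whether flip_y should be 0 or 1 in the metafile.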
Thanks very much, Frosina! Will take on from here.
Hi, I followed the instructions to install and test the repo. After installation, I faced the error below. In line 2 of 2d_cnn/config.yaml, it mentions training_data: meta/metadata.csv. It seems that this file is not in the repo. Could you please kindly help here? Thanks!
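A quick way to confirm whether the metafile that the config points to actually exists and has the expected columns is sketched below, using only the standard library. The helper name `check_metafile` is hypothetical and not part of the repository; the column names come from the example metadata.csv in this thread.

```python
import csv
from pathlib import Path


def check_metafile(metafile_path, required_columns=("data",)):
    """Return a list of problems found with the metafile (empty if none).

    Hypothetical helper for illustration; not part of the repository.
    """
    path = Path(metafile_path)
    if not path.is_file():
        return [f"metafile not found: {path}"]
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        rows = list(reader)
    problems = [
        f"missing column: {column}"
        for column in required_columns
        if column not in header
    ]
    if not rows:
        problems.append("metafile has no data rows")
    return problems
```

For training, `check_metafile("meta/metadata.csv", ("data", "labels", "flip_y", "id"))` would list any of the four example columns that are absent; for prediction the default of just `data` is enough.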