BBQuercus / deepBlink

Threshold independent detection and localization of diffraction-limited spots.
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab546/6312733

How to fine-tune with my custom data? #133

Closed: tamasbalassa closed this issue 2 years ago

tamasbalassa commented 2 years ago

Hello there! Thanks for this tool. It's not only great, but having a simple and clear description of everything makes it very convenient to use.

I am trying to build a model that works accurately on FISH data. So far I've tested the included smhfish1.h5 and also trained my own model using annotations I made on my custom data. My issue is that the included smhfish1.h5 model works better than the one I've trained, although even then it's only acceptable, not superb.

So I was wondering if there is a simple way to refine the smhfish1.h5 weights to get a better result on my own data.

Also, is there a built-in way to measure an average/overall accuracy (like calculating the mAP value)?

Thank you!

BBQuercus commented 2 years ago

Hey. Thanks for the feedback.

You can use the pre-trained model that you downloaded and just pass its path into the training config (train_args -> pre_train).
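
For what it's worth, a minimal excerpt of what that might look like in the YAML training config; only the pre_train key is taken from the comment above, the other fields are placeholders for whatever your generated config already contains:

```yaml
# Excerpt of a deepBlink training config (placeholder values).
train_args:
  batch_size: 2                              # keep your existing values here
  epochs: 200
  pre_train: /path/to/downloaded_model.h5    # path to the pre-trained weights
```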

For scoring we used the F1 integral score described in the paper, which is accessible in Python through deepblink.metrics.f1_integral. More conveniently, you can use deepblink.metrics.compute_metrics, which lists everything out a bit more nicely.
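
As a rough sketch of how that could be called (the coordinate layout and argument order are assumptions, so check the docstrings):

```python
import numpy as np
from deepblink.metrics import compute_metrics

# Hypothetical coordinates in (row, column) pixel order.
true = np.array([[10.0, 12.0], [55.0, 40.0], [80.0, 81.0]])
pred = np.array([[10.5, 11.5], [54.0, 41.0]])

# Returns the full set of metrics, including the F1 integral score.
metrics = compute_metrics(pred, true)
print(metrics)
```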

tamasbalassa commented 2 years ago

Hi @BBQuercus. Thanks for the quick reply, I really appreciate it!

I've tried using the mentioned pre-trained network, but it seems the results are not getting better. With the default threshold (0.5) I was getting completely wrong results, see:

Screenshot from 2022-03-01 17-11-50

Zooming into it:

Screenshot from 2022-03-01 17-12-01

When I set the threshold to 0.6, I receive fewer predicted dots, but they are not accurate at all:

Screenshot from 2022-03-01 17-09-15 Screenshot from 2022-03-01 17-08-23

I have nearly 5,000 training examples, and these predictions were made on the training samples. Still, the downloaded model (mentioned above) works much better on them than the refined one. Have you ever experienced something similar? Do you have any ideas why this is happening?

Thanks!

zhanyinx commented 2 years ago

Hey, thanks for the feedback.

Can you share your dataset (or subset) and your config file for training?

Thanks

tamasbalassa commented 2 years ago

Hello @zhanyinx!

I've copied 4 images and their corresponding CSVs (generated with the recommended TrackMate - I took care to have false negatives rather than false positives in the labelling set), together with the config file I used, to my Google Drive: https://drive.google.com/drive/folders/1ZzXhtRDd5BsPWaEJ5A4mwJMuETtlohbq?usp=sharing I did not want to share my data publicly, so please request access there. Thanks!

BBQuercus commented 2 years ago

We had a look at your data and think we found the issue. If you use raw microscopy data as input, in some cases it carries metadata about the microns-per-pixel size. TrackMate then uses that information and doesn't let you change it. Since deepBlink doesn't really care about the "real" size of things, the labels will be wrong unless everything is labeled in pixel dimensions. The easiest way we found to remove this micron scale is in Fiji via Analyze -> Set Scale... -> Click to Remove Scale before loading the image into TrackMate. Alternatively, you can use Python to load/save the images (e.g. using skimage.io.imread/imsave).
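
A minimal sketch of the Python route (file names are placeholders); re-saving with scikit-image drops the calibration metadata, so TrackMate falls back to pixel units:

```python
from skimage import io

# Load the raw image and save a copy without the micron calibration metadata.
image = io.imread("raw_image.tif")
io.imsave("raw_image_pixel_scale.tif", image)
```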

Regarding your already labeled images – if they all have the same pixel size (i.e. not multiple magnifications), the easiest fix would be to load everything as a DataFrame in pandas (let me know if you're familiar with pandas) and multiply the POSITION_X and POSITION_Y columns by the constant that converts microns back to pixels.
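
Something along these lines, assuming a plain CSV export and a hypothetical pixel size (adjust both to your data):

```python
import pandas as pd

microns_per_pixel = 0.11           # hypothetical value from your microscope
pixels_per_micron = 1 / microns_per_pixel

# Convert the TrackMate coordinates from microns back to pixels.
df = pd.read_csv("trackmate_spots.csv")
df["POSITION_X"] *= pixels_per_micron
df["POSITION_Y"] *= pixels_per_micron
df.to_csv("trackmate_spots_pixels.csv", index=False)
```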

We'll make sure to add this rather important information about resetting the scale as a comment on our video (currently it was only on our wiki).

imagejan commented 2 years ago

@BBQuercus how about adding an option to deepblink to account for scaling of the spot coordinates during training? TrackMate has always made a great effort to support real-world (scaled) coordinates throughout the analysis, because it's important that track dynamics (like mean squared displacement, MSD, etc.) are measured in a scientifically meaningful way (in particular on anisotropic image data, of course).

It would be a shame if people had to keep working around this by removing the scale before running TrackMate, and couldn't easily use their already-tracked data. And I don't think it would take too much effort to implement a scaling function in the training process. What do you think?

BBQuercus commented 2 years ago

@imagejan I guess we can add it to the dataset creation stage to keep the actual training process intact, especially across different datasets. I'll have to see how easy it is to extract the scale from the image metadata (any experience with that?), or we'd just require the user to provide a fixed scaling constant / pixel size. At inference time it would be the same story.
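
One possible route for reading the scale, just as a sketch (not part of deepBlink, and it assumes the calibration is stored as a standard TIFF XResolution tag, which depends on how the file was saved):

```python
import tifffile

# Try to read pixels-per-unit from the first page's TIFF metadata.
with tifffile.TiffFile("raw_image.tif") as tif:
    tag = tif.pages[0].tags.get("XResolution")
    if tag is not None:
        numerator, denominator = tag.value
        pixels_per_unit = numerator / denominator  # unit given by ResolutionUnit
        print("Pixels per unit:", pixels_per_unit)
    else:
        # No calibration stored; fall back to a user-provided pixel size.
        print("No resolution metadata found.")
```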

tamasbalassa commented 2 years ago

Thanks for the explanation @BBQuercus. Actually, my dataset has over 60 channels (yesterday I sent you only 2 of them), and the spot size will likely differ between them (although the spots always remain very small, pixel-wise). As far as I understand, scaling would indeed be a solution - just imagine having 60 types of spots with different sizes.

Regarding the solution for the 2 channels I sent you yesterday: re-labelling them would not be a big deal, if removing the scale alone fixes the training. In that case, would it also be necessary to do a workaround/modification on the test images? (I also want to note that the original images are quite big, over 6000x6000 px, but I am experimenting with cropped parts to speed things up and test whether the approach works well - maybe that's useful info.)

BBQuercus commented 2 years ago

The scaling was referring to the x/y positions, not the spot size. The problem arises because the pixel size isn't 1 micron (it's a lot smaller). Therefore, when visualising spots using pixel dimensions, everything gets clustered in the top-left corner of the image instead of being spread out evenly. For example, with a pixel size of 0.1 µm, a spot at x = 100 µm should land at pixel 1000, but read as a pixel coordinate it would sit at pixel 100. So the scaling would multiply the x/y positions by a constant to make sure everything starts at 0 and ends where the image ends.

We'll probably add an easier way to visualise a dataset (and prediction) soon to view what the model "sees".

BBQuercus commented 2 years ago

The change in #135 now allows you to simply pass in the images without having to scale the CSV output beforehand. We'll also be adding functionality to output micron coordinates from deepblink predict at a later point. I'll close this issue for now, but feel free to reopen it if something isn't clear or doesn't work.

tamasbalassa commented 2 years ago

Thank you guys. I can confirm that training now seems to be working: with the same dataset, the loss is decreasing nicely.

Regarding the prediction, do you have any plans? I would like to visualize and check the results somehow.

zhanyinx commented 2 years ago

Hey there,

Happy to hear that it's working!

Regarding your question, it is already included in our last PR https://github.com/BBQuercus/deepBlink/pull/134, where we developed the deepblink "visualize" module that you can use to visualise both the labelling and the prediction.

For example: `deepblink visualize --image yourimage --prediction deepblinkout.csv`

tamasbalassa commented 2 years ago

I was able to try it out and I can tell it's working perfectly, amazing! Thank you @zhanyinx! Any idea when you will implement the deepblink predict function? I (and probably many others) would have great use for it.

zhanyinx commented 2 years ago

Hey!

What do you mean by deepBlink prediction?

We have deepblink predict, which can be used to predict spots given an image and a model (see https://github.com/BBQuercus/deepBlink/wiki/Usage). Moreover, we have deepblink visualize, which takes a deepblink prediction and the corresponding image and visualises the prediction on the image.
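
For reference, a typical two-step workflow could look like this (the visualize call mirrors the example above; the predict flags follow the wiki usage page, so double-check them against your installed version):

```sh
# Predict spot coordinates, then overlay them on the image.
deepblink predict --model yourmodel.h5 --input yourimage
deepblink visualize --image yourimage --prediction deepblinkout.csv
```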

Can you elaborate on what you mean by a prediction function?

Best Zhan

tamasbalassa commented 2 years ago

Hey! Sorry for my late answer, I've just realized I still owe you one:

Actually my wording was incorrect: I meant a function that can generate previews of (i.e. visualize) the results at prediction time, something like folding the visualize function into the predict function. :)

On the other hand, I have another question. I am still working on FISH data. I created my own annotated dataset of a few thousand samples, yet the prediction sometimes misses a few signals, and sometimes quite a large number (see the images below: the first one misses the signals on the top part of the signal island; the second one misses many very distinct signals). pic1 pic2

I did check the pixel values of the signals that are not recognized as positives, but their values are not outliers; there were signals with both lower and higher values that were recognized as positives. Do you have any ideas why this is happening? Can I tweak something to recognize the signals more accurately?

zhanyinx commented 2 years ago

Hey!

Sorry for the late reply.

Regarding the first question, we think that introducing an extra option to visualize the performance during prediction would just create redundant functionality. It's cleaner to do it in two steps: predict + visualize.

Regarding the missing spots, can you send us the corresponding images along with the model? Can you also tell us the command/script you used to make the prediction?

Best Zhan