Open bnpapas opened 1 year ago
Hi @bnpapas - Are you segmenting the fast5?
I am following the instructions posted at https://psy-fer.github.io/deeplexicon/train/ and I'm not sure which step would be segmentation.
You may need to segment the data a priori, e.g. by running "python3 deeplexicon.py dmux". This will split the signal to separate the barcodes from the RNA. Then train on the segmented barcode output.
The goal here is to be able to train a new model with an eye towards possibly adding new barcodes - I won't be able to use dmux first in a real use case. The truth table files I've assembled are based on mapping information, as was done in the publication. The match between these truth tables and the dmux results from "resnet20-final.h5" is very good.
Edit: To make sure it is clear, I am using the python version of the training code, which uses the "dRNA_segmenter" function to segment reads prior to image generation and subsequent training.
When dmux is assigning barcodes, it uses the "classify" function. This function does a transform of the data:
x = image.astype('float32') + 1
x = x / 2
The training subcommand, however, does not take this step and trains directly on the images. I've removed the transform from "classify" and now my freshly-trained models produce sensible results with dmux. I assume I can get similar behavior by adding the transform into the train subroutine. Is there a reason to think having this transformation is better than not?
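One way to keep the two code paths from drifting apart is to put the transform in a single helper and call it from both training and classification. The following is only a minimal sketch: the helper name scale_like_classify and the sample array are hypothetical and not part of deeplexicon; only the two arithmetic lines are taken from the "classify" snippet quoted above.

```python
import numpy as np


def scale_like_classify(image):
    """Apply the transform quoted from "classify": (x + 1) / 2 as float32.

    Hypothetical helper, not part of deeplexicon. The idea is that both the
    training and the classification code would call this one function.
    """
    x = np.asarray(image).astype("float32") + 1
    x = x / 2
    return x


# Hypothetical stand-in for one generated image (real images come from the
# segmented squiggle; their value range is not assumed here).
raw = np.array([[0, 127, 255]], dtype=np.uint8)

train_input = scale_like_classify(raw)  # what the model would be trained on
infer_input = scale_like_classify(raw)  # what the model would see in dmux

# Identical preprocessing on both sides means identical model inputs.
assert np.array_equal(train_input, infer_input)
```

Whether the transform is kept or dropped probably matters less than applying the same choice in both places, since a model trained on untransformed images sees differently scaled inputs at classification time.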
I think that was added (and was meant to be in both) to avoid a divide-by-zero error by making the values 1-indexed. Sorry, it's been a while since I wrote that.
Would you mind sharing the code? I see "deeplexicon_multi.py squig" for getting the segmentation, but how would you "split the signal to separate the barcodes from the RNA"?
I have been attempting to use the fast5 data provided with the manuscript to train a model to call the same 4 barcodes as "resnet20-final.h5". I've used mapping information to assign barcodes, and if I use the given model with deeplexicon the agreement with my truth table is excellent. I've tried taking 40k reads from each barcode as a training set, with 10k from each as test and validation sets. The training runs, seemingly without issue; however, it shows some behavior I don't understand.
Note: I have been using the docker image provided by pulling lpryszcz/deeplexicon:1.2.0-gpu, with "deeplexicon_multi.py train" having default options. Do you have any suggestions for how I can improve the model training results?
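For anyone reproducing the per-barcode split described above (40k training reads per barcode, plus 10k each for test and validation), here is a rough sketch of a balanced, non-overlapping split. It assumes the truth table has already been loaded into a dict mapping read ID to barcode label. That in-memory format, the function name split_per_barcode, and the toy read IDs are all hypothetical and not taken from deeplexicon.

```python
import random
from collections import defaultdict


def split_per_barcode(truth, n_train, n_test, n_val, seed=0):
    """Split reads into train/test/val with equal counts per barcode.

    truth: dict mapping read_id -> barcode label (assumed format).
    Returns a dict with keys "train", "test", "val", each a dict of
    read_id -> barcode. No read appears in more than one split.
    """
    by_barcode = defaultdict(list)
    for read_id, barcode in truth.items():
        by_barcode[barcode].append(read_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": {}, "test": {}, "val": {}}
    needed = n_train + n_test + n_val

    for barcode in sorted(by_barcode):
        reads = sorted(by_barcode[barcode])
        if len(reads) < needed:
            raise ValueError(
                f"barcode {barcode!r} has {len(reads)} reads, need {needed}"
            )
        rng.shuffle(reads)
        for r in reads[:n_train]:
            splits["train"][r] = barcode
        for r in reads[n_train:n_train + n_test]:
            splits["test"][r] = barcode
        for r in reads[n_train + n_test:needed]:
            splits["val"][r] = barcode
    return splits


# Toy example: 4 barcodes with 6 reads each, split 4/1/1 per barcode.
toy_truth = {f"read_{b}_{i}": f"bc_{b}" for b in range(1, 5) for i in range(6)}
toy_splits = split_per_barcode(toy_truth, n_train=4, n_test=1, n_val=1)
```

With the numbers from the post, the call would be split_per_barcode(truth, 40000, 10000, 10000). Shuffling within each barcode before slicing keeps the classes balanced and avoids any ordering bias from the fast5 files.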