Psy-Fer / deeplexicon

Signal based nanopore RNA demultiplexing with convolutional neural networks
https://psy-fer.github.io/deeplexicon/
MIT License
36 stars, 8 forks

dRNA_segmenter "no seg found" query #29

BrendanBeahan closed this issue 4 months ago

BrendanBeahan commented 4 months ago

Sorry if this is a rather elementary question, but I've just run DeePlexiCon using your latest Docker image (via Singularity). I confirmed I'm using CUDA 10.0.130, and the hardware is an NVIDIA GeForce GTX 1080 Ti. My command is as follows:

singularity exec --nv \
    /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/work/singularity_cache/deeplexicon_latest.sif \
    bash -c 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/targets/x86_64-linux/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH; \
    python3 /deeplexicon/deeplexicon_sub.py dmux -p /scratch/brussel/vo/000/bvo00030/vsc11010/Deeplexicon/RNA002/OHMX20220107/fast5_pass \
    -f multi -m /deeplexicon/models/resnet20-final.h5 > /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/Deeplexicon/7_16_24/output_RNA002.tsv'

However, when it runs, it appears that barcode segmentation is failing for all of my reads. For example, here is a snippet of the output:

info: dRNA_segmenter: no seg found: 6e72ccc9-dfd3-48c2-81be-df7a7e0012a8
info: dRNA_segmenter: no seg found: 6e8a52b9-9bfc-4763-8258-f629d3f70d7e
info: dRNA_segmenter: no seg found: 6e8f961f-f1ba-4743-ae1e-208d95ad360e
info: dRNA_segmenter: no seg found: 6ee7e41e-f511-42ad-8b4b-7ead70c48946
info: dRNA_segmenter: no seg found: 6fa050eb-21bd-4b32-ba9c-34d74ab65668
info: dRNA_segmenter: no seg found: 6fb45e9a-7f8c-49dd-93cb-4221435eb982
info: dRNA_segmenter: no seg found: 6febb364-4db1-40d4-94a8-80c3d2b07d9e
info: dRNA_segmenter: no seg found: 702df06c-7b2b-4db3-8220-30937fd17804 

Yet the output file is still being populated with predictions, for example:

fast5                                    ReadID                                Barcode  Confidence Interval  P_bc_1   P_bc_2   P_bc_3   P_bc_4
FAX70525_pass_8f0502c4_67968bf0_0.fast5  001f0408-e3b5-4040-be41-3b96a0df2d73  bc_3     0.6501               0.00416  0.12880  0.77890  0.08815
FAX70525_pass_8f0502c4_67968bf0_0.fast5  002a336c-5960-43f0-b9da-af8a8526c257  bc_4     0.9521               0.02384  0.00006  0.00018  0.97593
FAX70525_pass_8f0502c4_67968bf0_0.fast5  002f10de-83d1-4e8b-a086-6cb88fb161c4  bc_2     0.8704               0.02149  0.92442  0.05397  0.00012
FAX70525_pass_8f0502c4_67968bf0_0.fast5  005a6d5d-73a5-4d56-9c83-32060a2439ab  bc_3     0.3799               0.18253  0.00079  0.59830  0.21838
FAX70525_pass_8f0502c4_67968bf0_0.fast5  00bf7934-a4dd-4b46-ae4b-a30030722249  bc_4     0.4549               0.16720  0.01122  0.18333  0.63825
FAX70525_pass_8f0502c4_67968bf0_0.fast5  011531b0-a589-4ec7-ab76-cc81d61ea926  bc_3     0.6015               0.00017  0.00249  0.79943  0.19792
FAX70525_pass_8f0502c4_67968bf0_0.fast5  013b1d94-de63-4b18-ae1f-42b8c0244b3e  bc_2     0.2126               0.37157  0.58415  0.03834  0.00594
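The rows above are internally consistent in a way that is easy to check: in each one, Barcode is the label of the largest P_bc_* value, and Confidence Interval matches the largest probability minus the second-largest to within rounding (first row: 0.77890 - 0.12880 = 0.6501). Below is a minimal Python sketch of that check over the seven rows shown. The function names are hypothetical, and treating the score as a top-two margin is an inference from these rows, not something confirmed from the DeePlexiCon source.

```python
# Rows copied from the output snippet in this issue:
# (reported Barcode, reported Confidence Interval, [P_bc_1, P_bc_2, P_bc_3, P_bc_4])
ROWS = [
    ("bc_3", 0.6501, [0.00416, 0.12880, 0.77890, 0.08815]),
    ("bc_4", 0.9521, [0.02384, 0.00006, 0.00018, 0.97593]),
    ("bc_2", 0.8704, [0.02149, 0.92442, 0.05397, 0.00012]),
    ("bc_3", 0.3799, [0.18253, 0.00079, 0.59830, 0.21838]),
    ("bc_4", 0.4549, [0.16720, 0.01122, 0.18333, 0.63825]),
    ("bc_3", 0.6015, [0.00017, 0.00249, 0.79943, 0.19792]),
    ("bc_2", 0.2126, [0.37157, 0.58415, 0.03834, 0.00594]),
]


def call_and_margin(probs):
    """Return (label of the highest probability, top-1 minus top-2 margin).

    Hypothetical helper: the margin interpretation is inferred from the rows above.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best, second = ranked[0], ranked[1]
    return f"bc_{best + 1}", probs[best] - probs[second]


def check_rows(rows, tol=1e-4):
    """Yield (reported bc, derived bc, reported score, derived margin, agrees?) per row."""
    for barcode, score, probs in rows:
        derived_bc, margin = call_and_margin(probs)
        agrees = derived_bc == barcode and abs(margin - score) <= tol
        yield barcode, derived_bc, score, round(margin, 5), agrees


if __name__ == "__main__":
    for row in check_rows(ROWS):
        print(*row)
```

A tolerance of 1e-4 is used because the score column is printed with four decimals; all seven rows shown pass this check.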

Is there perhaps an issue with the way I've run DeePlexiCon, or is this expected behavior? To be transparent, I suspect something may be off with the wet-lab work, but I'd like to rule out any downstream issue before pursuing that line of thought.

BrendanBeahan commented 4 months ago

Ah, sorry: I just realized that the dataset I was given is actually RNA004, which I assume explains the high rate of segmentation failures.

Psy-Fer commented 4 months ago

Are the reads that fail segmentation also getting predictions? They shouldn't be.

And yes, this doesn't work for RNA004. Please reach out to Eva Maria Novoa about a new tool they have developed for RNA004.

James
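James's question (whether reads that fail segmentation also get predictions) can be answered by comparing the log against the output file. Below is a minimal Python sketch under several assumptions. First, the "no seg found" messages go to stderr, since they appeared on the console while stdout was redirected to the TSV, and they were captured to a file with a stderr redirect (the file name segmenter.log is hypothetical). Second, the output is tab-delimited with a ReadID column, as in the header shown earlier. Third, the script and function names are hypothetical. The sketch reports how many reads failed, how many were predicted, how many appear in both (which should be zero), and a rough failure rate over reads seen in either file.

```python
import csv
import re
import sys

# Matches log lines like "info: dRNA_segmenter: no seg found: <read ID>" as quoted in this issue.
NO_SEG = re.compile(r"dRNA_segmenter: no seg found:\s*(\S+)")


def failed_read_ids(log_lines):
    """Collect read IDs that the segmenter reported as 'no seg found'."""
    ids = set()
    for line in log_lines:
        match = NO_SEG.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def predicted_read_ids(tsv_lines):
    """Collect the ReadID column, assuming a tab-delimited file with a header row."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    return {row["ReadID"] for row in reader}


def summarize(log_lines, tsv_lines):
    failed = failed_read_ids(log_lines)
    predicted = predicted_read_ids(tsv_lines)
    seen = len(failed | predicted)  # only reads that appear in either file
    return {
        "failed": len(failed),
        "predicted": len(predicted),
        "overlap": len(failed & predicted),  # expected to be 0 if failed reads are skipped
        "failure_rate": len(failed) / seen if seen else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 check_seg.py segmenter.log output_RNA002.tsv
    with open(sys.argv[1]) as log_f, open(sys.argv[2], newline="") as tsv_f:
        print(summarize(log_f, tsv_f))
```

Reads skipped for reasons other than segmentation would appear in neither file, so the failure rate here is only an approximation.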

BrendanBeahan commented 4 months ago

Hi James,

Yes, the reads that fail segmentation are not being included; I misspoke when I said none of the reads were segmenting. I appreciate the heads-up about the new RNA004 tool!

Thank you, Brendan