biocorecrg / MOP2

Master of Pores 2
https://biocorecrg.github.io/MOP2/docs/
MIT License

deeplexicon final table in nextflow output #41

Open lpryszcz opened 1 year ago

lpryszcz commented 1 year ago

Hi Luca, could we save the deeplexicon table (.tsv) in the final output?

lpryszcz commented 1 year ago

Sonia just pointed out that reads per barcode are already reported in the QC output. It would be nice to include the probabilities and confidence interval in those files.

> head QC_files/fast5---bc_1_final_summary.stats
filename        read_id
batch0.fast5    17396d2c-2693-482a-8023-2d59eba90cbf
batch0.fast5    0d2e699f-c1a5-4841-a0fb-f5aa22008dba
batch0.fast5    094d1ef3-fdb7-43de-97c4-8c366e9dc238
batch0.fast5    2a2aca8e-77a2-4de1-bb04-3756587e234d
batch0.fast5    147db6be-2f9d-4416-b1e1-dde20837a222
batch0.fast5    0f001057-386b-4e76-876c-ac15c01f4176
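
One way to get the probabilities and confidence interval into those per-barcode files would be to join them to the deeplexicon table on the read ID. A minimal Python sketch, assuming both files are tab-separated, that the stats file has the `filename` and `read_id` columns shown above, and that the deeplexicon table has `fast5`, `ReadID`, `Barcode`, `Confidence Interval` and `P_bc_*` columns. `annotate_stats` is a hypothetical helper, not part of MOP2:

```python
import csv
import io


def annotate_stats(stats_text, deeplexicon_text):
    """Append deeplexicon columns to each row of a per-barcode stats table.

    Both inputs are assumed to be tab-separated text with a header line.
    Reads missing from the deeplexicon table get "NA" in the added columns.
    """
    dl_reader = csv.DictReader(io.StringIO(deeplexicon_text), delimiter="\t")
    # Keep everything except the columns the stats file already implies.
    extra_cols = [c for c in dl_reader.fieldnames
                  if c not in ("fast5", "ReadID", "Barcode")]
    by_read = {row["ReadID"]: row for row in dl_reader}

    stats_reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    stats_cols = list(stats_reader.fieldnames)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(stats_cols + extra_cols)
    for row in stats_reader:
        match = by_read.get(row["read_id"], {})
        writer.writerow([row[c] for c in stats_cols]
                        + [match.get(c, "NA") for c in extra_cols])
    return out.getvalue()
```

Joining on the read ID rather than on file order keeps the result correct even if the two tables list reads in a different order.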

lucacozzuto commented 1 year ago

Hi Leszek, where is the information that we need to publish to the final output?

lpryszcz commented 1 year ago

@soniacruciani

lpryszcz commented 1 year ago

The deeplexicon output (a .tsv file) has several columns. We'd want to include the Confidence Interval column, but I think we could just copy the entire file as is. Example below.

fast5   ReadID  Barcode Confidence Interval     P_bc_1  P_bc_2  P_bc_3  P_bc_4
FAQ43205_ae8483a4_0.fast5       00084a6d-553e-4d09-936b-94d5f1b23007    bc_4    0.3033  0.02900 0.05911 0.30428 0.60762
FAQ43205_ae8483a4_0.fast5       0037e932-b453-458a-8ad7-6f6c399c2922    bc_3    0.9993  0.00001 0.00032 0.99966 0.00001
FAQ43205_ae8483a4_0.fast5       003d0fb7-6c5e-45b1-8a79-ab78e0a7a6dc    bc_3    0.9757  0.00528 0.00934 0.98500 0.00038
FAQ43205_ae8483a4_0.fast5       0041152a-40cc-487e-be83-aa5f9d33c96a    bc_3    0.9871  0.00079 0.00596 0.99310 0.00016
FAQ43205_ae8483a4_0.fast5       005975a1-65d6-4bef-85da-f90e048d30f8    bc_3    0.9288  0.02709 0.00667 0.95594 0.01030
FAQ43205_ae8483a4_0.fast5       005abdfe-4723-40b3-ad2c-c578d20c6c58    bc_3    0.9993  0.00000 0.00032 0.99965 0.00003
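
To illustrate why having the whole table in the final output could be useful, here is a rough Python sketch that counts reads per barcode above a confidence cutoff. It assumes the file is tab-separated with the header shown above (note that the `Confidence Interval` column name contains a space, so splitting on arbitrary whitespace would misalign the columns). The function name `count_confident_reads` and the 0.95 cutoff are hypothetical, chosen only for the example:

```python
import csv
import io
from collections import Counter


def count_confident_reads(tsv_text, min_confidence=0.95):
    """Count reads per barcode whose confidence is at least min_confidence.

    Assumes a tab-separated table with 'Barcode' and 'Confidence Interval'
    columns, as in the deeplexicon example in this thread.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if float(row["Confidence Interval"]) >= min_confidence:
            counts[row["Barcode"]] += 1
    return counts


# First three rows of the example table, rebuilt with explicit tabs.
EXAMPLE = "\n".join([
    "\t".join(["fast5", "ReadID", "Barcode", "Confidence Interval",
               "P_bc_1", "P_bc_2", "P_bc_3", "P_bc_4"]),
    "\t".join(["FAQ43205_ae8483a4_0.fast5", "00084a6d-553e-4d09-936b-94d5f1b23007",
               "bc_4", "0.3033", "0.02900", "0.05911", "0.30428", "0.60762"]),
    "\t".join(["FAQ43205_ae8483a4_0.fast5", "0037e932-b453-458a-8ad7-6f6c399c2922",
               "bc_3", "0.9993", "0.00001", "0.00032", "0.99966", "0.00001"]),
    "\t".join(["FAQ43205_ae8483a4_0.fast5", "003d0fb7-6c5e-45b1-8a79-ab78e0a7a6dc",
               "bc_3", "0.9757", "0.00528", "0.00934", "0.98500", "0.00038"]),
])

print(dict(count_confident_reads(EXAMPLE)))  # {'bc_3': 2}
```

The low-confidence bc_4 read (0.3033) is dropped at this cutoff, which is the kind of filtering the per-barcode read lists alone do not allow.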