epi2me-labs / wf-single-cell

wf-single-cell HTML report shows incorrect number of input reads #131

Closed (jackchenx3 closed 3 months ago)

jackchenx3 commented 3 months ago

Operating System

CentOS 7

Other Linux

No response

Workflow Version

2.1.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

No response

Workflow Execution - CLI Execution Profile

None

What happened?

Hi,

I'm using the latest version (2.1.0) to process my ONT single-cell data and noticed a discrepancy between the number of reads in the input FASTQ file and the number reported in the HTML report. My FASTQ file contains 83,602,375 reads, but the HTML report shows 84,827,979 input reads, an excess of 1,225,604 reads (about 1.5%). The same thing happened with the previous version of the wf-single-cell pipeline: the HTML report consistently showed 1-2% more input reads for every sample I've processed.
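For anyone wanting to reproduce this kind of comparison, here is a minimal sketch in Python. It is not part of the workflow. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the helper names `count_fastq_reads`, `open_fastq` and `excess_percent` are made up for illustration. The two large numbers are the ones quoted in this issue.

```python
import gzip
import io


def count_fastq_reads(handle):
    """Count records in a FASTQ stream, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def excess_percent(input_reads, reported_reads):
    """Percentage by which the reported count exceeds the input count."""
    return 100.0 * (reported_reads - input_reads) / input_reads


# Tiny in-memory FASTQ so the sketch runs without any files on disk.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
n_demo = count_fastq_reads(demo)

# Numbers quoted in this issue: reads in the FASTQ vs. reads shown in the report.
difference = 84_827_979 - 83_602_375
excess = excess_percent(83_602_375, 84_827_979)
print(n_demo, difference, round(excess, 2))
```

On a real file, `count_fastq_reads(open_fastq(path))` gives the number to compare against the report summary.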

Could you tell me at which step of the workflow the input reads are counted, and how the reported number of input reads is determined?

Thanks

Jack

Relevant log output

No logs

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

nrhorner commented 3 months ago

Hi @jackchenx3

The workflow identifies chimeric reads (reads in which two or more sub-reads are fused into a single read) and splits them into their sub-reads. The number of reads shown in the report summary is the total number of sub-reads after splitting, so it can be higher than the number of records in the input FASTQ. This should be made more obvious in the report or workflow docs.
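To illustrate the accounting only, here is a toy sketch in Python. It is not the workflow's actual code and makes no claim about how chimeras are detected. The per-read sub-read counts are invented example values: 1 means the read was left intact, 2 or more means it was split.

```python
def reported_read_count(subreads_per_read):
    """Total sub-reads after splitting; this is the kind of number the report would show."""
    return sum(subreads_per_read)


# Hypothetical example: six input reads, two of them chimeric.
subreads_per_read = [1, 1, 2, 1, 3, 1]

n_input = len(subreads_per_read)                            # records in the FASTQ
n_reported = reported_read_count(subreads_per_read)         # sub-reads after splitting
n_chimeric = sum(1 for n in subreads_per_read if n >= 2)    # reads that were split

print(n_input, n_reported, n_chimeric)
```

Each chimeric read adds (number of sub-reads minus 1) to the total, which is why a small fraction of chimeras is enough to push the reported count a percent or two above the FASTQ record count.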

Thanks,

Neil

jackchenx3 commented 3 months ago

I see, thank you very much for the information.

Jack