ericcombiolab / LRTK

A unified and versatile toolkit for analyzing Linked-Read sequencing data
MIT License

MKFQ barcode length? #6

Open pdimens opened 8 months ago

pdimens commented 8 months ago

Good afternoon,

In the config file for MKFQ, there is a final parameter, Barcode_Length. In the example files provided via Google Drive, the barcode length is 54 for the 10x example. This confuses me because 10x barcodes are 16 bp. Can you please explain what Barcode_Length accomplishes?

CicyYeung commented 8 months ago

Sorry for the confusion. The barcode lengths are 16 bp, 30 bp, and 18 bp for 10x Genomics, stLFR, and TELL-Seq sequencing technologies, respectively. The final item "Barcode_Length" is a manually annotated label and it does not affect the outcome of the MKFQ function. We have removed this unnecessary label and updated the simulationDB.zip.

pdimens commented 8 months ago

Thank you for this clarification. In addition, could you please clarify what is meant by "coverage for long fragment" and "average number of molecules per droplet"?

CicyYeung commented 8 months ago

The term "Coverage for Long Fragment" refers to the average coverage of sequencing reads across a DNA fragment. To calculate the value, we first infer the genomic coordinates of the start and end points of a fragment, and then calculate the average coverage of the reads spanning that fragment. Please see the article "Assessment of human diploid genome assembly with 10x Linked-Reads data" for further details.
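As a rough illustration of the calculation described above, here is a minimal Python sketch. It is not LRTK's implementation. It assumes the reads of one fragment have already been grouped (for example by barcode and proximity), that each read is a 0-based half-open `(start, end)` interval on the same chromosome, and that the fragment boundaries are taken as the leftmost read start and the rightmost read end. The function name `fragment_coverage` is hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (start, end), 0-based, half-open, same chromosome.
Read = Tuple[int, int]


def fragment_coverage(reads: List[Read]) -> Tuple[int, int, float]:
    """Infer fragment boundaries from its reads and return
    (fragment_start, fragment_end, average_coverage)."""
    if not reads:
        raise ValueError("at least one read is required")
    if any(end <= start for start, end in reads):
        raise ValueError("each read must satisfy start < end")

    # Step 1: infer the fragment's start and end from the outermost reads.
    frag_start = min(start for start, _ in reads)
    frag_end = max(end for _, end in reads)

    # Step 2: average coverage = sequenced bases / inferred fragment length.
    sequenced_bases = sum(end - start for start, end in reads)
    return frag_start, frag_end, sequenced_bases / (frag_end - frag_start)


# Three 150 bp reads spread over an inferred 50 kb fragment.
reads = [(1000, 1150), (20000, 20150), (50850, 51000)]
start, end, cov = fragment_coverage(reads)
print(f"fragment {start}-{end}, coverage {cov:.3f}x")
```

In this toy example, 450 sequenced bases spread over an inferred 50,000 bp fragment give a coverage well below 1x, which is typical for linked reads: each long fragment is only sparsely sampled.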

The term "average number of molecules per droplet" is often used for 10x Genomics linked-read sequencing. For this technology, 1 ng of high-molecular-weight (HMW) genomic DNA is distributed across more than 100,000 droplets, resulting in millions of DNA molecules during sequencing. Consequently, one droplet can contain anywhere from a few to several hundred DNA molecules. The average number of molecules per droplet may be used to assess the quality of linked-read sequencing libraries. For detailed examples and comprehensive statistics, please see the article "Haplotyping germline and cancer genomes with high-throughput linked-read sequencing".
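To make the statistic concrete, the following Python sketch averages the number of distinct molecules per barcode. It is an illustration under stated assumptions, not LRTK's code: it assumes each droplet corresponds to one barcode, that reads have already been assigned to inferred molecules, and that the input is a list of `(barcode, molecule_id)` pairs. The function name and the identifiers in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, Iterable, Tuple


def mean_molecules_per_droplet(
    assignments: Iterable[Tuple[str, Hashable]]
) -> float:
    """Average number of distinct molecules per barcode (droplet).

    `assignments` holds one (barcode, molecule_id) pair per read;
    repeated pairs are counted once.
    """
    molecules_by_barcode = defaultdict(set)
    for barcode, molecule_id in assignments:
        molecules_by_barcode[barcode].add(molecule_id)

    if not molecules_by_barcode:
        raise ValueError("no barcode assignments given")
    return mean(len(mols) for mols in molecules_by_barcode.values())


# Toy input: BC1 holds 2 molecules, BC2 holds 1, BC3 holds 3.
assignments = [
    ("BC1", "m1"), ("BC1", "m2"), ("BC1", "m2"),
    ("BC2", "m3"),
    ("BC3", "m4"), ("BC3", "m5"), ("BC3", "m6"),
]
avg = mean_molecules_per_droplet(assignments)
print(avg)
```

Note that only barcodes with at least one observed molecule enter the average, so empty droplets are not counted in this sketch.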

pdimens commented 8 months ago

I understand, thank you