ericcombiolab / LRTK

A unified and versatile toolkit for analyzing Linked-Read sequencing data
MIT License

lrtk MKFQ fails for stLFR with 'simulate_read_fails' #2

Closed btm685 closed 2 years ago

btm685 commented 2 years ago

this example fails immediately:

$LRTK MKFQ -CF "./example/FQs/simulation/diploid_config" -IT stLFR

was able to temporarily resolve by editing simulate_reads_stLFR.py and commenting out these lines:

if parameter_struc.Barcode_Length != int( parameter_struc.Barcode_qual[ parameter_struc.Barcode_qual.rfind("_")

not sure why you're trying to compare an integer to part of a pathname string

CicyYeung commented 2 years ago

Thanks for your comments. This error was caused by a mismatch between the config files (-CF "./example/FQs/simulation/diploid_config") and the sequencing technology (-IT stLFR). The default config file is used to generate linked-reads for 10x Genomics only. For the simulation of stLFR sequencing data, we use a different library resource and a separate set of diploid config files to make FASTQs (./example/FQs/simulation_stLFR/diploid_config). Because stLFR and 10x Genomics sequencing differ inherently, for example in the barcode whitelist and error profile, we had to use two separate config files.

Regarding the question about comparing an integer to part of a pathname string: the check is intended to guarantee that the Barcode_Length parameter in the config files is consistent with the length of the quality values in the error profile file. As shown in the ./example/FQs/simulation_stLFR/diploid_config directory, the number “54” in the file name (the pathname) indicates the barcode length. A detailed description of the barcode sequence for stLFR sequencing can be found at: https://github.com/stLFR/stLFR_read_demux.
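
The consistency check described above could be sketched roughly as follows. This is only an illustration, not the code in simulate_reads_stLFR.py: the helper names and the example file name `barcode_qual_54.txt` are hypothetical, and the sketch assumes the barcode length is the trailing integer after the last underscore in the profile file name.

```python
import os
import re


def barcode_length_from_profile(path):
    """Return the trailing integer in a profile file name, or None if absent.

    Assumption: the file name ends in "_<number>", optionally followed by an
    extension (for example the hypothetical "barcode_qual_54.txt").
    """
    name = os.path.basename(path)
    stem = name.split(".", 1)[0]  # drop any extension
    match = re.search(r"_(\d+)$", stem)
    return int(match.group(1)) if match else None


def check_barcode_length(barcode_length, barcode_qual_path):
    """Raise ValueError when the configured length and the file name disagree."""
    found = barcode_length_from_profile(barcode_qual_path)
    if found is None:
        raise ValueError(f"no barcode length in file name: {barcode_qual_path}")
    if found != barcode_length:
        raise ValueError(
            f"Barcode_Length={barcode_length} but file name says {found}"
        )
```

Returning None when the name carries no trailing number lets the caller report a readable message instead of failing inside `int()` on an unrelated path fragment, which is one plausible way a config written for another technology would surface.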

The simulation_stLFR example was not included previously. We have now added its config files to example.tar.gz and updated them on Google Drive (https://drive.google.com/drive/folders/1XPW2avL_LZAt5yIh9tb35jZ5GfCSj7eQ). We also updated the related documents on GitHub. For example, the updated commands to generate stLFR sequencing reads can be found in the "function 1" part of the "Commands for raw read and variant analysis" section.

btm685 commented 2 years ago

The example stLFR in the updated readme works fine now. Thank you for the quick response and the updated readme with many more examples.