JunyueCaoLab / PerturbSci_Kinetics

Reads processing scripts for PerturbSci-Kinetics.
MIT License

What is the R3.fastq.gz file? #1

Closed: diminghui closed this issue 1 month ago

diminghui commented 9 months ago

I noticed that the code uses an 'R3.fastq.gz' sequence file, but I'm unsure where it comes from. Specifically, I'd like to know where the 'R3.fastq.gz' file is sourced from, whether it contains gRNA sequencing data, and why the code uses only this one file.

I'd like to clarify whether this indicates single-end sequencing, or whether the code might be missing references to other files. Since the code depends on 'R3.fastq.gz', I want to confirm what this file contains and what it is used for, so that I can better understand the code logic and the data processing steps. Any insight you can offer would be greatly appreciated.

z1hanxu commented 2 months ago

Due to the format limitations of the GEO system, we uploaded only the R1 and R2 files. We did use three files for data preprocessing (R1, R2, and R3 correspond to Read1, index5, and Read2, respectively), but to comply with the system requirements, we uploaded the R1 and R2 fastq files with the index5 barcode attached to the header of each read. You can either reconstruct the third fastq file by reformatting the sequence in the header, or slightly modify the Python script from the first step to make these two fastq files compatible.
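A minimal sketch of the first option (rebuilding the index5 fastq from the read headers) might look like the following. This is not the authors' script, and it rests on assumptions you should check against your own data: the helper names are hypothetical, the index5 barcode is assumed to be the text after the last `:` in each header (the usual Illumina convention; adjust `extract` if the GEO headers store it differently), and since headers carry no base qualities, every barcode base gets a constant placeholder quality (`fill_qual`, assumed `"I"`). Headers are copied unchanged so read names still line up across files.

```python
import gzip
from typing import Callable, Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) records from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:  # end of file
                return
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip("\r\n")
            yield header, seq, qual


def last_colon_field(header: str) -> str:
    """HYPOTHETICAL parser: assume the index5 barcode follows the last ':'."""
    return header.rsplit(":", 1)[-1].strip()


def rebuild_index_fastq(
    read_path: str,
    out_path: str,
    extract: Callable[[str], str] = last_colon_field,
    fill_qual: str = "I",
) -> int:
    """Write an index5 FASTQ whose sequences are parsed from read headers.

    Each output record keeps the original header, uses the extracted barcode
    as its sequence, and a constant placeholder quality string (the real
    index qualities are not stored in the header). Returns the record count.
    """
    written = 0
    with gzip.open(out_path, "wt") as out:
        for header, _seq, _qual in read_fastq(read_path):
            barcode = extract(header)
            out.write(f"{header}\n{barcode}\n+\n{fill_qual * len(barcode)}\n")
            written += 1
    return written
```

For example, `rebuild_index_fastq("sample_R1.fastq.gz", "sample_index5.fastq.gz")` (file names hypothetical) would produce a stand-in for the index5 file. Going by the mapping above, that rebuilt file would take the R2 slot and the GEO R2 file (presumably Read2) the R3 slot. Inspect a few headers first, for example with `zcat sample_R1.fastq.gz | head -4`, and keep in mind that any quality filtering the script applies to index5 will see only the placeholder values.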