gaolabtools / scNanoGPS

Single cell Nanopore sequencing data for Genotype and Phenotype

Scanner log explanation #20

Closed: chuckzzzz closed this issue 7 months ago

chuckzzzz commented 8 months ago

Hi, thanks for developing this tool!

Could you explain in a bit more detail what the log file is saying? Here is an example of my log file.

Specifically, what does the detection rate mean? Does it mean that only 13.28% of reads have valid adaptors, and the reads that don't are discarded?

Thanks!

Total 58584185 reads are processed.
Time elapse: 4 : 20 : 9.48
Detecting rate: 13.28%

Result counting:
    Number of 3'-adaptor located on the read head region:               1983806
    Number of 3'-adaptor + polyT on the read head region:               113058
    Number of 3'-adaptor located on the read tail region:               5793295
    Number of 3'-adaptor + polyT on the read tail region:               4831651

Alignment counting:
    Number of 3'-adaptor having no mismatch:                                3003636

    Number of 3'-adaptor having mismatch at the last one position:          3290504
    Number of 3'-adaptor having mismatch at all the last two position:      3080713
    Number of 3'-adaptor having mismatch at all the last three position:    713218

    Number of 3'-adaptor having in/del at the last one position:            5294
    Number of 3'-adaptor having in/del at the last two position:            4372
    Number of 3'-adaptor having in/del at the last three position:          3647

    Number of rescued truncated 3'-adaptor on the read head region:     796119
    Number of rescued truncated 3'-adaptor on the read tail region:     1550310

Finish time stamp: Fri, 23 Feb 2024 21:23:30

LilyLuyang commented 8 months ago

Hi,

I had this same situation. I downloaded SRR21492154 data and the result is here:

Total 98363542 reads are processed.
Time elapse: 22 : 22 : 30.80
Detecting rate: 12.89%

Result counting:
    Number of 3'-adaptor located on the read head region:               2831310
    Number of 3'-adaptor + polyT on the read head region:               439768
    Number of 3'-adaptor located on the read tail region:               9843346
    Number of 3'-adaptor + polyT on the read tail region:               1827523

Alignment counting:
    Number of 3'-adaptor having no mismatch:                                938

    Number of 3'-adaptor having mismatch at the last one position:          592898
    Number of 3'-adaptor having mismatch at all the last two position:      201109
    Number of 3'-adaptor having mismatch at all the last three position:    96201

    Number of 3'-adaptor having in/del at the last one position:            22676
    Number of 3'-adaptor having in/del at the last two position:            5291525
    Number of 3'-adaptor having in/del at the last three position:          6741530

    Number of rescued truncated 3'-adaptor on the read head region:         6423
    Number of rescued truncated 3'-adaptor on the read tail region:         309789

Finish time stamp: Sat, 16 Mar 2024 09:45:26

I guess the scanner uses the first and last 100 nucleotides of each read to recognize TruSeq Read 1 and polyA, but maybe I don't understand it very well. I would be very thankful for further explanation!

Thanks a lot.

shiauck commented 7 months ago

Hi,

Thank you for reporting the issue. However, I couldn't reproduce your low detection rate. Could you provide more details about your environment?

The virtual environment I'm using:

(test_env) production ~/data/test/test_test $ python3 --version
Python 3.9.19

(test_env) ~/data/test/test_test $ pip3 install -r scNanoGPS/requirements.txt
Requirement already satisfied: biopython in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 1)) (1.83)
Requirement already satisfied: distance in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 2)) (0.1.3)
Requirement already satisfied: liqa in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 3)) (1.3.4)
Requirement already satisfied: matplotlib in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 4)) (3.8.3)
Requirement already satisfied: pandas in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 5)) (2.2.1)
Requirement already satisfied: pysam in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 6)) (0.22.0)
Requirement already satisfied: seaborn in /anaconda3/envs/test_env/lib/python3.9/site-packages (from -r scNanoGPS/requirements.txt (line 7)) (0.13.2)
Requirement already satisfied: numpy in /anaconda3/envs/test_env/lib/python3.9/site-packages (from biopython->-r scNanoGPS/requirements.txt (line 1)) (1.26.4)
Requirement already satisfied: lifelines in /anaconda3/envs/test_env/lib/python3.9/site-packages (from liqa->-r scNanoGPS/requirements.txt (line 3)) (0.28.0)
Requirement already satisfied: contourpy>=1.0.1 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (4.50.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (24.0)
Requirement already satisfied: pillow>=8 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (10.2.0)
Requirement already satisfied: pyparsing>=2.3.1 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (2.9.0.post0)
Requirement already satisfied: importlib-resources>=3.2.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from matplotlib->-r scNanoGPS/requirements.txt (line 4)) (6.4.0)
Requirement already satisfied: pytz>=2020.1 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from pandas->-r scNanoGPS/requirements.txt (line 5)) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from pandas->-r scNanoGPS/requirements.txt (line 5)) (2024.1)
Requirement already satisfied: zipp>=3.1.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib->-r scNanoGPS/requirements.txt (line 4)) (3.18.1)
Requirement already satisfied: six>=1.5 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib->-r scNanoGPS/requirements.txt (line 4)) (1.16.0)
Requirement already satisfied: scipy>=1.2.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (1.12.0)
Requirement already satisfied: autograd>=1.5 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (1.6.2)
Requirement already satisfied: autograd-gamma>=0.3 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (0.5.0)
Requirement already satisfied: formulaic>=0.2.2 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (1.0.1)
Requirement already satisfied: future>=0.15.2 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from autograd>=1.5->lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (1.0.0)
Requirement already satisfied: interface-meta>=1.2.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from formulaic>=0.2.2->lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (1.3.0)
Requirement already satisfied: typing-extensions>=4.2.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from formulaic>=0.2.2->lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (4.10.0)
Requirement already satisfied: wrapt>=1.0 in /anaconda3/envs/test_env/lib/python3.9/site-packages (from formulaic>=0.2.2->lifelines->liqa->-r scNanoGPS/requirements.txt (line 3)) (1.16.0)

The scanner result from my run on SRR21492154:

(test_env) ~/data/test/test_test $ cat scNanoGPS_res.3.9/scanner.log.txt
Starting time stamp: Fri, 22 Mar 2024 18:32:44

List of parameters:
        Current working directory:     ~/data/test/test_test
        Input file name:               source/SRR21492154.fastq.gz
        Output FastQ file name:        scNanoGPS_res.3.9/processed.fastq.gz
        Output barcode list name:      scNanoGPS_res.3.9/barcode_list.tsv.gz
        Log file name:                 scNanoGPS_res.3.9/scanner.log.txt

Parameters for pattern search:
        Length of barcode:             16
        Length of UMI:                 12
        5'-adaptor sequence:           AAGCAGTGGTATCAACGCAGAGTACAT
        3'-adaptor sequence:           CTACACGACGCTCTTCCGATCT
        PolyT sequence:                TTTTTTTTTTTT
        Scanning region length:        100

Penalty for dynamic programming:
        Matching:                      2
        Mismatching:                   -3
        Gap opening:                   -5
        Gap extension:                 -2
        Editing distance:              2

Parameters for computing:
        Number of computer cores:      10
        Number of reads per batch job: 1000
        Minimal length of read:        200
        Matching threshold:            0.7
        Scoring threshold:             0.4

Debug mode switch:             False

Total 98363542 reads are processed.
Time elapse: 19 : 2 : 43.55
Detecting rate: 78.47%

Result counting:
        Number of 3'-adaptor located on the read head region:                   37205573
        Number of 3'-adaptor + polyT on the read head region:                   36773661
        Number of 3'-adaptor located on the read tail region:                   39983822
        Number of 3'-adaptor + polyT on the read tail region:                   39217213

Alignment counting:
        Number of 3'-adaptor having no mismatch:                                31524450

        Number of 3'-adaptor having mismatch at the last one position:          4132090
        Number of 3'-adaptor having mismatch at all the last two position:      2398869
        Number of 3'-adaptor having mismatch at all the last three position:    1009858

        Number of 3'-adaptor having in/del at the last one position:            1144
        Number of 3'-adaptor having in/del at the last two position:            1049
        Number of 3'-adaptor having in/del at the last three position:          955

        Number of rescued truncated 3'-adaptor on the read head region:         120924
        Number of rescued truncated 3'-adaptor on the read tail region:         10598727

Finish time stamp: Sat, 23 Mar 2024 13:35:28

I also built a Python 3.11 environment and ran the scanner on SRR21492154. The scanner log:

(test_env_3.11) ~/data/test/test_test $ cat scNanoGPS_res.3.11/scanner.log.txt
Starting time stamp: Fri, 22 Mar 2024 18:33:29

List of parameters:
        Current working directory:     ~/data/test/test_test
        Input file name:               source/SRR21492154.fastq.gz
        Output FastQ file name:        scNanoGPS_res.3.11/processed.fastq.gz
        Output barcode list name:      scNanoGPS_res.3.11/barcode_list.tsv.gz
        Log file name:                 scNanoGPS_res.3.11/scanner.log.txt

Parameters for pattern search:
        Length of barcode:             16
        Length of UMI:                 12
        5'-adaptor sequence:           AAGCAGTGGTATCAACGCAGAGTACAT
        3'-adaptor sequence:           CTACACGACGCTCTTCCGATCT
        PolyT sequence:                TTTTTTTTTTTT
        Scanning region length:        100

Penalty for dynamic programming:
        Matching:                      2
        Mismatching:                   -3
        Gap opening:                   -5
        Gap extension:                 -2
        Editing distance:              2

Parameters for computing:
        Number of computer cores:      10
        Number of reads per batch job: 1000
        Minimal length of read:        200
        Matching threshold:            0.7
        Scoring threshold:             0.4

Debug mode switch:             False

Total 98363542 reads are processed.
Time elapse: 18 : 17 : 58.16
Detecting rate: 78.47%

Result counting:
        Number of 3'-adaptor located on the read head region:                   37205573
        Number of 3'-adaptor + polyT on the read head region:                   36773661
        Number of 3'-adaptor located on the read tail region:                   39983822
        Number of 3'-adaptor + polyT on the read tail region:                   39217213

Alignment counting:
        Number of 3'-adaptor having no mismatch:                                31524450

        Number of 3'-adaptor having mismatch at the last one position:          4132090
        Number of 3'-adaptor having mismatch at all the last two position:      2398869
        Number of 3'-adaptor having mismatch at all the last three position:    1009858

        Number of 3'-adaptor having in/del at the last one position:            1144
        Number of 3'-adaptor having in/del at the last two position:            1049
        Number of 3'-adaptor having in/del at the last three position:          955

        Number of rescued truncated 3'-adaptor on the read head region:         120924
        Number of rescued truncated 3'-adaptor on the read tail region:         10598727

Finish time stamp: Sat, 23 Mar 2024 12:51:28
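The Python 3.9 and 3.11 runs above report identical counts and differ only in timestamps and run time. Here is a minimal sketch for checking that programmatically. It assumes only what these logs show: result-bearing lines begin with "Total", "Detecting rate", or "Number of". The helper names are hypothetical and not part of scNanoGPS.

```python
RESULT_PREFIXES = ("Total", "Detecting rate", "Number of")


def result_lines(log_text: str) -> list[str]:
    """Keep only result-bearing lines, ignoring timestamps, paths and timing."""
    kept = []
    for line in log_text.splitlines():
        line = " ".join(line.split())  # collapse the column-alignment padding
        if line.startswith(RESULT_PREFIXES):
            kept.append(line)
    return kept


def same_results(log_a: str, log_b: str) -> bool:
    """True when two scanner logs report identical totals, rates and counts."""
    return result_lines(log_a) == result_lines(log_b)


# Usage with the two logs above:
# with open("scNanoGPS_res.3.9/scanner.log.txt") as a, \
#      open("scNanoGPS_res.3.11/scanner.log.txt") as b:
#     print(same_results(a.read(), b.read()))
```

Normalizing whitespace makes the comparison ignore differences in tab or space alignment between log versions.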

Please share more details so that I can identify how the issue occurred. Thank you.

Regards, Cheng-Kai

shiauck commented 7 months ago

Here are the answers to your questions:

"Specifically, what does the detection rate mean? Does it mean that only 13.28% of reads have valid adaptors, and the reads that don't are discarded?"

The detection rate is the proportion of reads identified as carrying TruSeq R1 and TSO, as described in our paper.
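Every scanner log posted in this thread fits one relation: the detecting rate equals (3'-adaptor hits in the head region + hits in the tail region) / total reads. For example, in the first log, (1983806 + 5793295) / 58584185 ≈ 13.28%. This relation is inferred from the posted numbers and is not a documented formula. Here is a minimal Python sketch that recomputes it from a log. The function names are hypothetical.

```python
import re


def counts_from_log(log_text: str) -> dict:
    """Collect the total read count and every 'Number of ...: N' counter."""
    counts = {}
    total = re.search(r"Total (\d+) reads are processed", log_text)
    if total:
        counts["total"] = int(total.group(1))
    # Works for both the column-aligned and the flattened one-line log formats
    for label, value in re.findall(r"(Number of [^:\n]+):\s+(\d+)", log_text):
        counts[label.strip()] = int(value)
    return counts


def detection_rate(counts: dict) -> float:
    """Inferred, not documented: head-region plus tail-region 3'-adaptor hits,
    as a percentage of all processed reads."""
    head = counts["Number of 3'-adaptor located on the read head region"]
    tail = counts["Number of 3'-adaptor located on the read tail region"]
    return 100.0 * (head + tail) / counts["total"]
```

The same arithmetic reproduces 78.47% for the SRR21492154 runs and 100.00% for the example data (3854 + 3877 = 7731 reads).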

"I guess the scanner uses the first and last 100 nucleotides of each read to recognize TruSeq Read 1 and polyA, but maybe I don't understand it very well."

The scanner searches the first 100 nucleotides and the last 100 nucleotides of each read for TruSeq R1/polyA and TSO. In the scanner, I use the term 3'-adaptor for TruSeq R1 and 5'-adaptor for TSO simply because TruSeq R1 is attached at the 3' end of the mRNA, while TSO is attached at the 5' end. It is rare for the sequencer to produce unknown sequence outside the adaptor pair. You can change the scanning region with the parameter "--scanning_region".
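The following simplified sketch makes the head/tail scanning concrete. It rests on two assumptions:

- A match is accepted within edit distance 2, the "Editing distance" value in the logs.
- A reverse-strand read carries the reverse complement of the 3'-adaptor in its last 100 bases.

According to its log parameters, the real scanner uses a scored dynamic-programming alignment (match 2, mismatch -3, gap opening -5, gap extension -2). The plain edit-distance search below is a stand-in for that alignment, and every name in it is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
ADAPTOR_3P = "CTACACGACGCTCTTCCGATCT"  # 3'-adaptor (TruSeq R1) from the log


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def min_edit_substring(pattern: str, text: str) -> tuple:
    """Smallest edit distance between `pattern` and any substring of `text`
    (semi-global DP). Returns (distance, end index of the best match in text)."""
    m = len(pattern)
    prev = list(range(m + 1))  # empty text prefix: delete i pattern bases
    best = (prev[m], 0)
    for j, base in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == base else 1
            cur[i] = min(prev[i - 1] + cost, prev[i] + 1, cur[i - 1] + 1)
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


def locate_adaptor(read: str, adaptor: str = ADAPTOR_3P,
                   region: int = 100, max_dist: int = 2):
    """Return 'head', 'tail', or None depending on where the adaptor was found."""
    if min_edit_substring(adaptor, read[:region])[0] <= max_dist:
        return "head"
    # Assumption: reverse-strand reads carry the reverse complement at the tail.
    if min_edit_substring(revcomp(adaptor), read[-region:])[0] <= max_dist:
        return "tail"
    return None
```

Passing a larger `region` (for example `locate_adaptor(read, region=150)`) mimics widening the window, which the scanner exposes as `--scanning_region`.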

Hope this helps.

Regards, Cheng-Kai

LilyLuyang commented 7 months ago

Hi Cheng-Kai,

Thanks very much for your helpful reply!

My environment uses Python 3.7.12, and I installed the required libraries and tools in this virtual environment. I then ran scanner.py, which produced the low adaptor detection rate at both ends of the reads shown above. When I ran the example FASTQ data, a similar detection rate occurred again:

Total 7731 reads are processed.
Time elapse: 0 : 0 : 8.77
Detecting rate: 11.77%

Here I just used the example code. Could this be caused by the Python environment, or by libraries/tools installed incorrectly? The output '*minimap2.bam' files are empty when I use this low-detection-rate 'processed.fastq.gz', which confuses me a lot.

I would appreciate it if you could give me some advice to deal with it. Thanks a lot.

Regards, Lily

shiauck commented 7 months ago

Hi Lily,

I developed the pipeline on Python 3.9 and have never tried older Python versions. As far as I know, Python 3.7 may also pull in older versions of biopython and pysam, so I cannot guarantee that my code works properly there. Please create a new conda environment with:

conda create -n <new_env_name> python=3.9 numpy scipy

Please update anaconda/miniconda if necessary.
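The low detection rate turned out to be tied to the Python version, so a quick sanity check inside the new environment may help. Here is a minimal sketch. It assumes the seven top-level packages listed from requirements.txt in the pip output above (biopython, distance, liqa, matplotlib, pandas, pysam, seaborn). `check_environment` is a hypothetical helper, not part of scNanoGPS.

```python
import sys
from importlib import metadata

# Top-level packages from scNanoGPS/requirements.txt, as listed in the pip output above
REQUIRED_PACKAGES = ["biopython", "distance", "liqa", "matplotlib",
                     "pandas", "pysam", "seaborn"]


def check_environment(min_python=(3, 9)):
    """Return (python_ok, {package: installed version or None})."""
    python_ok = sys.version_info[:2] >= min_python
    versions = {}
    for name in REQUIRED_PACKAGES:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return python_ok, versions


if __name__ == "__main__":
    ok, found = check_environment()
    print("Python", sys.version.split()[0], "OK" if ok else "is older than 3.9")
    for name, version in found.items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Run it with the new environment activated. On Python 3.7 the first value is False, and any package reported as MISSING still needs to be installed.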

An 11.77% detection rate is far too low: it means that only 11.77% of reads were detected to carry TruSeq R1 + cell barcode + UMI. This result doesn't make sense.

Given your trial on SRR21492154, I believe something is wrong with your conda environment. Please set up a new environment and make sure you obtain a detection rate above 70% on SRR21492154. Then run your data again. Hope this helps.

Regards, Cheng-Kai

"The output '*minimap2.bam' are empty if I use this low detection rate 'processed.fastq.gz', which confused me a lot."

The processed.fastq.gz file stores the reads after removal of TruSeq R1, cell barcode, UMI, and TSO. Empty *.minimap2.bam files must result from a library/tool/environment problem, because the second step, the assigner, is only designed to filter out ambients. Under normal conditions there should be no way to generate empty BAM files.
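This simplified illustration shows what removing those elements means for a single read. It makes several assumptions:

- The read is laid out as 3'-adaptor + 16 nt barcode + 12 nt UMI + polyT + cDNA + reverse complement of the TSO. The lengths and sequences come from the scanner log.
- Only exact matches are used. The real scanner tolerates sequencing errors and handles both read orientations.

`split_read` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
ADAPTOR_3P = "CTACACGACGCTCTTCCGATCT"   # 3'-adaptor (TruSeq R1)
TSO = "AAGCAGTGGTATCAACGCAGAGTACAT"     # 5'-adaptor (TSO)


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_read(read: str, bc_len: int = 16, umi_len: int = 12):
    """Return (barcode, umi, cdna) for an exact-match read, else None."""
    start = read.find(ADAPTOR_3P)
    if start < 0:
        return None
    bc_start = start + len(ADAPTOR_3P)
    umi_end = bc_start + bc_len + umi_len
    if len(read) < umi_end:
        return None  # read truncated inside the barcode/UMI
    barcode = read[bc_start:bc_start + bc_len]
    umi = read[bc_start + bc_len:umi_end]
    # Strip the polyT run; note this also removes any T's that start the cDNA.
    rest = read[umi_end:].lstrip("T")
    tso_at = rest.rfind(revcomp(TSO))
    cdna = rest[:tso_at] if tso_at >= 0 else rest
    return barcode, umi, cdna
```

The naive `lstrip` clips a cDNA that legitimately begins with T, which is one reason a real implementation needs more care than this shortcut.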

LilyLuyang commented 7 months ago

Hi,

Many thanks for your quick reply!

I recreated a new conda environment. The Python version is:

$ python3 --version
Python 3.9.19

All dependencies and required libraries/tools have been installed in this Python 3.9.19 environment.

The command I used to process the example data is: 'python3 scanner.py -i example/fastq/ -t 2'

The 'scanner.log.txt' file now reports a 100% detection rate:

    Total 7731 reads are processed.
    Time elapse: 0 : 0 : 15.08
    Detecting rate: 100.00%

    Result counting:
    Number of 3'-adaptor located on the read head region:                   3854
    Number of 3'-adaptor + polyT on the read head region:                   3854
    Number of 3'-adaptor located on the read tail region:                   3877
    Number of 3'-adaptor + polyT on the read tail region:                   3877

    Alignment counting:
    Number of 3'-adaptor having no mismatch:                                4677

    Number of 3'-adaptor having mismatch at the last one position:          35
    Number of 3'-adaptor having mismatch at all the last two position:      24
    Number of 3'-adaptor having mismatch at all the last three position:    3

    Number of 3'-adaptor having in/del at the last one position:            0
    Number of 3'-adaptor having in/del at the last two position:            0
    Number of 3'-adaptor having in/del at the last three position:          0

    Number of rescued truncated 3'-adaptor on the read head region:         3
    Number of rescued truncated 3'-adaptor on the read tail region:         879

I think the cause was the older Python version used for the scanner step, which resulted in a very low detection rate for the R1 and TSO adaptors. I will use the SRR21492154 data for further analysis and let you know whether it runs smoothly.
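In this thread, the problem disappeared after moving from Python 3.7.12 to Python 3.9.19. A small guard run inside the conda environment can catch an old interpreter before a long scanner job. This is a minimal sketch. The 3.9 cutoff is an assumption taken only from this thread, not an official scNanoGPS requirement, and `check_python` is a hypothetical helper:

```python
import sys

# Assumption based only on this thread: Python 3.7.12 gave ~12% detection on
# the example data, while Python 3.9.19 gave 100%. Not an official minimum.
MIN_PYTHON = (3, 9)


def check_python(min_version=MIN_PYTHON, version_info=None):
    """Return True if the running (or given) interpreter is >= min_version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(min_version)


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    if check_python():
        print(f"Python {found}: OK")
    else:
        print(f"Python {found} is older than {MIN_PYTHON[0]}.{MIN_PYTHON[1]}; "
              "consider recreating the conda environment")
```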

Thank you very much.

Regards, Lily

shiauck commented 7 months ago

Hi,

I just ran the test sample and the detection rate is 100%.

Starting time stamp: Mon, 25 Mar 2024 10:10:36

List of parameters:
    Current working directory:     ~/data/test/test_test
    Input file name:               scNanoGPS/example/fastq/example.fastq.gz
    Output FastQ file name:        test/processed.fastq.gz
    Output barcode list name:      test/barcode_list.tsv.gz
    Log file name:                 test/scanner.log.txt

Parameters for pattern search:
    Length of barcode:             16
    Length of UMI:                 12
    5'-adaptor sequence:           AAGCAGTGGTATCAACGCAGAGTACAT
    3'-adaptor sequence:           CTACACGACGCTCTTCCGATCT
    PolyT sequence:                TTTTTTTTTTTT
    Scanning region length:        100

Penalty for dynamic programming:
    Matching:                      2
    Mismatching:                   -3
    Gap opening:                   -5
    Gap extension:                 -2
    Editing distance:              2

Parameters for computing:
    Number of computer cores:      10
    Number of reads per batch job: 1000
    Minimal length of read:        200
    Matching threshold:            0.7
    Scoring threshold:             0.4

Debug mode switch:             False

Total 7731 reads are processed.
Time elapse: 0 : 0 : 5.73
Detecting rate: 100.00%

Result counting:
    Number of 3'-adaptor located on the read head region:               3854
    Number of 3'-adaptor + polyT on the read head region:               3854
    Number of 3'-adaptor located on the read tail region:               3877
    Number of 3'-adaptor + polyT on the read tail region:               3877

Alignment counting:
    Number of 3'-adaptor having no mismatch:                                4677

    Number of 3'-adaptor having mismatch at the last one position:          35
    Number of 3'-adaptor having mismatch at all the last two position:      24
    Number of 3'-adaptor having mismatch at all the last three position:    3

    Number of 3'-adaptor having in/del at the last one position:            0
    Number of 3'-adaptor having in/del at the last two position:            0
    Number of 3'-adaptor having in/del at the last three position:          0

    Number of rescued truncated 3'-adaptor on the read head region:     3
    Number of rescued truncated 3'-adaptor on the read tail region:     879

Finish time stamp: Mon, 25 Mar 2024 10:10:42
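The scanner source is not quoted in this thread. However, the "Detecting rate" line appears to equal the number of reads with the 3'-adaptor found on either end, divided by the total number of reads. For the test sample above, 3854 + 3877 = 7731, which gives 100.00%. The same formula also reproduces the 13.28% in the log from the original question. The sketch below only encodes this inferred relationship. `detection_rate` is a hypothetical helper:

```python
def detection_rate(head_count, tail_count, total_reads):
    """Percentage of reads with the 3'-adaptor found at either read end.

    Assumption (inferred from the logged counts, not from scanner.py):
    Detecting rate = (3'-adaptor on head region + 3'-adaptor on tail region)
                     / total reads processed.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * (head_count + tail_count) / total_reads


# Test sample above: (3854 + 3877) / 7731
print(f"{detection_rate(3854, 3877, 7731):.2f}%")  # 100.00%
# Log from the original question: (1983806 + 5793295) / 58584185
print(f"{detection_rate(1983806, 5793295, 58584185):.2f}%")  # 13.28%
```

Under this reading, a low rate means that for most reads no 3'-adaptor was located in either the head or the tail scanning region.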

Could you show me the library versions by running "pip3 install -r scNanoGPS/requirements.txt"? This command will install the required packages, or show the versions already installed.
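Installed versions can also be listed from Python without running pip again. This is a minimal sketch using the standard-library `importlib.metadata`, which requires Python 3.8 or newer and so is not available on the Python 3.7.12 environment mentioned earlier. The package list is taken from the requirements.txt entries visible in the pip output in this thread. `installed_versions` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

# Top-level packages from scNanoGPS/requirements.txt, as they appear
# (lines 1-7) in the pip output later in this thread.
REQUIRED = ("biopython", "distance", "liqa", "matplotlib", "pandas", "pysam", "seaborn")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```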

In addition, please try using debug mode with the following command

python3 scNanoGPS/scanner.py -i scNanoGPS/example/fastq/example.fastq.gz --debug_mode 1

and send me the first 50 to 100 lines of the output so I can inspect the scanner result and identify the problem. Thank you.

Regards, Cheng-Kai

LilyLuyang commented 7 months ago

Hi,

Thanks a lot for your response.

I now know the cause was the old Python version and the corresponding libraries installed in that environment. Here is the output from installing the required libraries:

    $ pip3 install -r requirements.txt
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting biopython (from -r requirements.txt (line 1))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/30/b0/73fc250af13256c1c1db1edd17f2786fb02dda4c141d809b0d4159c6bbf1/biopython-1.83-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
    Collecting distance (from -r requirements.txt (line 2))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/5c/1a/883e47df323437aefa0d0a92ccfb38895d9416bd0b56262c2e46a47767b8/Distance-0.1.3.tar.gz (180 kB)
      Preparing metadata (setup.py) ... done
    Collecting liqa (from -r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/3b/e8/b0c456108472fa256afeafe93084132fc229ffb5923151d8577eb7ad2dad/liqa-1.3.4-py3-none-any.whl (32 kB)
    Collecting matplotlib (from -r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/35/82/ca05c3e3ec4a38eaf49a9bfa1a700658284ddaaa2e2523fa91fbb96d207a/matplotlib-3.8.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
    Collecting pandas (from -r requirements.txt (line 5))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1a/5e/71bb0eef0dc543f7516d9ddeca9ee8dc98207043784e3f7e6c08b4a6b3d9/pandas-2.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.0 MB)
    Collecting pysam (from -r requirements.txt (line 6))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/35/22/3d01778c13f1103401313f1232c1c0596d97aaee21c1d60564640f3049bd/pysam-0.22.0.tar.gz (4.6 MB)
      Installing build dependencies ... done
      Getting requirements to build wheel ... done
      Installing backend dependencies ... done
      Preparing metadata (pyproject.toml) ... done
    Collecting seaborn (from -r requirements.txt (line 7))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/83/11/00d3c3dfc25ad54e731d91449895a79e4bf2384dc3ac01809010ba88f6d5/seaborn-0.13.2-py3-none-any.whl (294 kB)
    Requirement already satisfied: numpy in /nfshome/store02/users/c.c23047690/.conda/envs/scNanoGPS/lib/python3.9/site-packages (from biopython->-r requirements.txt (line 1)) (1.26.4)
    Collecting lifelines (from liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/b3/98/868d6b60a6a8847a53bca3b15b0e057fb3ed6395e5852f0c0c55bbaaa928/lifelines-0.28.0-py3-none-any.whl (349 kB)
    Collecting contourpy>=1.0.1 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a9/ba/d8fd1380876f1e9114157606302e3644c85f6d116aeba354c212ee13edc7/contourpy-1.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (310 kB)
    Collecting cycler>=0.10 (from matplotlib->-r requirements.txt (line 4))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl (8.3 kB)
    Collecting fonttools>=4.22.0 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/99/61/720e74663d9b0d54f60230cce977f11650481ae3c703d938ac80c5536828/fonttools-4.50.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
    Collecting kiwisolver>=1.3.1 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c0/a8/841594f11d0b88d8aeb26991bc4dac38baa909dc58d0c4262a4f7893bcbf/kiwisolver-1.4.5-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
    Collecting packaging>=20.0 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/49/df/1fceb2f8900f8639e278b056416d49134fb8d84c5942ffaa01ad34782422/packaging-24.0-py3-none-any.whl (53 kB)
    Collecting pillow>=8 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fd/98/35887712a640fe016817988141db021e1398b6d6620d29f8dceaffe72656/pillow-10.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)
    Collecting pyparsing>=2.3.1 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9d/ea/6d76df31432a0e6fdf81681a895f009a4bb47b3c39036db3e1b528191d52/pyparsing-3.1.2-py3-none-any.whl (103 kB)
    Collecting python-dateutil>=2.7 (from matplotlib->-r requirements.txt (line 4))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
    Collecting importlib-resources>=3.2.0 (from matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/75/06/4df55e1b7b112d183f65db9503bff189e97179b256e1ea450a3c365241e0/importlib_resources-6.4.0-py3-none-any.whl (38 kB)
    Collecting pytz>=2020.1 (from pandas->-r requirements.txt (line 5))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/9c/3d/a121f284241f08268b21359bd425f7d4825cffc5ac5cd0e1b3d82ffd2b10/pytz-2024.1-py2.py3-none-any.whl (505 kB)
    Collecting tzdata>=2022.7 (from pandas->-r requirements.txt (line 5))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/65/58/f9c9e6be752e9fcb8b6a0ee9fb87e6e7a1f6bcab2cdc73f02bb7ba91ada0/tzdata-2024.1-py2.py3-none-any.whl (345 kB)
    Collecting zipp>=3.1.0 (from importlib-resources>=3.2.0->matplotlib->-r requirements.txt (line 4))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c2/0a/ba9d0ee9536d3ef73a3448e931776e658b36f128d344e175bc32b092a8bf/zipp-3.18.1-py3-none-any.whl (8.2 kB)
    Collecting six>=1.5 (from python-dateutil>=2.7->matplotlib->-r requirements.txt (line 4))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl (11 kB)
    Requirement already satisfied: scipy>=1.2.0 in /nfshome/store02/users/c.c23047690/.conda/envs/scNanoGPS/lib/python3.9/site-packages (from lifelines->liqa->-r requirements.txt (line 3)) (1.12.0)
    Collecting autograd>=1.5 (from lifelines->liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/81/70/d5c7c2a458b8be96495c8b1634c2155beab58cbe864b7a9a5c06c2e52520/autograd-1.6.2-py3-none-any.whl (49 kB)
    Collecting autograd-gamma>=0.3 (from lifelines->liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/85/ae/7f2031ea76140444b2453fa139041e5afd4a09fc5300cfefeb1103291f80/autograd-gamma-0.5.0.tar.gz (4.0 kB)
      Preparing metadata (setup.py) ... done
    Collecting formulaic>=0.2.2 (from lifelines->liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/2c/09/7a9f95d35106d882f79ddabc2d33d8f2a262863f1f5d6fd00f46c5fc90aa/formulaic-1.0.1-py3-none-any.whl (94 kB)
    Collecting future>=0.15.2 (from autograd>=1.5->lifelines->liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/da/71/ae30dadffc90b9006d77af76b393cb9dfbfc9629f339fc1574a1c52e6806/future-1.0.0-py3-none-any.whl (491 kB)
    Collecting interface-meta>=1.2.0 (from formulaic>=0.2.2->lifelines->liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/02/3f/a6ec28c88e2d8e54d32598a1e0b5208a4baa72a8e7f6e241beab5731eb9d/interface_meta-1.3.0-py3-none-any.whl (14 kB)
    Collecting typing-extensions>=4.2.0 (from formulaic>=0.2.2->lifelines->liqa->-r requirements.txt (line 3))
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/f9/de/dc04a3ea60b22624b51c703a84bbe0184abcd1d0b9bc8074b5d6b7ab90bb/typing_extensions-4.10.0-py3-none-any.whl (33 kB)
    Collecting wrapt>=1.0 (from formulaic>=0.2.2->lifelines->liqa->-r requirements.txt (line 3))
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b1/e7/459a8a4f40f2fa65eb73cb3f339e6d152957932516d18d0e996c7ae2d7ae/wrapt-1.16.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (80 kB)
    Building wheels for collected packages: distance, pysam, autograd-gamma
      Building wheel for distance (setup.py) ... done
      Created wheel for distance: filename=Distance-0.1.3-py3-none-any.whl size=16258 sha256=b5a69da8b6df4d60cfcb35b39ae9b45b9cc214b22ae9a80fb0ef034a6d89189b
      Stored in directory: /nfshome/store02/users/c.c23047690/.cache/pip/wheels/9d/b6/0e/d6ebc83ecc5ad23c74204af61f77817d4c0d3e792afd09fc3f
      Building wheel for pysam (pyproject.toml) ... done
      Created wheel for pysam: filename=pysam-0.22.0-cp39-cp39-linux_x86_64.whl size=8439949 sha256=ea6fc6646019408a683cf7d6d6e709f2036b7d00b71a69e91a182e01fc0f1914
      Stored in directory: /nfshome/store02/users/c.c23047690/.cache/pip/wheels/69/8e/be/1c65c15f101a6931b3619f7ec781d1f630e85529c2ed26eb01
      Building wheel for autograd-gamma (setup.py) ... done
      Created wheel for autograd-gamma: filename=autograd_gamma-0.5.0-py3-none-any.whl size=4031 sha256=11735b468603551ce876e437f8542618cf5a5cddda4dbd6328c7849d388d03b9
      Stored in directory: /nfshome/store02/users/c.c23047690/.cache/pip/wheels/cb/60/73/b25b695bbaed121a41fd3550400f073e5020ffa4c9e7ce6b4e
    Successfully built distance pysam autograd-gamma
    Installing collected packages: pytz, distance, zipp, wrapt, tzdata, typing-extensions, six, pysam, pyparsing, pillow, packaging, kiwisolver, interface-meta, future, fonttools, cycler, contourpy, biopython, python-dateutil, importlib-resources, autograd, pandas, matplotlib, autograd-gamma, seaborn, formulaic, lifelines, liqa
    Successfully installed autograd-1.6.2 autograd-gamma-0.5.0 biopython-1.83 contourpy-1.2.0 cycler-0.12.1 distance-0.1.3 fonttools-4.50.0 formulaic-1.0.1 future-1.0.0 importlib-resources-6.4.0 interface-meta-1.3.0 kiwisolver-1.4.5 lifelines-0.28.0 liqa-1.3.4 matplotlib-3.8.3 packaging-24.0 pandas-2.2.1 pillow-10.2.0 pyparsing-3.1.2 pysam-0.22.0 python-dateutil-2.9.0.post0 pytz-2024.1 seaborn-0.13.2 six-1.16.0 typing-extensions-4.10.0 tzdata-2024.1 wrapt-1.16.0 zipp-3.18.1

These library versions are newer than the ones listed on the GitHub page, so they should be able to support the analysis.

Thanks.

Regards, Lily

shiauck commented 7 months ago

Hi,

The libraries you just installed are all the same versions as the ones I installed. Could you please also send me the first 50 to 100 lines of the scanner debug_mode output?

python3 scNanoGPS/scanner.py -i scNanoGPS/example/fastq/example.fastq.gz --debug_mode 1

Thanks.

Regards, Cheng-Kai

LilyLuyang commented 7 months ago

Hi,

Thank you very much. Here is the 'scanner.log.txt' output in debug mode:

Starting time stamp: Mon, 25 Mar 2024 15:55:33

List of parameters:
    Current working directory:     scNanoGPS
    Input file name:               scNanoGPS/example/fastq/example.fastq.gz
    Output FastQ file name:        scNanoGPS_res/processed.fastq.gz
    Output barcode list name:      scNanoGPS_res/barcode_list.tsv.gz
    Log file name:                 scNanoGPS_res/scanner.log.txt

Parameters for pattern search:
    Length of barcode:             16
    Length of UMI:                 12
    5'-adaptor sequence:           AAGCAGTGGTATCAACGCAGAGTACAT
    3'-adaptor sequence:           CTACACGACGCTCTTCCGATCT
    PolyT sequence:                TTTTTTTTTTTT
    Scanning region length:        100

Penalty for dynamic programming:
    Matching:                      2
    Mismatching:                   -3
    Gap opening:                   -5
    Gap extension:                 -2
    Editing distance:              2

Parameters for computing:
    Number of computer cores:      2
    Number of reads per batch job: 1000
    Minimal length of read:        200
    Matching threshold:            0.7
    Scoring threshold:             0.4

Debug mode switch:             1

Total 7731 reads are processed.
Time elapse: 0 : 0 : 19.07
Detecting rate: 100.00%

Result counting:
    Number of 3'-adaptor located on the read head region:               3854
    Number of 3'-adaptor + polyT on the read head region:               3854
    Number of 3'-adaptor located on the read tail region:               3877
    Number of 3'-adaptor + polyT on the read tail region:               3877

Alignment counting:
    Number of 3'-adaptor having no mismatch:                                4677

    Number of 3'-adaptor having mismatch at the last one position:          35
    Number of 3'-adaptor having mismatch at all the last two position:      24
    Number of 3'-adaptor having mismatch at all the last three position:    3

    Number of 3'-adaptor having in/del at the last one position:            0
    Number of 3'-adaptor having in/del at the last two position:            0
    Number of 3'-adaptor having in/del at the last three position:          0

    Number of rescued truncated 3'-adaptor on the read head region:         3
    Number of rescued truncated 3'-adaptor on the read tail region:         879

Finish time stamp: Mon, 25 Mar 2024 15:55:53

The adaptor detection rate is now 100%.

Regards, Lily

shiauck commented 7 months ago

Hi,

It looks like it's working now. If possible, please run SRR21492154 again and check whether you get a detection rate of 70% or higher. Once you get above 70% with SRR21492154, you should be good to run our pipeline on your own sample. Thanks.
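To check a run against this rule of thumb, the "Detecting rate" line can be read straight from scanner.log.txt. This is a minimal sketch that assumes the log line has the form `Detecting rate: NN.NN%`, as in the logs posted here. The 70% threshold comes from the advice above, and `detecting_rate` and `passes` are hypothetical helpers, not part of scNanoGPS:

```python
import re
from pathlib import Path

# Matches e.g. "Detecting rate: 13.28%" as printed in scanner.log.txt
RATE_RE = re.compile(r"Detecting rate:\s*([\d.]+)%")
THRESHOLD = 70.0  # rule of thumb from this thread for SRR21492154


def detecting_rate(log_text):
    """Extract the detecting rate (in percent) from scanner log text, or None."""
    match = RATE_RE.search(log_text)
    return float(match.group(1)) if match else None


def passes(log_text, threshold=THRESHOLD):
    """True if the log reports a detecting rate at or above the threshold."""
    rate = detecting_rate(log_text)
    return rate is not None and rate >= threshold


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 check_rate.py scNanoGPS_res/scanner.log.txt
    for log in sys.argv[1:]:
        text = Path(log).read_text()
        status = "OK" if passes(text) else "below threshold"
        print(log, detecting_rate(text), status)
```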

Regards, Cheng-Kai