YakhiniGroup / crispector

Accurate estimation of off-target editing activity from comparative NGS data

Are mock files required? #1

Closed ShanSabri closed 3 years ago

ShanSabri commented 3 years ago

Description

Are mock files required? Is it possible to run without mock files?

What I Did

CMD:

touch mock_R1.fq.gz mock_R2.fq.gz

crispector \
        -t_r1 IL6201_R1.fastq.gz \
        -t_r2 IL6201_R2.fastq.gz \
        -m_r1 mock_R1.fq.gz \
        -m_r2 mock_R2.fq.gz \
        -c IL6201_config.csv

Outputs:

 CCCCC  RRRRRR  IIIII  SSSSS  PPPPPP  EEEEEEE  CCCCC  TTTTTTT  OOOOO  RRRRRR
CC    C RR   RR  III  SS      PP   PP EE      CC    C   TTT   OO   OO RR   RR
CC      RRRRRR   III   SSSSS  PPPPPP  EEEEE   CC        TTT   OO   OO RRRRRR
CC    C RR  RR   III       SS PP      EE      CC    C   TTT   OO   OO RR  RR
 CCCCC  RR   RR IIIII  SSSSS  PP      EEEEEEE  CCCCC    TTT    OOOOO  RR   RR

              CRISPR/CAS9 Off-Target Analysis From NGS Data

2021-07-01 20:11:32,842 INFO     fastp for treatment - Run (may take a few minutes).
2021-07-01 20:11:35,238 INFO     fastp for treatment - Done.
2021-07-01 20:11:35,238 INFO     fastp for mock - Run (may take a few minutes).
2021-07-01 20:11:35,711 INFO     fastp for mock - Done.
2021-07-01 20:11:36,721 INFO     Assigning reads to target amplicons for treatment - Start assigning 171,208 reads - May take a few minutes
2021-07-01 20:11:44,353 INFO     Assigning reads to target amplicons for treatment - 26,688 reads weren't matched (15.59% of all reads)
2021-07-01 20:11:44,814 WARNING  The following read has 6961 repetitions, but it doesn't match any site!This read is most similar to reference site site_7. Please check amplicon sequences correctness. read=CTGGTAATGGGATCACTTTACCTGTTTATGTGATTTTCTTTGGCAATTATTAATTGGGCTTTTTTTTTTTAGGAAAGAGAGACATGCTTGGCTTACCTTGAGTAACAGGGGTTTATTGTAAATAAATGAATGTGAGGTTTTAATCCCTAGATGATGAGCCTTCATGAAG
2021-07-01 20:11:44,831 WARNING  The following read has 2212 repetitions, but it doesn't match any site!This read is most similar to reference site site_7. Please check amplicon sequences correctness. read=CTGGTAATGGGATCACTTTACCTGTTTATGTGATTTTCTTTGGCAATTATTAATTGGGCTTTTTTTTTTAGGAAAGAGAGACATGCTTGGCTTACCTTGAGTAACAGGGGTTTATTGTAAATAAATGAATGTGAGGTTTTAATCCCTAGATGATGAGCCTTCATGAAG
2021-07-01 20:11:44,848 WARNING  The following read has 1168 repetitions, but it doesn't match any site!This read is most similar to reference site site_7. Please check amplicon sequences correctness. read=CTGGTAATGGGATCACTTTACCTGTTTATGTGATTTTCTTTGGCAATTATTAATTGGGCTTTTTTTTTTTTAGGAAAGAGAGACATGCTTGGCTTACCTTGAGTAACAGGGGTTTATTGTAAATAAATGAATGTGAGGTTTTAATCCCTAGATGATGAGCCTTCATGAAG
2021-07-01 20:11:44,864 WARNING  The following read has 830 repetitions, but it doesn't match any site!This read is most similar to reference site target_site. Please check amplicon sequences correctness. read=ACTCAGACACCCTCTCCTGTGTGCAGGACGTGCCGAATGTTCAGGTGCAATGAGAATGAGCCATGCTTGGCTTAACGAGGGCAATCTGGCCCATCAAGTGGCCTTCGCCTCTGGGAGTAACAAAAATGCACTTCAAAATAGCTTCTGTAATCAAGCTGCATGGG
2021-07-01 20:11:44,879 WARNING  The following read has 748 repetitions, but it doesn't match any site!This read is most similar to reference site target_site. Please check amplicon sequences correctness. read=ACTCAGACACCCTCTCCTGTGTGCAGGACGTGCCGAATGTTCAGGTGCAATGAGAATGAGCCATGCTTGGCCCATCAAGTGGCCTTCGCCTCTGGGAGTAACAAAAATGCACTTCAAAATAGCTTCTGTAATCAAGCTGCATGGG
2021-07-01 20:11:44,895 WARNING  The following read has 543 repetitions, but it doesn't match any site!This read is most similar to reference site target_site. Please check amplicon sequences correctness. read=ACTCAGACACCCTCTCCTGTGTGCAGGACGTGCCGAATGTTCAGGTGCAATGAGAATGAGCCATGCTTGGCTTCGAGGGCAATCTGGCCCATCAAGTGGCCTTCGCCTCTGGGAGTAACAAAAATGCACTTCAAAATAGCTTCTGTAATCAAGCTGCATGGG
2021-07-01 20:11:44,910 WARNING  The following read has 534 repetitions, but it doesn't match any site!This read is most similar to reference site target_site. Please check amplicon sequences correctness. read=ACTCAGACACCCTCTCCTGTGTGCAGGACGTGCCGAATGTTCAGGTGCAATGAGAATGAGCCATGCTTGGCAATCTGGCCCATCAAGTGGCCTTCGCCTCTGGGAGTAACAAAAATGCACTTCAAAATAGCTTCTGTAATCAAGCTGCATGGG
2021-07-01 20:11:44,910 INFO     Assigning reads to target amplicons for treatment - Done
2021-07-01 20:11:44,922 INFO     Assigning reads to target amplicons for mock - Start assigning 0 reads - May take a few minutes
2021-07-01 20:11:44,927 INFO     Traceback (most recent call last):
  File "/home/ubuntu/.Anaconda3/envs/crispector_env/lib/python3.7/site-packages/crispector/crispector_main.py", line 80, in run
    tx_reads_d, mock_reads_d, tx_trans_df, mock_trans_df = input_processing.run(tx_in1, tx_in2, mock_in1, mock_in2)
  File "/home/ubuntu/.Anaconda3/envs/crispector_env/lib/python3.7/site-packages/crispector/input_processing/input_processing.py", line 123, in run
    mock_reads, mock_trans_df = self._demultiplex_reads(mock_merged, ExpType.MOCK)
  File "/home/ubuntu/.Anaconda3/envs/crispector_env/lib/python3.7/site-packages/crispector/input_processing/input_processing.py", line 341, in _demultiplex_reads
    reads_df[[L_SITE, L_REV]] = reads_df.apply((lambda row: l_match[row[L_READ]]), axis=1, result_type='expand')
  File "/home/ubuntu/.Anaconda3/envs/crispector_env/lib/python3.7/site-packages/pandas/core/frame.py", line 3160, in __setitem__
    self._setitem_array(key, value)
  File "/home/ubuntu/.Anaconda3/envs/crispector_env/lib/python3.7/site-packages/pandas/core/frame.py", line 3189, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

2021-07-01 20:11:44,927 ERROR    Unknown Error. Please contact crispector package author

iamit87 commented 3 years ago

Hi @ShanSabri, are the mock files empty? It looks that way from the log ("Start assigning 0 reads"). And yes, mock files are mandatory. CRISPECTOR uses the mock files to estimate the experiment's background noise. Without them, CRISPECTOR can't tell whether an indel originated from an editing event or from noise (e.g. a sequencing error).
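
For anyone hitting the same traceback, a quick pre-flight check that every input FASTQ actually contains reads can catch this before CRISPECTOR starts. The snippet below is only a minimal sketch and not part of CRISPECTOR: the helper names `count_fastq_reads` and `find_empty_fastqs` are made up for illustration, and it assumes gzip-compressed FASTQ files with four lines per read.

```python
import gzip
import sys
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in a gzip-compressed FASTQ file (assumes 4 lines per read)."""
    # A file created with `touch` has zero bytes and is not a valid gzip stream,
    # so treat it as containing no reads without trying to decompress it.
    if Path(path).stat().st_size == 0:
        return 0
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def find_empty_fastqs(paths):
    """Return the subset of paths that contain no reads (hypothetical helper)."""
    return [p for p in paths if count_fastq_reads(p) == 0]


if __name__ == "__main__":
    # Pass the same files you would give to -t_r1/-t_r2/-m_r1/-m_r2.
    for empty_path in find_empty_fastqs(sys.argv[1:]):
        print(f"No reads found in {empty_path}")
```

Run against the files from the command above, this should report both `mock_R1.fq.gz` and `mock_R2.fq.gz`, since `touch` creates them with no content.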