H2muller / CROPSR

CROPSR is a Python tool designed for genome-wide gRNA design and evaluation for CRISPR experiments, with special focus on complex genomes such as those found in energy-producing crops. CROPSR is a product of the DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI).
Apache License 2.0

length error in STDOUT #5

Closed: anandksrao closed this issue 1 year ago

anandksrao commented 2 years ago

I just completed a run that generated almost 2.8M lines of output in the CSV file, and it was quick! The first few and last few lines of the CSV output file are shown below:

crispr_id,crispr_sys,sequence,long_sequence,chromosome,start_pos,end_pos,cutsite,strand,on_site_score,features
A0167EO6FF,cas9,AUAACAUAUAUAUAUAUAUU,ACCAAAUAACAUAUAUAUAUAUAUUUCUAA,01,37,57,54,+,0.5849386453124137,,completed
A01JJRHZZF,cas9,AUAACAUAUAUUUCUACACA,ACCAGAUAACAUAUAUUUCUACACAAAAUA,01,291,311,308,+,0.6150581636699067,,completed
.
.
A1CHNZ88S,cas9,CAUAUAGCAACUUGAUCCGG,UUUUCCAUAUAGCAACUUGAUCCGGACUCU,1,271611,271591,271588,-,0.8257269488824107,,completed
A1SRTZKD4,cas9,CGGACUCUGU,UGAUCCGGACUCUGU,1,271628,271608,,-,-1,,completed

There were no messages in STDERR.

However, the STDOUT file contains a total of 20 lines reporting length errors (see below).

length error occurred at guide in position 275145 - 275125 of 03, sequence: UGCACCUCUUUUUUCCAAAA
length error occurred at guide in position 150396 - 150376 of 05, sequence: GGCACCUUUCGAGGCUUU
length error occurred at guide in position 113509 - 113489 of 06, sequence: AUGACCCUAAUGGAGCUCAAA
length error occurred at guide in position 113510 - 113490 of 06, sequence: UGACCCUAAUGGAGCUCAAA
length error occurred at guide in position 63290 - 63270 of 09, sequence: UACUCCAACGCUCGUGAUGGU
length error occurred at guide in position 25000 - 24980 of 26, sequence: UAAUCCUGUGAGUUUUCC
length error occurred at guide in position 24833 - 24813 of 27, sequence: UAUACCAUACAGGACGGAAAACAAAACAA
length error occurred at guide in position 23814 - 23794 of 28, sequence: UUAGCCCAUUUGCCAUGAUUCAC
length error occurred at guide in position 23815 - 23795 of 28, sequence: UAGCCCAUUUGCCAUGAUUCAC
length error occurred at guide in position 23822 - 23802 of 28, sequence: UUUGCCAUGAUUCAC
length error occurred at guide in position 23556 - 23536 of 29, sequence: GGAUCCGUUUAAAAAC
length error occurred at guide in position 23409 - 23389 of 30, sequence: CUGUCCAAAUUGUGUCACGUUAA
length error occurred at guide in position 23267 - 23287 of 31, sequence: CCCUACAAAGAGGAUAAUUCCUUGCUAAA
length error occurred at guide in position 23299 - 23279 of 31, sequence: UUAUCCUCUUUGUAGGG
length error occurred at guide in position 56706806 - 56706826 of hr1, sequence: CCCUAAACCCUAAACCCUAAACCCUAAAC
length error occurred at guide in position 44819594 - 44819614 of hr5, sequence: CCUACCCUAAACCCUAAACCUAAACCCUA
length error occurred at guide in position 42866068 - 42866088 of hr6, sequence: CCUAAACCCUAAACCCUAAACCCUAAACC
length error occurred at guide in position 124035 - 124015 of 6, sequence: AAUUCCCGUCAUUCGCCCAUGAA
length error occurred at guide in position 124036 - 124016 of 6, sequence: AUUCCCGUCAUUCGCCCAUGAA
length error occurred at guide in position 271628 - 271608 of 1, sequence: UGAUCCGGACUCUGU

My questions are:

  1. Is this something that warrants a complete or partial re-run?
  2. What does this message mean?

If you need more information to answer these questions, please let me know. Thank you in advance.

H2muller commented 1 year ago

The errors you see in the output are a feature. This message is shown when a guide is designed under the defined length. This can happen for a number of reasons, the most common being gaps in the reference genome (e.g., an N base in the sequence), which force the algorithm to end the design of that sequence and skip to the next iteration.
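To see which guides were flagged and how long each reported sequence is, the STDOUT messages can be summarized with a short script. This is only a sketch and not part of CROPSR: the regular expression is inferred from the message format quoted in this issue, and the names `parse_length_errors` and `length_histogram` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this issue (an assumption,
# not taken from the CROPSR source code).
LENGTH_ERROR = re.compile(
    r"length error occurred at guide in position (\d+) - (\d+) "
    r"of ([^,\s]+), sequence: ([ACGUTN]+)"
)


def parse_length_errors(lines):
    """Return one dict per 'length error' line; all other lines are ignored."""
    records = []
    for line in lines:
        match = LENGTH_ERROR.search(line)
        if match is None:
            continue
        first_pos, second_pos, chromosome, sequence = match.groups()
        records.append({
            "chromosome": chromosome,
            "first_pos": int(first_pos),
            "second_pos": int(second_pos),
            "sequence": sequence,
            "length": len(sequence),
        })
    return records


def length_histogram(records):
    """Count how many flagged guides were reported at each sequence length."""
    return Counter(record["length"] for record in records)
```

Passing the lines of the saved STDOUT file to `parse_length_errors` and the result to `length_histogram` gives a quick overview of the flagged guides per chromosome and per reported length.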

Accordingly, these errors do not warrant a re-run, as the affected guides would not be functional anyway.
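If you would rather exclude such guides from downstream analysis than leave them in the table, the CSV can be filtered after the run. The sketch below is not part of CROPSR and rests on two assumptions drawn only from the sample rows in this issue: a full-length guide is 20 nt, and an `on_site_score` of -1 marks a guide that could not be scored. The function name `split_guides` is hypothetical.

```python
import csv


def split_guides(lines, expected_length=20):
    """Split CROPSR CSV rows into kept and dropped crispr_id lists.

    `lines` is any iterable of CSV text lines, header included.
    Assumptions (from the sample output only): guides are `expected_length`
    nt long, and a negative on_site_score marks an unscored guide.
    """
    kept, dropped = [], []
    # DictReader tolerates rows with more fields than the header has;
    # the extra fields are collected under the key None.
    for row in csv.DictReader(lines):
        full_length = len(row["sequence"]) == expected_length
        scored = float(row["on_site_score"]) >= 0
        if full_length and scored:
            kept.append(row["crispr_id"])
        else:
            dropped.append(row["crispr_id"])
    return kept, dropped
```

For example, opening the output file with `open(path, newline="")` and passing the file object to `split_guides` returns the IDs to keep and the IDs to discard, without re-running the design step.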